2023: Week 14 - Solution

Solution by Tom Prowse and you can download the workflow here


Now that we have completed our themed weeks for the start of the year, we now have Alfred Chan bringing us a challenge all about World Trade data. We are using publicly available data from the World Integrated Trade Solutions to examine the trade data across countries.


Step 1 - Input Country Data

First up we want to input the data for each of our countries using the 'Union multiple tables' option within the input step. 

We want to add all of the CSV files from the folder so we can select 'Union Multiple Tables' option from the Tables tab. Then we also need to ensure that Prep is reading in the first row as a header, therefore we need to make sure that is selected from the 'Settings' tab: 

Settings

Tables


Once we have input all of the tables we can make sure that the Reporter and Partner fields have the Country/Region data role so the table should look like this: 


Step 2 - Country Codes

Next we want to extract the country code from each of the File Paths. To so this we can split the field and remove the 2 occurrence after the '_': 
 

The automatic split should also work here, but we should now have a field with just the Country codes. We can rename this to Country Code and then remove the other fields relating to File Paths.



Step 3 - Imports and Exports

We now want to keep only the import and export data only. To do this we want to select Import and Export from the Indicator Type field and keep only these two values. 

Then we want to remove European Union, Occ.Pal.Terr, Other Asia, nes, and World from the Reporter field. 

Then finally remove '...', Special Categories, and World from the Partner field. 

Once we have filtered the table we can pivot the table to ensure that all the years are in the same column. 

For this we want to use a Columns to Rows pivot with all of the years within the pivoted fields: 


Then from here we can remove any null values from the Pivot Values and rename the Pivot Names to Year, then change that to a Date format. 

Finally we want to pivot the data again, but this time we want to include the indicators as separate fields. Therefore, we want to pivot the data using a Rows to Columns pivot where the indicators are in the pivoted fields and the sum of Pivot Values is the aggregate.



After transforming the shape of the data our table should now look like this:



Step 4 - Country Geo Data

We are now in a position where we can incorporate the geo data for each of the countries. First, we need to input the countries-codes data source and keep only the ISO3 CODE, LABEL EN, and geo_point_2d fields. 

From the geo_point_2d field we can split the lat and long out using the ',' as a separator. Then make sure we rename the fields to Latitude and Longitude so that the table now looks like this: 


We can now join this to our workflow using a left join so that all of the country information is included and we can join where Country Code = ISO3 Code. 


We now have the lat and long for the reporter countries but we also want to include them for the Partner countries as well. 

First we need to ensure that each partner has a country code. Therefore, we can go back in our workflow to the step where we split the country codes from the file path. From here we can create a new aggregate step, on a separate branch, where we can just return all of the Reporters (rename this to Country) and Country Codes. 

This table can now act as a lookup table to add the country codes to our Partner fields. We can join this using a left join where Partner = Country: 



Then finally we can join on the geo data for the partners using a left join where Country Code_Partner = ISO3 Code


After this final join we can rename and remove some fields so that our table is now ready to output. 


You can download all the output from here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text