Now that we have completed our themed weeks for the start of the year, we now have Alfred Chan bringing us a challenge all about World Trade data. We are using publicly available data from the World Integrated Trade Solutions to examine the trade data across countries.
Step 1 - Input Country Data
First up we want to input the data for each of our countries using the 'Union multiple tables' option within the input step.
We want to add all of the CSV files from the folder so we can select 'Union Multiple Tables' option from the Tables tab. Then we also need to ensure that Prep is reading in the first row as a header, therefore we need to make sure that is selected from the 'Settings' tab:
Settings
Tables
Once we have input all of the tables we can make sure that the Reporter and Partner fields have the Country/Region data role so the table should look like this:
Step 2 - Country Codes
Next we want to extract the country code from each of the File Paths. To so this we can split the field and remove the 2 occurrence after the '_':
The automatic split should also work here, but we should now have a field with just the Country codes. We can rename this to Country Code and then remove the other fields relating to File Paths.
Step 3 - Imports and Exports
We now want to keep only the import and export data only. To do this we want to select Import and Export from the Indicator Type field and keep only these two values.
Then we want to remove European Union, Occ.Pal.Terr, Other Asia, nes, and World from the Reporter field.
Then finally remove '...', Special Categories, and World from the Partner field.
Once we have filtered the table we can pivot the table to ensure that all the years are in the same column.
For this we want to use a Columns to Rows pivot with all of the years within the pivoted fields:
Then from here we can remove any null values from the Pivot Values and rename the Pivot Names to Year, then change that to a Date format.
Finally we want to pivot the data again, but this time we want to include the indicators as separate fields. Therefore, we want to pivot the data using a Rows to Columns pivot where the indicators are in the pivoted fields and the sum of Pivot Values is the aggregate.
After transforming the shape of the data our table should now look like this:
Step 4 - Country Geo Data
We are now in a position where we can incorporate the geo data for each of the countries. First, we need to input the countries-codes data source and keep only the ISO3 CODE, LABEL EN, and geo_point_2d fields.
From the geo_point_2d field we can split the lat and long out using the ',' as a separator. Then make sure we rename the fields to Latitude and Longitude so that the table now looks like this:
We can now join this to our workflow using a left join so that all of the country information is included and we can join where Country Code = ISO3 Code.
We now have the lat and long for the reporter countries but we also want to include them for the Partner countries as well.
First we need to ensure that each partner has a country code. Therefore, we can go back in our workflow to the step where we split the country codes from the file path. From here we can create a new aggregate step, on a separate branch, where we can just return all of the Reporters (rename this to Country) and Country Codes.
This table can now act as a lookup table to add the country codes to our Partner fields. We can join this using a left join where Partner = Country:
Then finally we can join on the geo data for the partners using a left join where Country Code_Partner = ISO3 Code
After this final join we can rename and remove some fields so that our table is now ready to output.
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!
Created by: Carl Allchin Welcome to a New Year of Preppin' Data challenges. For anyone new to the challenges then let us give you an overview how the weekly challenge works. Each Wednesday the Preppin' crew (Jenny, myself or a guest contributor) drop a data set(s) that requires some reshaping and/or cleaning to get it ready for analysis. You can use any tool or language you want to do the reshaping (we build the challenges in Tableau Prep but love seeing different tools being learnt / tried). Share your solution on LinkedIn, Twitter/X, GitHub or the Tableau Forums Fill out our tracker so you can monitor your progress and involvement The following Tuesday we will post a written solution in Tableau Prep (thanks Tom) and a video walkthrough too (thanks Jenny) As with each January for the last few years, we'll set a number of challenges aimed at beginners. This is a great way to learn a number of fundamental data preparation skills or a chance to learn a new tool — New Year...
Created by: Carl Allchin Welcome to a New Year of Preppin' Data. These are weekly exercises to help you learn and develop data preparation skills. We publish the challenges on a Wednesday and share a solution the following Tuesday. You can take the challenges whenever you want and we love to see your solutions. With data preparation, there is never just one way to complete the tasks so sharing your solutions will help others learn too. Share on Twitter, LinkedIn, the Tableau Forums or wherever you want to too. Tag Jenny Martin, Tom Prowse or myself or just use the #PreppinData to share your solutions. The challenges are designed for learning Tableau Prep but we have a broad community who complete the challenges in R, Python, SQL, DBT, EasyMorph and many other tools. We love seeing people learn new tools so feel free to use whatever tools you want to complete the challenges. A New Year means we start afresh so January's challenges will be focused on beginners. We will u...
Free isn't always a good thing. In data, Free text is the example to state when proving that statements correct. However, lots of benefit can be gained from understanding data that has been entered in Free Text fields. What do we mean by Free Text? Free Text is the string based data that comes from allowing people to type answers in to systems and forms. The resulting data is normally stored within one column, with one answer per cell. As Free Text means the answer could be anything, this is what you get - absolutely anything. From expletives to slang, the words you will find in the data may be a challenge to interpret but the text is the closest way to collect the voice of your customer / employee. The Free Text field is likely to contain long, rambling sentences that can simply be analysed. If you count these fields, you are likely to have one of each entry each. Therefore, simply counting the entries will not provide anything meaningful to your analysis. The value is in ...