2021: Week 12 - Solution

 


Solution by Tom Prowse and you can download the full workflow here. 


The challenge this week took a look at a data set showed to us by Lorna Brown, and was all about different economic indicators related to tourism in the Maldives. This had a bit of a quirky structure so the challenge was transform and clean this up so that it'll be more useful to create a visualisation.

Step 1 - Pivot Months

The first step is to bring all of our month fields into a single column using the pivot tool. Using the Columns to Rows pivot, we have dragged all of the month fields into the pivot section so we only have our non month fields left: 

After the pivot we want to rename the fields, remove any nulls, and change the data types for the Number of Tourists and Month fields. We also only want to focus on the Tourists information from each country, therefore we can also Keep Only the 'Tourists' from Unit-Detail, and we can Exclude 'Total Tourist Arrivals' from the Series-Measure field. 

Our table should now look like this: 


Step 2 - Countries Branch

The next step is to split the workflow into two sections, one for countries, one for continents. First we are going to focus on the Countries section, therefore we need a new step and to filter to exclude 'Real Sector / Tourism / Tourist Arrivals' from the Hierarchy-Breakdown field. This will leave us with just the arrivals from the different countries. 

Now we have each of the countries, we want to extract the country name from each of the Series-Measure fields. We can do this in a couple of steps, first we want a custom split with the following setup:

This leaves us with the following countries: 


Where we can just double click on 'the United Kingdom' and remove the 'the' so we have our desired country name. 

Now we have the country name, we also want the continent as well. This time we can just use an automatic split on the Hierarchy-Breakdown field which will extract each of the continents from the end of the string. 

After renaming the relevant fields our table should now look like this: 


The final part of this step is use an aggregation tool to find the totals for each continent on each month: 


Step 3 - Continent Branch

We can turn our attention to the Continents now by creating a new branch from the step where we cleaned the data types previously. From here we are going to keep only the 'Real Sector / Tourism / Tourist Arrivals' field from the Hierarchy Breakdown. This will leave only the continents/areas that we want to focus on. 

The next task is to use a custom split to extract the continents from the Series-Measure field. First we want to replace the '-' with a 'from' in the UN rows, so that it will be easier for us to split later on.

Series-Measure
REPLACE([Series-Measure],'- ', 'from ')

We're now ready to use the custom split to extract the continents, this is similar the to technique used in the countries branch: 



Our table should now look like this: 


Step 4 - Combine Countries & Continents

The next step is to join the two separate branches together. As we have aggregated the countries field to a continent level, this will join nicely to our continent branch using the following conditions: 


There's a left join here because we want to include all fields from the continent branch that don't appear in the countries branch (eg, The Middle East & UN passport holders). As a result of the join, we can now compare the total tourists from the continent with the total from the individual countries. We are going to assume the difference between the two are from 'unknown' countries. 

Missing Values 
[Number of Tourists] - zn([Continent Tourists])

The difference between the two are our 'unknown' countries, therefore we want to create a new country field with just the string 'Unknown'. Our table should now look like this: 


Step 5 - Union 

The final step is to bring back in the details for each of the countries along with the unknown countries. As both tables are in a similar structure we can use the union tool to stack them on top of each other. 

After the union the tables look like this (this might depend on what you have called your field names): 


You'll notice that the Number of Tourists field and Missing Values don't quite match up due to the different names, therefore we can use the 'Merge fields' functionality to overcome this. This can be found in the mismatched fields section in the bottom left. All we need to do is select the 'Number of Tourists' field and then hover over Missing Values and press on the +. 


After removing the table names field, we have finished and now have the desired output: 


The full output can be downloaded here for comparisons. 

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank