A weekly challenge to help you learn to prepare data and use Tableau Prep
2021: Week 12 - Solution
Get link
Facebook
X
Pinterest
Email
Other Apps
Solution by Tom Prowse and you can download the full workflow here.
The challenge this week took a look at a data set showed to us by Lorna Brown, and was all about different economic indicators related to tourism in the Maldives. This had a bit of a quirky structure so the challenge was transform and clean this up so that it'll be more useful to create a visualisation.
Step 1 - Pivot Months
The first step is to bring all of our month fields into a single column using the pivot tool. Using the Columns to Rows pivot, we have dragged all of the month fields into the pivot section so we only have our non month fields left:
After the pivot we want to rename the fields, remove any nulls, and change the data types for the Number of Tourists and Month fields. We also only want to focus on the Tourists information from each country, therefore we can also Keep Only the 'Tourists' from Unit-Detail, and we can Exclude 'Total Tourist Arrivals' from the Series-Measure field.
Our table should now look like this:
Step 2 - Countries Branch
The next step is to split the workflow into two sections, one for countries, one for continents. First we are going to focus on the Countries section, therefore we need a new step and to filter to exclude 'Real Sector / Tourism / Tourist Arrivals' from the Hierarchy-Breakdown field. This will leave us with just the arrivals from the different countries.
Now we have each of the countries, we want to extract the country name from each of the Series-Measure fields. We can do this in a couple of steps, first we want a custom split with the following setup:
This leaves us with the following countries:
Where we can just double click on 'the United Kingdom' and remove the 'the' so we have our desired country name.
Now we have the country name, we also want the continent as well. This time we can just use an automatic split on the Hierarchy-Breakdown field which will extract each of the continents from the end of the string.
After renaming the relevant fields our table should now look like this:
The final part of this step is use an aggregation tool to find the totals for each continent on each month:
Step 3 - Continent Branch
We can turn our attention to the Continents now by creating a new branch from the step where we cleaned the data types previously. From here we are going to keep only the 'Real Sector / Tourism / Tourist Arrivals' field from the Hierarchy Breakdown. This will leave only the continents/areas that we want to focus on.
The next task is to use a custom split to extract the continents from the Series-Measure field. First we want to replace the '-' with a 'from' in the UN rows, so that it will be easier for us to split later on.
Series-Measure
REPLACE([Series-Measure],'- ', 'from ')
We're now ready to use the custom split to extract the continents, this is similar the to technique used in the countries branch:
Our table should now look like this:
Step 4 - Combine Countries & Continents
The next step is to join the two separate branches together. As we have aggregated the countries field to a continent level, this will join nicely to our continent branch using the following conditions:
There's a left join here because we want to include all fields from the continent branch that don't appear in the countries branch (eg, The Middle East & UN passport holders). As a result of the join, we can now compare the total tourists from the continent with the total from the individual countries. We are going to assume the difference between the two are from 'unknown' countries.
Missing Values [Number of Tourists] - zn([Continent Tourists])
The difference between the two are our 'unknown' countries, therefore we want to create a new country field with just the string 'Unknown'. Our table should now look like this:
Step 5 - Union
The final step is to bring back in the details for each of the countries along with the unknown countries. As both tables are in a similar structure we can use the union tool to stack them on top of each other.
After the union the tables look like this (this might depend on what you have called your field names):
You'll notice that the Number of Tourists field and Missing Values don't quite match up due to the different names, therefore we can use the 'Merge fields' functionality to overcome this. This can be found in the mismatched fields section in the bottom left. All we need to do is select the 'Number of Tourists' field and then hover over Missing Values and press on the +.
After removing the table names field, we have finished and now have the desired output:
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!
Created by: Carl Allchin Welcome to a New Year of Preppin' Data challenges. For anyone new to the challenges then let us give you an overview how the weekly challenge works. Each Wednesday the Preppin' crew (Jenny, myself or a guest contributor) drop a data set(s) that requires some reshaping and/or cleaning to get it ready for analysis. You can use any tool or language you want to do the reshaping (we build the challenges in Tableau Prep but love seeing different tools being learnt / tried). Share your solution on LinkedIn, Twitter/X, GitHub or the Tableau Forums Fill out our tracker so you can monitor your progress and involvement The following Tuesday we will post a written solution in Tableau Prep (thanks Tom) and a video walkthrough too (thanks Jenny) As with each January for the last few years, we'll set a number of challenges aimed at beginners. This is a great way to learn a number of fundamental data preparation skills or a chance to learn a new tool — New Year...
Free isn't always a good thing. In data, Free text is the example to state when proving that statements correct. However, lots of benefit can be gained from understanding data that has been entered in Free Text fields. What do we mean by Free Text? Free Text is the string based data that comes from allowing people to type answers in to systems and forms. The resulting data is normally stored within one column, with one answer per cell. As Free Text means the answer could be anything, this is what you get - absolutely anything. From expletives to slang, the words you will find in the data may be a challenge to interpret but the text is the closest way to collect the voice of your customer / employee. The Free Text field is likely to contain long, rambling sentences that can simply be analysed. If you count these fields, you are likely to have one of each entry each. Therefore, simply counting the entries will not provide anything meaningful to your analysis. The value is in ...
Created by: Carl Allchin Welcome to a New Year of Preppin' Data. These are weekly exercises to help you learn and develop data preparation skills. We publish the challenges on a Wednesday and share a solution the following Tuesday. You can take the challenges whenever you want and we love to see your solutions. With data preparation, there is never just one way to complete the tasks so sharing your solutions will help others learn too. Share on Twitter, LinkedIn, the Tableau Forums or wherever you want to too. Tag Jenny Martin, Tom Prowse or myself or just use the #PreppinData to share your solutions. The challenges are designed for learning Tableau Prep but we have a broad community who complete the challenges in R, Python, SQL, DBT, EasyMorph and many other tools. We love seeing people learn new tools so feel free to use whatever tools you want to complete the challenges. A New Year means we start afresh so January's challenges will be focused on beginners. We will u...