2020: Week 43 - Solution
Solution by Tom Prowse and our full solution can be downloaded here.
Step 1 - Input Files
The first task is to input all of the tables for each gender. At first we are only given the option to use a single table from the input, and when bringing this into our workflow, a lot of the months are combined within the same fields. This isn't ideal for Prep!
Therefore, the first step is to use the Data Interpreter. This identifies the different tables within the spreadsheet and breaks these down so that each can be a separate input.
Now we have separate inputs for each section we can identify which ones have the same structure, these include:
Boys
- Table 5 A6:O18 = Jan - Apr
- Table 5 A20:O32 = May-Aug
- Table 5 A34:O46 = Sept-Dec
Step 2 - Clean Additional Rows
Within the Girls input there are a few additional rows which don't quite match the same format as our initial input. Therefore we will need to clean these separately and ensure that they are in the same format so we can combine them later.
First, we want to focus on Table 5 A32:C33 & Table 5 M32:O33 which have extra date for months May & Aug. Both of these tables are in a similar structure and require the same set of steps to transform the data.
Starting with the May input we need to follow these steps:
- Add Month field - Use a calculated field to add a string called 'May'
- Pivot - Use a columns to rows pivot on all of the fields apart from Month
- Identify Values Names - Using the following calculation we can identify whether the row is a Name, Rank or Count:
This calculation can be broken down in the following way:
[Month]+' '+ - this adds the month and a space to the start of the string
IF REGEXP_MATCH([Pivot1 Names],'[[:alpha:]]') THEN "Name" - identifies if there are letters and returns 'Name'
ELSEIF int([Pivot1 Names])<=10 - If the number is below 10 then return Rank
Then all other rows will be returned as 'Count'.
We need to repeat these steps for the Aug table as well. Therefore, instead of repeating the process manually, we can use Tableau Prep's 'Save Steps as Flow' feature. To use this, we need to highlight the steps that we want to use again, then Right-Click and choose 'Save Steps as Flow':
You have the option to save these to a file and also to your server. When saving to the Tableau Server, other people within your organisation can also access these steps, so this can be a really useful feature if you have a lot of repeatable processes.
When adding the saved steps onto our Aug input, we do this by pressing the '+' then choosing 'Insert Flow'.
You can then select the steps that we have just created, and this will be input into the workflow. All we need to do is update the Month field in the first step from May to August.
As these are similar steps, there is also the option to Group these together so that our workflow looks a bit cleaner. This was a feature released in 2020.3 and we can highlight multiple steps, and then right-click and Group them together.
Now we have prepared and grouped our steps, we can combine these together using a union tool. This creates two columns (Metric & Fields) but we want the metrics to be in separate columns therefore we now need to use a rows to columns pivot:
Our data now looks like this:
The last input table that we need to clean is the additional rows for December. This is in a slightly different structure to the other additional rows, so we can't use the Saved Steps. Instead we are going to remove any null values, and then rename the fields to the following:
- F1 - December Rank
- F2 - December Name
- F3 - December Count
Step 3 - Pivot & Clean
Now we have everything in a single table, it's time to make some transformations so that we can start building towards the desired outputs.
First, we want to pivot the data so that the monthly columns are in rows instead. However, we want to maintain columns for our Rank, Name, & Count so we can use a multiple pivot for this. Using the wildcard pivot entry, we want to add three columns (these can be added using the '+' in the top right corner of the Pivoted Fields section) with the following setup:
Now we have the data pivoted into the correct structure, we need to complete the following steps to clean the fields up a bit:
- Filter Nulls - Remove any null values from the Name field
- Split Month - Using an automatic split on the Pivot1 Names field will split out the Month. We can also rename this field Month and remove the Pivot1 Names field.
- Split Gender - Using the Tables Names field, create a custom split to split off the first field before the full stop.
- In the newly split field, we want to remove any numbers (the 2019), and then use a Replace calculation to remove the 'names' part of the string: Gender= REPLACE([Gender],'names','')
- Capitalise the first letter. We can use the following calculation to make the first letter a capital:
Step 4 - Prepare Outputs
To create our second input we need to use an aggregation to calculate the totals for the year. Using the aggregation tool we want to group by Gender & Name, then Sum the Count fields: