2022: Week 21 - Solution


Solution by Tom Prowse and you can download the workflow here


The challenge this week focussed on a real world data structure that needed some cleaning in Tableau Prep so that we could easily visualise the data within Tableau Desktop. There are lots of sheets and some key metrics that we want to focus on... let's see how we completed it!

Step 1 - Input Sheets

The first task is to input all of the different sheets which are based on the different stores. After connecting to the Input file, we then need to use the Data Interpreter to clean the sheets so the table is easier for us to work with. The Data Interpreter will split the sheets into two tables, Key Metrics and Additional Metrics - we want to just focus on the Key Metrics.

As we are using the Data Interpreter, we can't use the Wildcard union, therefore we need to input all of the tables (with A4:T15) in the title, until we have all of the stores in a single table.

The workflow will look something like this: 


As a result, we should now have a table that looks like this: 


Now we have all the stores data in a single table, we can extract the Store name from the Table Names field. The Table Names is automatically created from the Union step.

The extract the store name we can use the Regex_Extract function:

Shop 
REGEXP_EXTRACT([Table Names],'.*?/(.*?)\$.*')

As a result we should now have each of the store names extracted into a single field: 


Finally to clean our table and have a nice input, we can remove the fields that contain quarter totals. We can remove the fields named 'FY22 Q1', 'FY22 Q2', and 'FY22 Q3'.

Step 2 - Reshape

Next up we want to reshape the data so that we have all of our date fields in a single column. To do this want to pivot the data using a Columns to Rows pivot that includes all of the date fields.

We can use the Wildcard Pivot at this stage to bring in all fields that contain a '-'. This will also allow us to future-proof out workflow by bringing in new fields that contain a '-' when they are added.


After the pivot we can rename the Pivot Names to Date, so our table now will look like this:


Step 3 - Orders, Returns, or Complaints

We now need to filter our data so that we are focussing on the correct values. Within the Breakdown field we want to filter so that we keep only: 
  • # Received
  • % Processed in 3 Days
  • % Processed in 5 Days
  • % Shipped in 3 Days
  • % Shipped in 5 Days
We then want to extract the '#' or '%' from each, so we can use a custom split on ' ' (space) to do this: 


Then we want to use that field to remove the symbol from the original Breakdown field. To do this we can use a Replace calculation (REPLACE([Breakdown],[Breakdown - Split 1],'')) or alternatively we can use the Remove Punctuation from the Clean options. 

We can now use these fields to recreate a Measure Name field, using this calculation: 

Measure Name
[Breakdown - Split 1]+" "+[Department]+[Breakdown]

After these changes our table should now look like this: 



Step 4 - Actual or Target

At this stage we need to split the workflow into separate branches, an Actual and a Target branch. 

Actual
First we can remove the 'Target' field, and then we want to pivot the data using a Rows to Columns pivot. This time we going to include all of the Measure Name fields and then sum the pivot values in the aggregation field.


As a result we should now have a table that looks like this with our measure in separate fields: 

Target
Next we can create a new branch and focus on the target values. 

First we need to include Target in each of the Measure Names using this calculation: 

Measure Name
'Target - '+[Measure Name]

Then as we are only interested in each of the Target values, we can use an aggregate step group by Measure Name & Target and we don't need an aggregated field here. 


Next, we want to retain just the numbers from the target field. To do this we can use the remove punctuation from the Target field, then use this calculation to remove '>' as well: 

Target 

FLOAT(REPLACE([Target],'>',''))
/
100

At this stage our Target branch table should look like this: 


Step 5 - Combine Branches

The final step this week is to combine both branches so we have a single table with all of our Key Measures included.

First we need to create a dummy field to join our branches on, this can be anything but we have create a calculation on each branch with a 1 in.

Before we join the branches, we need to ensure that the Targets branch is pivoted so each measure is in a separate field. We do this after the Dummy field have been created or else an error will be created due to not having any fields remaining in the field list.

The pivot is a Rows to Columns with the following setup: 


Now that both branches have the same shape, we can join these together using the Dummy field and an inner join. This will allow us to append the row of Targets onto each of the different Actual values so that we can compare them.


After the join we should be ready to output our data: 



You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text