2022: Week 45 - Solution


Solution by Tom Prowse and you can download the workflow here


The challenge this week is to turn data set that is easy for humans to read into a data set that is easy for Tableau to read. Changing the structure is vital to being able to interrogate the data more effectively and is an important step when learning what the most optimal shape of your data should be.

Step 1 - Split Year

When we first look at this data set, my initial thought was to use the Data Interpreter however this did quite work as expected so we are going to have to go down a different route.

First we want to clean split off the year from the first row. To do this we can filter the F1 field so that we keep only 'Table 1: Store Sales 2022'.

From here we only need to keep the F1 field, and then use the following calculation to keep the year:

Year 
RIGHT([F4],4)

Then we can remove F4 and change our year to a whole number. We should now just have a single field with the year 2022.


Step 2 - Pivot Rows

Now we have extracted the year, we want to turn our attention to the other rows. Therefore, we can create a new branch and then exclude the first row (where F2 is null). This allows us to have all of our other rows remaining and we can pivot these using a columns to rows pivot. 

We want to pivot all of our rows apart from the F1 which contains the information about the stores. Therefore, we drag in all of the other fields apart from F1: 


After the pivot we should have 3 fields 


Step 3 - Row Numbers

Next we want to remove the F's from the Pivot Names field so that we have a just a row number for each. 

We can use the Remove Letters functionality to remove the F from each row. Then change to a whole number. 

We want to use these row numbers to group together each of the different months. If you look at the data table you can see that each row is split into either Sales or Profit for a given month. Therefore, we want to 'group' these together by changing the odd numbers to the previous even number. Eg, 3 becomes 2, 5 becomes 4 etc.

We can do this by using the Modulo (%) function that helps to identify any remainders.

Grouping 
IF [Pivot1 Names]%2=1
THEN [Pivot1 Names]-1
ELSE [Pivot1 Names]
END

As a result we now have all even numbers and have grouped together each month.



Step 4 - Reshape Months

Now we have the months associated with the different row number groupings, we can start to reshape the data so it becomes more Tableau friendly. 

First we just want to look at each of the months so we filter F1 to keep only the null values then we can remove this field. 

Next we want to identify the max from the pivot values field for each of the groupings. This will help to replace the null values with the given month. We can do this by using a Fixed LOD: 

Month 


This has now provided us with a list of different months and their associated row numbers.

Next we need to join the rest of the data back to the corresponding month. Therefore, we can create a new branch and then exclude the null from F1. This returns all of the rows apart from the month.

We can then join both tables together using an inner join where Grouping = Grouping and Pivot1 Names = Pivot1 Names. 


After this we can remove the duplicated Pivot1 Names-1 and Grouping-1 fields so our table looks like this: 



Step 5- Reshape Headers

Now we have reshaped the months, we can now focus on the headers (Sales & Profit).

We want to do a similar technique here as we want to isolate the headers in a new clean step by keeping only the 'Store' from the F1 field. 

We can then remove the Month and F1 fields and rename the Pivot1Values to Headers. Luckily we don't have any null values here so we don't need to use the LOD calculation.

Our table should look like this:


Now on a separate branch, from the previous step, we want to bring through all of the other rows. Therefore, we can exclude 'Store' from the F1 field, rename F1 to Store and Pivot1 Values to Values, then make sure the Values is a whole number. 

This gives us the values table that we can use to join back to our headers table that we have just created. We can use an inner join on Grouping and Pivot1 Names again: 



Then we can remove the duplicated fields as a result of the join and our table should look like this: 



Now we have both the months and the headers in a better shape, we can use the pivot tool to make the final shape adjustment that we require. This time we want to use a Rows to Columns pivot where we pivot the Headers and then aggregate the Sum of Values. 


This brings our Profit and Sales into separate columns and provides a better structure for use within Tableau.

Step 6 - Create Date Field

The last task for this week is to create a Date field using the month and year from the start of our flow. 

From our pivot field we can join the year (2022) to every row within our data set. We can use the join condition Year != Grouping but if there were more rows of data (more than 2022 groupings) then we might want to think of using a different join condition. But for now this works just fine.



Now we can remove the Grouping field, and then use the following calculation to create the Date: 

Date 
(IF [Month] = 'Sep' THEN 'Sept'
ELSE [Month] END)
+ ' ' 
+ STR([Year])

Then we can make this a date field and remove the Month and Year so our table should now look like this and ready to output: 



You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text