2023: Week 51 - Solution


Solution by Tom Prowse and you can download the workflow here.



Step 1 - Toy Fill Down

After inputting the data into our workflow, we want to utilise the 'Data Interpreter' to help us clean the initial input from the Excel sheet. This will help to remove any headers and make the table nicer for us to clean.

Next, we want to use the Toy Building Tracker sheet as our first input. This is the sheet that is formatted 'nicely' in Excel, but doesn't transfer across well into a data prep tool. First we want to identify which list each toy belongs to by identifying whether the headers within the 'Toy' field. 

Which Nice List?
IF STARTSWITH([Toy],'Number')
THEN [Toy]
END

We can then use this field along with a Fill Down calculation to fill in the Null values. 

Which Nice List?


After this calculation our table should now look like this with a new field that contains the headers



Then to finish this off we can remove any rows where the Which Nice List? value is equal to the Toy value, which will remove the headers from the table. This means that we lose the Number of Children value from the headers, therefore we need to add these back in as a new field by creating a new branch from the initial output. 

In the new branch we filter to contain only the headers (values starting with 'Number' in the Toy field) then keep only the Toy and Quota fields (we can rename quota to 'Number of Children'). Now we are ready to join this back to our workflow using an inner join where Toy = Which Nice List?



Then finally we want to clean the 'Which Nice List?' field by extracting the type of list (eg, Nice, Nicest etc). 

To do this we can use an automatic spilt which will remove all other words and leave us with the type of list that we require.

Our table should now look like this



Step 2 - Quotas & Staff

Now we can update the Quota field so that it represents how many toys are needed. For this we can multiply the number of children by the quota  

Quota
[Quota]*[Number of Children]


Then we can focus on the workers (elves). First we want to ensure the different types of jobs are in a single column, therefore we need to pivot the data using a columns to rows pivot with each of the Job Names in the pivot fields:



Then we can extract the initials for each by using the LEFT & RIGHT functions.

1st Initial
LEFT([Elf Name],1)

2nd Initial 
RIGHT([Elf Name],1)

Now we have the initials we can match up the Elf Names from the Lookup table. We do this by using a join on each of the different initials. 

First we need to clean the Elf Name Lookup table by using a split field on Name to extract the first initial and the name of the elf. For this we can use a custom split with a '-' as a separator.


We can then join these as separate joins to each of the initials from the other workflow.

1st Initial
Inner join where 1st Initial = Initial



2nd Initial 



From here we can now create the Elf Name by combining the names from the initials. 

Elf Name
[Name]+" "+[Name-1]

Then finally we want to identify the Production Managers so we can filter the Elf Job field to keep only the 'Production Manager' role.

At this stage our table should look like this: 



Step 3 - Dates

We are now ready to reshape the data so that we can create a data field. Currently, the dates are across the table as separate fields, this is useful within Excel but not too useful when we want to visualise our data. 

To change the shape of the table we first want to pivot the dates using a columns to rows pivot with each of the dates. We can utilise the 'Wildcard' pivot here by using the '-' as the identifier.


We can then use the Week field to create a date by adding the Year 2023 onto the end then converting to a date field. 

Week
DATE([Week]+"-2023")


Step 4 - Output 1

For the first output we want to create a running sum of the toys that have been produced: 

Running Sum



Then the can use this field to determine if it is over or under the quota: 

Over or Under Quota 
IF [Running Sum of Toys Produced]>[Quota]
THEN 'Over'
ELSE 'Under'
END

Our first output should look like this:



Step 5 - Output 2

For the 2nd output we want to go back to the pivot step and then create a new branch by creating an aggregate step. 

In this aggregate we want to calculate the total number of toys produced, therefore we group by List, Number of Children, Toy, & Quota then Sum Toys Produced. 



From here we can calculate the top level figures for the year:

Toys Over/Under Quota
[Toys Produced]-[Quota] 


Total by List
Using a Fixed LOD to calculate the total for each List



Overproduced
[Total by List]-[Number of Children]

Toys Ready for Christmas
First we can identify the list that has the most overproduction by using a Fixed LOD to find the max value for each list: 



Then we can use this to identify the toys that are ready to by gifts

Toys Ready to be Gifts 
IF [Toys Over/Under Quota]=[Most overproduction]
THEN [Toys Produced]-[Overproduced]
ELSE [Toys Produced]
END

Then finally calculate the toys that are spare: 

Spare Toys
[Toys Produced]-[Toys Ready to be Gifts]

After removing the fields that are no longer required, our 2nd output should look like this: 



You can download the outputs from here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Then head on over to Workout Wednesday to build a fun dashboard with the data.

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text