2021: Week 1 - Solution
Firstly, a big thank you for everyone who has got involved with the challenge this week. The response on social media has been amazing and we have loved seeing all of the different solutions, especially across various tools.
For anyone who is new to Preppin' Data, the solutions come in following parts:
- Downloadable Workflow - these are Tableau Prep packaged workflows that you can download and unpick our solutions.
- Written Solution - a step by step blog post on how to solve the challenge. This is linked to the workflow that you can download.
- Video Solution - a walkthrough guide on how to solve the challenge.
Solution
The solution is by Tom Prowse and you can download the workflow here.
As a recap this week's challenge has a focus on cleaning data so that it's ready to answer some questions from our stakeholders.
Step 1 - Input Data
Step 2 - Split Store-Bike Field
Now we have the data in the workflow, we can add a new Clean step which will help us view what data is within the table.
From the requirements, we need to split the 'Store-Bike' field into two separate columns. There are a couple of ways that we can do this:
1 - Custom Split
Using a custom split with a separator on hyphen ('-') then choosing all values will split this into the two separate columns.
Step 3 - Clean Bike Field
The next requirement is to clean the 'Bike' field so that we only have three unique values (Road, Mountain, Gravel).
You'll notice some spelling mistakes on a few of the bikes, therefore we can use the grouping functionality within Tableau Prep to group these together.
By pressing the three dots on the 'Bike' field, we can then choose 'Group Values' then 'Pronunciation'.
This will automatically group different words that are pronounced in a similar way, therefore we can save time instead of manually grouping or writing out a calculation! As a result we now have our 'Bike' field cleaned with the correct three values:
Overall our workflow now looks like this:Note: This can be achieved within a single step, however I have split them out so that the process is easier to follow.
Step 4 - Quarter and Day of Month
For this step we need to format our dates in a slightly different way and we can use Tableau Prep's native features to help us out with this.
First, we need to duplicate the date field and then we can use the 'Convert Dates' option (under the three dots) and choose 'Day of Month'. This transforms our date and just returns the day of the month. We can repeat this again with the duplicated date, but this time for Quarter.
To understand what's going on 'under the hood', we can take a look at the changes pane on the left-hand side to see the calculations that Tableau Prep has created for us. By double clicking on one of the convert dates options, this will open the calculation window with the Datepart function that has been created. You can change this if you want to modify the process slightly.
Step 5 - Remove Test Orders
You can manually deselect the first 10 Order IDs and exclude them. This is quite manual and could take a very long time if we were working with a larger data set.
You can also use Tableau Prep's visual interface to select a range of values. From here you can use just a min/max, therefore we are going to use a Minimum of 11 and this will give us the required filter.
As a result of the filter, our data is now ready for the final output which looks like this:
The full output can be viewed here.
If you are wanting to now take this into Tableau Desktop to create the visualisation, then you will need to add an Output step, then output the data as a Tableau Hyper file or CSV.
That's it - 2021 Week 1 complete!!
Thank you for getting involved this week. Now you have finished the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1