2021: Week 1 - Solution

Firstly, a big thank you for everyone who has got involved with the challenge this week. The response on social media has been amazing and we have loved seeing all of the different solutions, especially across various tools.

For anyone who is new to Preppin' Data, the solutions come in following parts: 

  1. Downloadable Workflow - these are Tableau Prep packaged workflows that you can download and unpick our solutions.
  2. Written Solution - a step by step blog post on how to solve the challenge. This is linked to the workflow that you can download.
  3. Video Solution - a walkthrough guide on how to solve the challenge.
The written and video solutions may differ slightly as we want to show potential different ways of solving the problem. 

Let's get started with 2021: Week 1's solution!

Solution



The solution is by Tom Prowse and you can download the workflow here.

As a recap this week's challenge has a focus on cleaning data so that it's ready to answer some questions from our stakeholders.

Step 1 - Input Data

The first step this week is to input the data into the workflow. We do this by clicking, 'Connect to Data' then choosing 'Text File'. We can then choose the selected file and this should be automatically placed into our workflow canvas so that we can get started!


Alternatively, you can just drag and drop the input from your file explorer! 

Step 2 - Split Store-Bike Field

Now we have the data in the workflow, we can add a new Clean step which will help us view what data is within the table. 

From the requirements, we need to split the 'Store-Bike' field into two separate columns. There are a couple of ways that we can do this: 

1 - Custom Split 
Using a custom split with a separator on hyphen ('-') then choosing all values will split this into the two separate columns. 

2 - Automatic Split
Tableau Prep will automatically recognise what the separator should be and identify to split this into two columns as required. This works for this example, but for more complex problems this technique may not be possible.

After the split we now have two new columns which contain our Store and Bike information:


To complete this step, we want to rename the fields to 'Store' & 'Bike'. This can be done by double clicking on the field header. Then we can remove the 'Store - Bike' field, by right-clicking and choosing remove. 

Our table should now look like this:


Step 3 - Clean Bike Field

The next requirement is to clean the 'Bike' field so that we only have three unique values (Road, Mountain, Gravel). 

You'll notice some spelling mistakes on a few of the bikes, therefore we can use the grouping functionality within Tableau Prep to group these together. 

By pressing the three dots on the 'Bike' field, we can then choose 'Group Values' then 'Pronunciation'.

This will automatically group different words that are pronounced in a similar way, therefore we can save time instead of manually grouping or writing out a calculation! As a result we now have our 'Bike' field cleaned with the correct three values: 

Overall our workflow now looks like this: 

Note: This can be achieved within a single step, however I have split them out so that the process is easier to follow. 

Our data should now look like this: 


Step 4 - Quarter and Day of Month

For this step we need to format our dates in a slightly different way and we can use Tableau Prep's native features to help us out with this. 

First, we need to duplicate the date field and then we can use the 'Convert Dates' option (under the three dots) and choose 'Day of Month'. This transforms our date and just returns the day of the month. We can repeat this again with the duplicated date, but this time for Quarter.


To understand what's going on 'under the hood', we can take a look at the changes pane on the left-hand side to see the calculations that Tableau Prep has created for us. By double clicking on one of the convert dates options, this will open the calculation window with the Datepart function that has been created. You can change this if you want to modify the process slightly. 


After converting our dates and renaming the fields, our data table should now look like this: 


Step 5 - Remove Test Orders

The final step this week is to remove the first 10 orders from the list as these were test orders. 

There are a few different ways that this can be achieved with a filter: 

1. Selected Values
You can manually deselect the first 10 Order IDs and exclude them. This is quite manual and could take a very long time if we were working with a larger data set. 

2. Calculation
We can use a calculation which only returns the Order IDs that are greater than 10. This would look like this: 


3. Range of Values
You can also use Tableau Prep's visual interface to select a range of values. From here you can use just a min/max, therefore we are going to use a Minimum of 11 and this will give us the required filter.


As a result of the filter, our data is now ready for the final output which looks like this: 


The full output can be viewed here

If you are wanting to now take this into Tableau Desktop to create the visualisation, then you will need to add an Output step, then output the data as a Tableau Hyper file or CSV. 

That's it - 2021 Week 1 complete!!

Thank you for getting involved this week. Now you have finished the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1


You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text