2021: Week 14 - Solution
Solution by Tom Prowse and you can download the full workflow here.
Prep Air was back this week and we were taking a look at the in-flight purchases within a few of the selected flights. This included combining multiple data sets , parsing strings, and identifying what seat purchases the most on our flights.
Hopefully this challenge allowed you to combine various core data prep techniques and put this in a real world scenario to help make everything easier to put together. Let's have a look at how we solved it!
Step 1 - Where is each seat located?
The first step this week was to identify where each seat was located on the plane (Window, Middle, or Aisle). Using the Seat List table we want to first pivot each seat letter so that we have all of the Seat Letters within a single column with the corresponding Passenger Number and Row:
We can now add a label for each of the seats positions:
Seat Position
CASE [Row Letter]
WHEN 'A' THEN 'Window'
WHEN 'F' THEN 'Window'
WHEN 'B' THEN 'Middle'
WHEN 'E' THEN 'Middle'
WHEN 'C' THEN 'Aisle'
WHEN 'D' THEN 'Aisle'
END
Our table now looks like this:
Step 2 - Combine Seat List and Passenger List
Next, we want to combine the Seat Positions and the Passenger List tables using an inner join on Passenger No.
Step 3 - Parse Flight Details
Now we can turn our focus onto the Flight Details table. This is structured in a single string that can be broken apart by a '|'. First we need to split each part out to a separate field using the custom split field:
After splitting each part of the string into a separate field we need to rename each of the field headers so that they are easily recognisable. We also need to clean each field with the following:
Split 1
- Rename to Flight ID
- Remove punctuation
- Change Data Type to Whole Number
Split 2
- Rename to Dep Airport
Split 3
- Rename to Arr Airport
Split 4
- Rename to Dep Date
- Change to Date field
Split 5
- Rename to Dep Time
- Change to Date Time
The final step is to combine the Dep Date and Time with the following calculation:
Dep Date Time
MAKEDATETIME([Dep Date],[Dep Time])
And finally we can create a label for what time of day each flight had departed with the following:
Depart Time of Day
IF
DATEPART('hour',[Dep Date Time])<12
THEN 'Morning'
ELSEIF
DATEPART('hour',[Dep Date Time])>=12 AND DATEPART('hour',[Dep Date Time])<18
THEN 'Afternoon'
ELSEIF
DATEPART('hour',[Dep Date Time])>=18
THEN 'Evening'
END
After parsing the strings our table now looks like this:
Step 4 - Combine all tables
The final table that we need to clean before combine them all together is the Plane Details table. This provides information about what seats are Business or Economy Class.
First, we need to split the range of seats so that we have a min and max for each range. We can do this by splitting the Business Class field using a custom split:
We can then remove any extra fields, rename to Max and Min, and then create a label of 'Business Class' so our table now looks like this:
We're now ready to combine our tables. First we want to combine the Plane Details and the Seat List & Passenger List part of the workflow by using the following join conditions:
We need this to be a left (or right depending which table you have joined first) join where we include all of the fields from the Seat & Passenger lists. This is because we want to keep all of the records and not just the business class seats. After the join we can then rename the Nulls to Economy Class so our table now looks like this:
The last join that we need to do is to join the Flight Details with the rest of our workflow. This is an inner join on Flight No. = Flight ID:
As a result of this final join our table now looks like this:
Step 5 - Answer Questions
After combining all of the tables we can now start to answer the questions in the challenge.
Question 1 - Time of Purchases
First we need to filter to remove the Business Class passengers, because we don't charge these purchases.
We can then use an aggregate tool to group by Depart Time of Day then a distinct count of Flight ID and Sum Purchase Amount.
Avg per Flight
ROUND([Purchase Amount]/[Flight ID],2)
The final output looks like this:
Then finally rank the flights:
The output for question 1 is:
Question 2 - Highest Amount per Seat Position
From the same step where we excluded the Business Class passengers, we want to create another aggregation tool with the following setup:
Again we want to rank the seat positions after rounding the values to 2 decimal places:
The output for question 2 is:
Question 3 - Business & Economy Sales
The final question is to see how much we are making from both business and economy sales. Therefore we need to go back a step (to before we excluded business sales) and then create another aggregation tool with the following setup:
Then we want to round to 2 decimal places and then rank:
Bonus - Export to Excel
In version 2020.1.2 Tableau Prep released the ability to export your workflow to Excel so I thought this would be a good bonus challenge this week.
Exporting to Excel is a little bit different to exporting to a hyper or a csv and has a few different options.
As normal we have the option to choose the file name and type but with an Excel output we can also choose which worksheet we want to export the data to. This can be an existing sheet or we can create a new one, and we can also choose how we want to refresh the table. More details about the different options can be found here.
For our outputs, we want three different output tools with each one having a different sheet name. When we press 'run' on these workflows then this will create three different sheets within the same Excel file so this is a nice simple way to update those spreadsheets!
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!