2024: Week 5 - Getting the right data
Challenge by: Jenny Martin
It's the final week of beginner month and we're going to spend a little more time diving deeper into joins, calculations and outputs.
Prep Air are interested in creating a workflow that has multiple outputs depending on user requirements. They want users to be able to answer the following questions:
- What are the details of the customers who have booked flights and which routes are they travelling on?
- Which customers are yet to book a flight in 2024?
- Which flights are yet to be booked by customers in 2024?
The datasets you'll be working with are fairly large so you'll need to decide which tables to join (and when) to be as efficient as possible. You may wish to use this as an opportunity to explore the sampling options in Tableau Prep too!
Inputs
There are 3 tables to connect to for this challenge:
Prep Air Ticket Sales
Prep Air 2024 Flights
Prep Air Customers
Requirements
- Input the data
- For the first output:
- Create a dataset that gives all the customer details for booked flights in 2024. Make sure the output also includes details on the flights origin and destination
- When outputting the data, create an excel file with a new sheet for each output (so 1 file for all outputs this week!)
- For the second output:
- Create a dataset that allows Prep Air to identify which flights have not yet been booked in 2024
- Add a datestamp field to this dataset for today's date (31/01/2024) so that Prep Air know the unbooked flights as of the day the workflow is run
- When outputting the table to a new sheet in the Excel Workbook, choose the option "Append to Table" under Write Options. This means that if the workflow is run on a different day, the results will add additional rows to the dataset, rather than overwriting the previous run's data
- For the third output:
- Create a dataset that shows which customers have yet to book a flight with Prep Air in 2024
- Create a field which will allow Prep Air to see how many days it has been since the customer last flew (compared to 31/01/2024)
- Categorise customers into the following groups:
- Recent fliers - flown within the last 3 months
- Taking a break - 3-6 months since last flight
- Been away a while - 6-9 months since last flight
- Lapsed Customers - over 9 months since last flight
- Output the data to a new sheet in the Excel Workbook
Outputs
2024 Booked Flights Output:
- 11 fields
- Date
- From
- To
- Flight Number
- Customer ID
- Last Date Flown
- first_name
- last_name
- gender
- Ticket Price
- 44,768 rows (44,769 including headers)
Unbooked Flights Output:
- 5 fields
- Flight unbooked as of
- Date
- Flight Number
- From
- To
- 296 rows (297 including headers)
Customers Yet to Book in 2024 Output:
- 8 fields
- Customer ID
- Customer Category
- Days Since Last Flown
- Last Date Flown
- first_name
- last_name
- gender
- 1,260 rows (1,261 including headers)
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!