2020: Week 47 - Solution
Solution by Tom Prowse and you can download our complete workflow here.
This week we continued expanding our Prep Air use cases by looking into flight delays from some key airports, and also combined this with an interesting structure proposed by Michael.
Step 1 - Flag Airports
The first step this week is to identify which of our rows is an airport. Due to the table structure, we can see that all of the rows with a Null in the Airport field are either a Type or Delay. Therefore, we can use the following calculation to flag the airports:
Row Number
IF ISNULL([Airport]) THEN 0
ELSE 1
END
Step 2 - Fill Down Row Category
Now we have flagged which rows are Airports, we want to use this and fill-down the below rows to identify which category each row is in.
The first step is to create a self-join on the Record ID where the RecordID >= RecordID. So we need to create a new Step, then join the new step to our Flag Airports step.
The join setup looks like this:
Our data table now looks like this (notice how the Record ID is now grouped):
Step 3 - One Row per Flight Delay
Now we have categorised our Row IDs, we now want to aggregate the 3 rows together so that we only have 1 row per flight delay.
To achieve this we are going to use two aggregation steps:
1, Group by RecordID, Sum Row Number, Max Airport, Type & Delay.
2, Group by Row Number, Max Airport, Type & Delay
As a result of the two aggregations, our data now looks like this:
Notice how we only have a single row for each flight delay, instead it being split out onto three different rows.
Step 4 - Group Airports
The next part of the challenge is to make sure all of the airport codes are valid. To do this we can use Tableau Prep's in-built data roles to identify any of the invalid Airport codes.
By clicking on the lightbulb icon in the Airport field, Tableau Prep provides us with a recommendation about changing the Data Role, and then grouping the JKF airport code.
Step 5 - Total Delay
Next we want to calculate the total delay and number of delayed flights for each Airport, for each journey type. We again want to use an aggregation tool to do this with the following setup:
After renaming the Number of Rows field, we should now have a table that looks like this:
Step 6 - Combine On-Time Flights
We are now ready to join the On-Time table to our workflow using the join step. Within the join setup we want to join on Airport and Type:
Our table looks like this as we have added the total number of flights:
Step 7 - Calculations
Within this final step we are going to calculate the final metrics relating to average delay and % of delayed flights. We use the following calculations:
Number of Flights
[Number of flights]+[Number of delayed flights]
This gives us the total number of flights (on-time and delayed)
Avg Delay (mins)
ROUND([Delay]/[Number of flights],2)
Calculates the average delay and rounds to 2 decimal places.
% Flights Delayed
ROUND(100*[Number of delayed flights]/[Number of flights],2)
This calculates the % of total flights that were delayed to 2 decimal places.
After calculating these metrics and removing any additional fields, we are now ready to output our table:
You can download the output here.
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14, @JonathanAllenby & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!