2022: Week 24 - Solution


Solution by Tom Prowse and you can download the workflow here. 


The challenge this week came from Jorge Supelano who wanted to know about the longest flights ever taken. Let's see how we can prepare the data, ready to build his viz!

Step 1 - Remove Airport Names

First we want to remove all of the airport names from the To and From fields so we are left with just the city names. 

All of the additional airport names come at the end of the string, therefore we want to create a common separator that we can use to split these values off. Currently, there is a mix of '-', '–' and '/', therefore we can use the Replace function to make these the same.

From
REPLACE(REPLACE([From],'-','/'),'–','/')

This replaces both of the '-' and '–' with a '/', that we can then use to split off the first part of the string:


Then tidy the From field by removing the original, then renaming the newly split field to From. 

We need to do the same process but for the To field and we should have a table that looks like this: 


Step 2 - Route & Distance

Now we have the To and From locations, we can create the route for each of the flights:

Route
[From]+" - "+[To]

Then we can split the Distance fields so we have a field in km and miles. To do this we need to use a couple of different splits. First we can split out the km distance by using a custom split on '(':


Then on the 2nd split field we can split off the first value before the ';':


We can now remove the original Distance field and 'Distance - Split 2'. Then rename the 'Distance - Split 1' to 'Distance - km', and 'Distance - Split 2 - Split 1' to 'Distance - mi'. 

After renaming the fields we want to remove the km and mi from each of the rows so we can use the clean functionality to Remove Letters from each and change the data type to a Whole Number.

At this stage our table should look like this: 


Step 3 - Rank & Schedule

Next we need to rank the flights based on the distance travelled. To do this we want to use Tableau Prep's inbuilt rank functionality to create a dense rank: 


Now we want to extract the time from the Schedule Duration field. First, we need to update the data type to a string, and then split on the ' ' to return the last value after the space. 


Then we just need to remove the Schedule Duration and rename the split field to Schedule Duration. We also need to update the First Flight field so that it is a Date.

Our latest table should look like this:


Step 4 - Join Lat & Long

The final step is to join the Lat & Long data from the World Cities data source. We need to do this in a couple of joins, starting with From: 


Make sure you rename Lat to From Lat and Lng to From Lng. We can also remove the City field.

The we need to do the same for To: 


Then finally renaming the Lat to To Lat, Lng to To Lng, and removing City.

After these changes we are ready to output our data:


You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text