2021: Week 30 - Solution

 


Solution by Tom Prowse and you can download the workflow here


Step 1 - Create TripID

The first step is to create a tripID using the time of day and the date (12th July 2021). We can use a MAKEDATETIME function here.

Time
MAKEDATETIME(
MAKEDATE(2021,7,12),
MAKETIME([Hour],[Minute],0)
)

Now we have the time of day that the lift made a journey, we can use a Row Number to get our tripID:

TripID 

{ORDERBY [Time] ASC : ROW_NUMBER()}

As a result we now have our TripID for each journey made: 


Step 2 - Calculate Floors Between Trips

Next we need to calculate how many floors are between each trip. Before we do that, we need to categorise whether the floor is numerical or a letter (B, G), so we use an aggregate tool to return a list of the floors by just grouping by Floor: 


Now we want to distinguish whether a floor is numerical or a letter. We can just do this manually by selecting the B & G, but if we wanted an automate way of categorising these we can use the following calculation: 

Numeric?
REGEXP_MATCH([Floor],'\d+')

This returns a True or False if Floor field has one or more digits (\d+). 

After categorising, we now want to split into two separate branches, one for Numeric and another for B&G. 

For the numeric brand, we want a new clean step, then to keep only True from the Numeric? field, and finally change the Floor field type to a number. 

For the floors with letters, we need to keep only False Numeric? values this time, and then we can rank the floors to give us a row number for each of the letters


The reason for doing this is because we need to know how many 'letter' floors we have so that they can be included in the number of floors. Therefore, to get the total number of floors we can use an aggregate tool which returns the maximum of the Order field: 


After calculating both branches, we need to bring these back together and can do this by using a join on a dummy field (eg, 1). We will need to create the dummy on both branches and then use these within an inner join: 


At this stage our table looks like this, where the number of lettered floors has been appended to our numerical rows:


To adjust the floor numbers to account for the letter values, we need to add these together to get the actual number of floors that the lift has travelled: 

Order
[Order]+[Floor]

We can then combine this back to our lettered floor branch so that we have all of the floors with their correct order. For this we use the union tool: 


Our table now looks like this, where we have realigned the order of the floor numbers: 



Now we have the new floor list, it's time to calculate how far the lift has travelled between stops. To do this we want to join on both the To and From fields on our original data set. 

From Join


Before completing the To join, we want to remove the Floor and From fields, then rename the Order field to From. 

To Join


Then we want to do the similar clean up like after the from join. This time removing Floor & To, and then renaming Order to To. 

We've now got the To and From floors, we need to get the next trip onto the same row so that we can use this within a calculation. To do this, we need to subtract 1 from the TripID, and then using an Left join to join the trip and next trip information together. 


From here we can rename From-1 to Next Trip and then use the following calculation to calculate the Floors Travelled between each trip: 

Travel between trips
ABS([To]-[Next Trip])

Our table should now look like this: 



Step 3 - Most Frequent Starting Floor

Next we want to calculate what the most frequent floor the trip starts on, which we will call the default position. 

To calculate this we want to use an aggregate tool to calculate a distinct count of the From floors. 


From here we can calculate what the most frequent starting floor is by using a Fixed LOD to find the maximum on the Count field: 


Then we can use a filter to only return the Floor with the highest number, where [Most frequent]=[Count]. We can then clean the table by removing the Count & Most Frequent fields, then renaming the From to Default Position. Our default position is Floor G (remember this is number 2 in the floor order). 

Step 4 - Default Starting Position

The next step is to calculate the number of floors the lift would need to travel if there was a default starting place. For this we are going to use the default floor as floor G.

First we need to join the trips onto our default floor workflow by using the dummy join technique from earlier. From here we can then calculate the Travel Between Trips from the default position to the From field: 

Travel Between Trips 
ABS([Default Position]-[From])

After this our table looks like this:


From here we can now calculate the average number of floors that the lift has made from the default floor, using an aggregate tool: 


After rounding the value to 2 decimal places (ROUND([Travel between trips],2)) our table should be a single row with the avg floors travelled and the default position. 


We now want to bring back the Floor (letter G) from the union step earlier in the workflow. To do this we can join on Default Position = Order: 


We only want to keep the Floor and Avg Travel fields, so can remove any others. The table should be a single row: 


Step 5 - Travelled From Floor 

Now we want to compare the difference in floors travelled if the lift starts on the 'To' floor. For this we want a new branch from just before where we started the most frequent aggregation. This time the aggregation will be just calculating the avg floors travelled: 


Next we want to round the avg to 2 decimal places and then join this back to our Default Position branch by using a Dummy join like before. 

Step 6 - Travel Difference

The final step is to calculate the avg difference between each of the starting locations. We can do this by just subtracting one from the other: 

Difference
[Avg travel from default position]-[Avg Travel between trips currently]

That is the last step of this week's challenge so we can now output our data: 



After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text