2022: Week 42 - Solution

 


Solution by Tom Prowse and you can download the workflow here.


This week we look at sports training data and filling in the missing gaps where the players don't train. This helps to fill in the gaps and allows us to create some more realistic averages when there is missing data.

Step 1 - Date of Next Session

The first task is to calculate when the next training session is for each player - this will help us identify what days are missing a session.

To fill in the missing dates we first need to create an ID field for each Player and Session combination. We can do this using a Rank calculation: 

ID 


This ID allows us to give an identifier to each of the sessions and therefore we can use this to identify the next or previous one by using a self join.

Before the join we need to calculate the next ID by subtracting 1 from the ID:

Next ID 

[ID] - 1 

This will allow us to bring the current and next session onto the same row and therefore use this within row-level calculations.

Along with the ID, we want to repeat this process with the Date as well. We want to find the next date therefore we want to subtract 1 day from the existing date: 

Next Date 

DATE(
DATEADD('day',-1,[Next Date])
)

As a result of these two calculations we should now have the existing and next date and ID on each row: 


From here we only want to focus on the Next elements therefore can remove the Date and ID fields.

We are now ready to join back to our original step where we created the ID field. Here we want to include everything from the ID step so we'll need to use a Left Join where Player = Player, Session = Session, and ID = Next ID.


After the join we can remove Next ID, Player-1, Session-1, and ID so our table now looks like this: 


Step 2 - Most Recent Session

For the most recent session we want to ensure the next session is the latest date within the data set. To do this we can use a Fixed LOD to calculate the latest date: 

Max Date 


From here we can fill in the Nulls (latest session) from the Next Date field with the maximum date: 

Next Date

IF ISNULL([Next Date])
THEN [Max Date]
ELSE [Next Date]
END

Then we can remove the Max Date field. Alternatively, we could make use of the Merge functionality within Tableau Prep here.

Step 3 - Scaffold Rows

We are now ready to scaffold our data so that we have a row for each player, date, and session. For this we can make use of the New Rows tool where we want to create rows between the Date & Next Date with an increment of 1 day. 


As a result we now have our missing dates filled in so our table looks like this:


Step 4 - Final Tidying

Now we have our table scaffolded out nicely, we want to identify the sessions that have been carried over. To do this we want to create a flag for Actual or Carried Over sessions: 

Flag 
IF [Date]=[Session Date]
THEN 'Actual'
ELSE 'Carried over'
END

Then we no longer need the Date or Next Date fields so we can remove these. 

Finally, we want to remove any Weekends from our table. To do this we can duplicate the Session Date field, then convert the duplicated date so that it shows Weekdays. 


Then we can Exclude Saturday and Sunday from our list of weekdays.

We are then ready to output our final table that should look like this:


You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank