2020: Week 18 - Solution



You can find our full solution workflow below and download it from the Preppin' Data Community page!


For this week's challenge we took a look at the line-ups for Liverpool FC in this season's Premier League and ultimately wanted to answer the following questions:
  • How many times has each player been in the squad? 
  • How many appearances has each player made?
  • What is their minute per games ratio?

Step 1 - Clean Input

Our input this week is a nicely formatted spreadsheet that looks nice if we were viewing it in Excel, however when we input this into Tableau Prep we get a lot of unwanted columns and field headings:


This isn't ideal when working with the data in Prep, therefore our first task is using the Data Interpreter to give us a helping hand! 


Once we have this box ticked Tableau Prep automatically renames our headers and organises our table into something a lot more useful:


Now we have a clean input, we can go ahead and clean some of the main fields within the table. These are split into key areas, including:
  • Match Details
  • Starting XI
  • Substitutions
The data interpreter has left the category in the field header, eg 'Match Details Date' so we want to remove this from each of our headers. One way of doing this is to rename the headers one-by-one, but a more dynamic way is as follows. 

First, we will clean the Match Details headers. 

Pivot - C2R
Use a Columns to Rows Pivot, with the wildcard search on 'Match Details'. 


This is now how our data looks: 

Replace
After the pivot step, we can then use a Replace calculation to remove the Match Details from each row. The calculation is: 

Pivot1 Names
REPLACE([Pivot1 Names],'Match Details','')

Pivot - R2C
It is now time to use a Rows to Columns Pivot to pivot our results back so that the rows in our Pivot Names field become our headers. We do this by Grouping on Pivot1 Names and finding the Min (Max also works) of Pivot1 Values


We now have nice clean headers that have Match Details removed! This is also dynamic so if a new field is added within the Match Details section, then it will automatically be cleaned just like the other columns. 

Our data now looks like this: 


Now our Match Details headers are clean, we can turn our attention to the Substitutes. We are going to use a similar process to clean these; Pivot (C2R), Replace, Pivot(R2C).

Pivot (C2R) - Subs
First we have some columns that include information about when substitutions were made. Therefore, we will use the wildcard pivot on the word 'sub'. Be careful, as this is case sensitive and you don't want to include Substitutes. 

Our column headers are now in the row, like this:


Replace
Now, unlike Match Details these values (sub1, sub1 1 etc) don't really make much sense. Therefore, we are going to update these so that they are easier to understand. The first step is use a Custom Split on the Pivot Names field, so we split all values by ' ' to parse the numbers away from the subX.

After the split we are left with this: 


From here we can follow the renaming convention within the challenge requirements to rename these to something useful. To do this we can use the following calculation: 

Pivot Names 

IF
[Pivot1 Names - Split 2] = ''
THEN [Pivot1 Names - Split 1]+' off'

ELSEIF    
[Pivot1 Names - Split 2] = '1'
THEN [Pivot1 Names - Split 1]+' on'

ELSEIF
[Pivot1 Names - Split 2] = '2'
THEN [Pivot1 Names - Split 1]+' time'

END


This essentially is replacing the number after the space with the correct off/on/time so that we have a better understanding of what the metric means. The last step before the pivot is to remove the fields that were created after the split, and our data now looks like this: 


Pivot - R2C
Again, we are going to use another pivot to get our sub rows to be headers. This time we will Group by Pivot1 Names and then Sum Pivot1 Values: 


Now we have our nicely renamed field headers, and we can remove any Nulls from the Date field so we are left with the following:


The final headers that we need to clean is the Starting XI and Substitute player names. Again we are going to use a columns to rows pivot for this, with the following setup: 


Notice how we have used a wildcard input for the 'Starting XI' fields, and then we have had to manually drag the 'Substitute' fields in afterwards. As a result our data looks like this: 


This is how we want our final table to be structured so we're all done with pivots! All we have to do is tidy up the following fields: 
  • Rename Pivot Names to Player Number
  • Rename Starting XI to Player Name
  • Remove Letters for Player Number - This removes the 'Starting XI/Substitute text)
  • Change Player Number to data type Number (Whole) - This removes any additional spaces
We now have our table formatted and cleaned in a nice way, now it is time to turn our attention to answering the questions. Our first task is to identify whether or not a player was subbed on or off, we can calculate this with the following calculations: 

Subbed Off?
IF [Player Number] = [sub1 off] 
OR [Player Number] = [sub2 off]
OR [Player Number] = [sub3 off] 
THEN TRUE
ELSE FALSE 
END

Subbed On?
IF [Player Number] = [sub1 on]
OR [Player Number] = [sub2 on]
OR [Player Number] = [sub3 on]
THEN TRUE 
ELSE FALSE 
END


We now have a boolean field for whether the player was subbed on or off; looking like this: 


Now we want to identify the timing of each substitution so we use the following calculation: 

Sub On/Off Time

IF 
([Subbed Off?] = TRUE 
OR [Subbed On?] = TRUE )
AND ([Player Number]=[sub1 off] 
    OR [Player Number] = [sub1 on])
THEN [sub1 time]

ELSEIF 
([Subbed Off?] = TRUE 
OR [Subbed On?] = TRUE )
AND ([Player Number]=[sub2 off] 
    OR [Player Number] = [sub2 on])
THEN [sub2 time]

ELSEIF 
([Subbed Off?] = TRUE 
OR [Subbed On?] = TRUE )
AND ([Player Number]=[sub3 off] 
    OR [Player Number] = [sub3 on])
THEN [sub3 time]

ELSE NULL
END

This allows us to identify what timing each sub number was made. It returns a Null if a sub wasn't made, and also if the sub numbers are not equal. We are left with data that looks like this:


Finally, we need to calculate what order each substitute happened in. There can be a maximum of 3 subs in a game so we use the following formula: 

Sub Number

IF ([Subbed Off?] = TRUE 
OR [Subbed On?] = TRUE )
AND ([Player Number]=[sub1 off] 
    OR [Player Number] = [sub1 on])
THEN 1

ELSEIF ([Subbed Off?] = TRUE 
OR [Subbed On?] = TRUE )
AND ([Player Number]=[sub2 off] 
    OR [Player Number] = [sub2 on]) 
THEN 2

ELSEIF ([Subbed Off?] = TRUE 
OR [Subbed On?] = TRUE )
AND ([Player Number]=[sub3 off] 
    OR [Player Number] = [sub3 on]) 
THEN 3

ELSE NULL
END


We can now remove the fields relating to Sub1, Sub1 1 etc and we are left with the following: 


Next we need to calculate how many minutes each player played and also how many appearances they made. We can use the following calculations:

Mins Played

IF [Subbed Off?] = TRUE 
THEN [Subbed On/Off Time]

ELSEIF [Subbed On?] = TRUE 
THEN 90-[Subbed On/Off Time]

ELSEIF [Player Number]>=12
AND 
    ISNULL([Sub Number]) 
    THEN 0

ELSE 90
END

Because there is 90 mins in each match and 11 players starting, we can say that if the player number is greater than or equal to 12 and the Sub Number is null then they didn't play any minutes in the match. 

Appearances 
IF [Player Number] <= 11
OR 
NOT ISNULL([Sub Number])
THEN 1
ELSE 0
END


For appearances we use a similar logic; if the player number is less than or equal to 11 then they started or if the Sub Number is not Null then they also played as a sub. We want to give both scenarios a 1 so we can then total these up in the next step. 

To calculate the total minutes played and appearances made, we need to use an aggregate tool and group by player name, then sum Number of Rows, Mins Played, Appearances. 


The final steps are to calculate the Mins per Game for each player. First, we can rename the Number of Rows field to In Squad as this is the total times that each player was included in the squad. Next, we can use the following calculation to calculate the Mins per Game:

Mins per Game
zn([Mins Played]/[Appearances])

Make sure you wrap the calculation in a ZN as this will any Nulls into 0.

That's week 18 complete! Your output should look like this:


Make sure to fill in the participation tracker, share using #PreppinData on Twitter and post you solutions onto our Tableau Forums community page so that we can compare our workflows! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text