2021: Week 41 - Solution

Solution by Tom Prowse and you can download the solution here.


This week we are looking at football data with a focus on the historical results for Southend United. Southend had only won 18 times in their last 108 games, so we look back at the historical results which were a lot more promising compared to recent times. 

Step 1 - Tidy Input

The first task after inputting the data is to clean some of the fields, including renaming the P1 field to Pts and excluding any null rows.

We can rename the P1 field right in the input step by double clicking on P1 and then renaming to Pts



We can use a filter to remove any of the Null values from the Season field, and our table should now look like this: 


Next we can create the special circumstances field using the following calculation: 

Special Circumstances
IF ISNULL([POS])
THEN "Incomplete"
ELSEIF [POS]="ABAND"
THEN "Abandoned due to WW2"
ELSE "N/A"
END

Then the final part of initial cleaning is to make sure that the POS field only has data for full seasons, so we can replace the ABAND values with Null. 


Step 2 - Extract Numeric Values from League

Next we want to extract the numeric values from each of the leagues so that we can compare how good each of the leagues is. Following the requirements, we need to extract the numbers from each of the leagues and then classify the FL-CH as 0 and the NAT-P as 5. We can do this using the following calculation: 

Numeric League 
IF REGEXP_MATCH([LEAGUE],".*\d.*")
THEN int(REGEXP_EXTRACT([LEAGUE],'(\d)'))
ELSEIF [LEAGUE]="FL-CH" THEN 0
ELSE 5
END

This calculation first identifies whether there is a number within the string by using the REGEX_Match function. If this is true, then we use the Regex Extract function to pull the numbers, and if not then we return a 0 or 5 depending on the league name. 

Here's what the table should look like: 


Step 3 - Create Outcome

Next we need to classify whether or not Southend were promoted, relegated, or stayed in the same league in each of the seasons. We can do this by comparing whether the next seasons league was higher or lower than the previous one. 

First we need to classify what year the seasons started in, we can do this by using a custom split to split the season field to return the first 4 digits before the '-': 


From here we can change the season start to a number, then create a rank of the seasons so we have an order that they occurred: 

Full Season Order

Next we want to use this order field to bring the current and next season onto the same row. We can do this by first creating a new step, then using a calculation to subtract 1 from the Season Order. 


From here we can join this back onto the previous step by using the Full Season Order fields. 




Now we have the current and next season on the same row we can now calculate the outcome of that season using the following calculation: 

Outcome 
IF [Special Circumstances]="Incomplete"
THEN NULL
ELSEIF [Numeric League Next Season]=[Numeric League]
THEN "Same League"
ELSEIF [Numeric League Next Season]>[Numeric League]
THEN "Relegated"
ELSE "Promoted"
END

After doing some further tidying the table should now look like this:




Step 4 - Special Circumstances

We can now fill in the missing rows that occurred due to WW1 and WW2 by using the New Rows step. Within this step we add rows from the Season Start field where any new rows should be null: 


Then after completing the missing rows we can update null values in the season, special circumstances and outcome fields with the following calculations: 

Season 
IF ISNULL([SEASON])
THEN str([Season Start])+"-"+str(int(right(str([Season Start]),2))+1)
ELSE [SEASON]
END

Special Circumstances 
IF ISNULL([Special Circumstances])
THEN 
(IF [Season Start]<1919
THEN "WW1"
ELSE "WW2"
END)
ELSE [Special Circumstances]
END

Outcome 
IF ISNULL([Outcome])
THEN "N/A"
ELSE [Outcome]
END

After these calculations we are ready to output our data: 


You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank