2021: Week 41 - Solution
Solution by Tom Prowse and you can download the solution here.
This week we are looking at football data with a focus on the historical results for Southend United. Southend had only won 18 times in their last 108 games, so we look back at the historical results which were a lot more promising compared to recent times.
Step 1 - Tidy Input
The first task after inputting the data is to clean some of the fields, including renaming the P1 field to Pts and excluding any null rows.
We can rename the P1 field right in the input step by double clicking on P1 and then renaming to Pts
We can use a filter to remove any of the Null values from the Season field, and our table should now look like this:
Next we can create the special circumstances field using the following calculation:
Special Circumstances
IF ISNULL([POS])
THEN "Incomplete"
ELSEIF [POS]="ABAND"
THEN "Abandoned due to WW2"
ELSE "N/A"
END
Then the final part of initial cleaning is to make sure that the POS field only has data for full seasons, so we can replace the ABAND values with Null.
Step 2 - Extract Numeric Values from League
Next we want to extract the numeric values from each of the leagues so that we can compare how good each of the leagues is. Following the requirements, we need to extract the numbers from each of the leagues and then classify the FL-CH as 0 and the NAT-P as 5. We can do this using the following calculation:
Numeric League
IF REGEXP_MATCH([LEAGUE],".*\d.*")
THEN int(REGEXP_EXTRACT([LEAGUE],'(\d)'))
ELSEIF [LEAGUE]="FL-CH" THEN 0
ELSE 5
END
This calculation first identifies whether there is a number within the string by using the REGEX_Match function. If this is true, then we use the Regex Extract function to pull the numbers, and if not then we return a 0 or 5 depending on the league name.
Here's what the table should look like:
Step 3 - Create Outcome
Next we need to classify whether or not Southend were promoted, relegated, or stayed in the same league in each of the seasons. We can do this by comparing whether the next seasons league was higher or lower than the previous one.
First we need to classify what year the seasons started in, we can do this by using a custom split to split the season field to return the first 4 digits before the '-':
From here we can change the season start to a number, then create a rank of the seasons so we have an order that they occurred:
Full Season Order
Next we want to use this order field to bring the current and next season onto the same row. We can do this by first creating a new step, then using a calculation to subtract 1 from the Season Order.
From here we can join this back onto the previous step by using the Full Season Order fields.
Now we have the current and next season on the same row we can now calculate the outcome of that season using the following calculation:
Outcome
IF [Special Circumstances]="Incomplete"
THEN NULL
ELSEIF [Numeric League Next Season]=[Numeric League]
THEN "Same League"
ELSEIF [Numeric League Next Season]>[Numeric League]
THEN "Relegated"
ELSE "Promoted"
END
After doing some further tidying the table should now look like this:
Step 4 - Special Circumstances
We can now fill in the missing rows that occurred due to WW1 and WW2 by using the New Rows step. Within this step we add rows from the Season Start field where any new rows should be null:
Then after completing the missing rows we can update null values in the season, special circumstances and outcome fields with the following calculations:
Season
IF ISNULL([SEASON])
THEN str([Season Start])+"-"+str(int(right(str([Season Start]),2))+1)
ELSE [SEASON]
END
Special Circumstances
IF ISNULL([Special Circumstances])
THEN
THEN "WW1"
ELSE "WW2"
END)
ELSE [Special Circumstances]
END
Outcome
IF ISNULL([Outcome])
THEN "N/A"
ELSE [Outcome]
END
After these calculations we are ready to output our data:
You can download the full output here.
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!