2024: Week 30 - Solution
Solution by Tom Prowse and you can download the workflow here.
Step 1 - Qualification & Football Association
Using the results input we first want to filter to remove any qualifiers from the tournament field. For this we can use a wildcard filter to remove any fields containing 'qualifying'
Then we want to extract the Football Association from the tournament field. For this we want to extract any of the string that is in capitals and has a length of 2 or more. The regex calculation for this looks like this:
Football Association
REGEXP_EXTRACT([tournament],'([A-Z]{2,}).*')
Then finally we can clean the competition field by removing the football association from the title or just returning the tournament name:
Competition
TRIM(
IF ISNULL([Football Association])
THEN [tournament]
ELSE REPLACE([tournament],[Football Association],'')
END
)
Then finally we want to change any competitions containing the word African and replace it with 'Africa':
Competition
REPLACE([Competition],'African','Africa')
After these calculations the table should look like this:
Step 2 - International Competitions
We can now combine the International Competitions table by using an inner join on the Competition field:
Now we've combined the tables we identify whether it was a Football Association match or not by comparing the fields from each table:
Football Association Match?
[Football Association]=[Football Association-1]
Then we can exclude the False values.
Then we want to ensure that the correct Football Association is being shown if there is a null:
Football Association
IF ISNULL([Football Associations Match?])
THEN [Football Association-1]
ELSE [Football Association-1]
END
We can then remove the fields Football Association Match?, Competition-1, Football Association-1, Dates, & Tournament then ensure that CONCACAF is showing correctly:
Competition
IF [Football Association]='CONCACAF'
THEN 'Gold Cup'
ELSE [Competition]
END
Next we want to filter for matches since 1950s onwards so we need to create a decade field:
Decade
INT(LEFT(STR(YEAR([date])),3)+'0')
Then we can use this field to filter a minimum of 1950
Then using the date we can create a match ID using a Rank calculation:
Match ID
And finally we want to calculate how many matches were in each decade using a Fixed LOD:
Matches in Decade
The table should now look like this:
Step 3 - Goal Scorers & Segment
Next we want to combine the goalscorers table but first we want to remove any values where the Minute field is Null.
We can combine the tables using an inner join on the date, home_team, and away_team fields.
Then we can remove the duplicated fields and then prepare thee Segment input so that it is ready to combine.
Within the Segment table we want to split the Segment field using the '-' as a separator so that we have an upper and lower bound. There are a couple of null values so we can replace these with 90 (lower bound) and 180 (upper bound).
We can then join this to our existing workflow by using an inner join where minute >= Lower Bound and minute < Upper Bound:
Step 4 - Prepare Output
The last step in the challenge is to aggregate where we group by Competition, Decade, Segment, and Matches in a Decade per Competition, then Sum the Number of Rows:
Then finally we can rename the Number of Rows to Total Goals and then calculate the Expected Number of Goals:
Expected Number of Goals
ROUND(
[Total Goals]
/
[Matches in a Decade per Competition]
,2)
Our final output should look like this:
You can view the outputs here.
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!