2024: Week 30 - Solution



Solution by Tom Prowse and you can download the workflow here.



Step 1 - Qualification & Football Association

Using the results input we first want to filter to remove any qualifiers from the tournament field. For this we can use a wildcard filter to remove any fields containing 'qualifying'



Then we want to extract the Football Association from the tournament field. For this we want to extract any of the string that is in capitals and has a length of 2 or more. The regex calculation for this looks like this:

Football Association
REGEXP_EXTRACT([tournament],'([A-Z]{2,}).*')

Then finally we can clean the competition field by removing the football association from the title or just returning the tournament name: 

Competition
TRIM(
IF ISNULL([Football Association])
THEN [tournament]
ELSE REPLACE([tournament],[Football Association],'')
END
)

Then finally we want to change any competitions containing the word African and replace it with 'Africa': 

Competition 
REPLACE([Competition],'African','Africa')

After these calculations the table should look like this: 



Step 2 - International Competitions

We can now combine the International Competitions table by using an inner join on the Competition field: 


Now we've combined the tables we identify whether it was a Football Association match or not by comparing the fields from each table: 

Football Association Match?  
[Football Association]=[Football Association-1]

Then we can exclude the False values.

Then we want to ensure that the correct Football Association is being shown if there is a null: 

Football Association
IF ISNULL([Football Associations Match?])
THEN [Football Association-1]
ELSE [Football Association-1]
END

We can then remove the fields Football Association Match?, Competition-1, Football Association-1, Dates, & Tournament then ensure that CONCACAF is showing correctly: 

Competition 
IF [Football Association]='CONCACAF'
THEN 'Gold Cup'
ELSE [Competition]
END

Next we want to filter for matches since 1950s onwards so we need to create a decade field: 

Decade 
INT(LEFT(STR(YEAR([date])),3)+'0')

Then we can use this field to filter a minimum of 1950


Then using the date we can create a match ID using a Rank calculation:

Match ID 



And finally we want to calculate how many matches were in each decade using a Fixed LOD: 

Matches in Decade 


The table should now look like this: 



Step 3 - Goal Scorers & Segment

Next we want to combine the goalscorers table but first we want to remove any values where the Minute field is Null. 

We can combine the tables using an inner join on the date, home_team, and away_team fields. 



Then we can remove the duplicated fields and then prepare thee Segment input so that it is ready to combine.

Within the Segment table we want to split the Segment field using the '-' as a separator so that we have an upper and lower bound. There are a couple of null values so we can replace these with 90 (lower bound) and 180 (upper bound).

We can then join this to our existing workflow by using an inner join where minute >= Lower Bound and minute < Upper Bound:



Step 4 - Prepare Output

The last step in the challenge is to aggregate where we group by Competition, Decade, Segment, and Matches in a Decade per Competition, then Sum the Number of Rows:



Then finally we can rename the Number of Rows to Total Goals and then calculate the Expected Number of Goals: 

Expected Number of Goals 
ROUND(
[Total Goals]
/
[Matches in a Decade per Competition]
,2)

Our final output should look like this: 


You can view the outputs here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text