2024: Week 30 - International Football Special

Challenge by: Jenny Martin

It's been a big month for International Football, with both the Euros and the Copa América. Personally, I'm not a big football fan, so the question that I'm looking to answer is:

If I'm only going to watch 15 minutes of a match, which time interval will be the most exciting? How does that vary by competition and has it changed over the years?

Inputs

The data this week comes from an incredible data source I found on Kaggle which contains over 47,000 results of international football matches. Thanks to Mart Jürisoo for collecting this data! 
  1. Results table 

  2. International Competitions table 

  3. Goal Scorers table - currently only available for World Cup and Continental Championships so this is where we will focus our analysis 

  4. Segment table 

Requirements

  • Input the data
  • Filter out Qualification rounds from the Results table
  • Split out the Football Association and Competition from the tournament field
    • e.g. For UEFA Euro, UEFA is the Football Association and Euro is the Competition
    • Not all tournaments contain information about the Football Association
  • Join to the International Competitions table
    • All 8 competitions should be included
    • For the CONCACAF Championship ensure the correct Football Association and Competition are joined
  • Create a field for the Decade the competition took place in
  • Filter the data to 1950s onwards
  • Create a Match ID field so every row in the data had a unique identifier
  • Calculate the number of matches in each Decade, in each Competition
  • Filter out the nulls from the Goal Scorers table and join to the dataset
  • Join on the Segment table based on what segment of time the goal was scored in
    • e.g. a goal scored 25 minutes into the game should be in the 15-30 segment
  • Count how many goals were scored in each Segment, for each Competition and Decade
  • Calculate the Expected number of Goals for each Segment, Competition and Decade
  • Output the data

Output


  • 6 fields
    • Competition
    • Decade
    • Segment
    • Total Goals
    • Matches in a Decade per Competition
    • Expected number of Goals
  • 351 rows (352 including headers)

You can view the outputs here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text