2021: Week 13 - Premier League Statistics

Challenge by: Simon Evans

Before Simon joined The Data School in the UK, he was a professional sporting performance analyst. Simon has reached into his previous professional life to come up with a football-based (read: soccer) challenge for this week.

Simon is channelling his inner fanalyst to use data to understand more about the game that he enjoys. 

This week we want to create a data set that allows us to analyse 'Open Play Goals' scored. We will rank the players overall and by their position. 

Input

Five CSV files, all with a similar structure. Each file contains a lot of columns.

(Image: a small part of one of the five files)

Requirements

Open play goal scoring prowess in the Premier League 2015-2020
  1. Input all the files
  2. Remove all goalkeepers from the data set
  3. Remove all records where appearances = 0
  4. In this challenge we are interested in the goals scored from open play
    • Create a new “Open Play Goals” field (open play goals are the goals scored that weren’t penalties or free kicks)
    • Note that some players will have scored free kicks or penalties with their left or right foot
    • Be careful how Prep handles null fields! (have a look at those penalty and free kick fields) 
    • Rename the original Goals scored field to Total Goals Scored
  5. Calculate the totals for each of the key metrics across the whole time period for each player (be careful not to lose their position)
  6. Create an open play goals per appearance field across the whole time period
  7. Rank the players by the number of open play goals scored across the whole time period. We are only interested in the top 20 (including those that are tied for position) – Output 1
  8. Rank the players by the number of open play goals scored across the whole time period within each position. We are only interested in the top 20 (including those that are tied for position) – Output 2
  9. Output the data – in your solution on Twitter / the forums, state the name of the player who was the only non-forward to make it into the overall top 20 for open play goals scored
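
For anyone who wants to sanity-check their Prep flow outside the tool, here is a minimal Python sketch of steps 2 to 6. It is not the official solution. The column names "Penalties scored" and "Freekicks scored", the "Goalkeeper" position label and the sample rows are all assumptions made for illustration; check them against the real files. The key point is that nulls are replaced with zero before subtracting, otherwise the open play figure would itself come out null.

```python
from collections import defaultdict

# Hypothetical sample rows; column names are assumptions and None stands in for a null.
rows = [
    {"Name": "Player A", "Position": "Forward", "Appearances": 30,
     "Goals": 20, "Penalties scored": 4, "Freekicks scored": None},
    {"Name": "Player A", "Position": "Forward", "Appearances": 10,
     "Goals": 5, "Penalties scored": None, "Freekicks scored": 1},
    {"Name": "Player B", "Position": "Goalkeeper", "Appearances": 38,
     "Goals": 0, "Penalties scored": None, "Freekicks scored": None},
    {"Name": "Player C", "Position": "Midfielder", "Appearances": 0,
     "Goals": 0, "Penalties scored": None, "Freekicks scored": None},
]


def open_play_goals(row):
    """Goals that were neither penalties nor free kicks, treating nulls as zero."""
    penalties = row["Penalties scored"] or 0
    free_kicks = row["Freekicks scored"] or 0
    return row["Goals"] - penalties - free_kicks


def player_totals(rows):
    """Drop goalkeepers and zero-appearance rows, then total the metrics per player."""
    totals = defaultdict(
        lambda: {"Appearances": 0, "Total Goals Scored": 0, "Open Play Goals": 0}
    )
    for row in rows:
        if row["Position"] == "Goalkeeper" or row["Appearances"] == 0:
            continue
        # Grouping on position as well as name keeps the position in the result.
        key = (row["Name"], row["Position"])
        totals[key]["Appearances"] += row["Appearances"]
        totals[key]["Total Goals Scored"] += row["Goals"]
        totals[key]["Open Play Goals"] += open_play_goals(row)
    for t in totals.values():
        # Safe to divide: every kept row has at least one appearance.
        t["Open Play Goals / Game"] = t["Open Play Goals"] / t["Appearances"]
    return dict(totals)


totals = player_totals(rows)
```

Here the two seasons for the made-up "Player A" are combined into one record, while the goalkeeper and the player with no appearances are dropped before aggregation.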

Output

Two files:
  1. Overall Rank
    • 22 Rows (23 including headers)
    • 10 Fields:
      • Open Play Goals
      • Goals with Right Foot
      • Goals with Left Foot 
      • Position
      • Appearances
      • Rank
      • Total Goals
      • Open Play Goals / Game
      • Headed Goals
      • Name
  2. Rank by Position
    • 65 Rows (66 including headers)
    • 10 Fields: as per the first output file
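
The row counts above are higher than 20, which is consistent with tied players all being kept. The challenge does not say which ranking method to use, so the sketch below assumes a competition-style rank (tied values share a rank and the next rank is skipped, e.g. 1, 2, 2, 4). The function name and the sample scores are made up for illustration.

```python
def rank_with_ties(scores):
    """Competition rank, highest score first: ties share a rank, the next rank skips."""
    ordered = sorted(scores, reverse=True)
    first_position = {}
    for position, score in enumerate(ordered, start=1):
        # Only the first (best) position seen for a score is recorded.
        first_position.setdefault(score, position)
    return [first_position[score] for score in scores]


# Hypothetical players and open play goal totals.
players = [("A", 10), ("B", 8), ("C", 8), ("D", 5)]
ranks = rank_with_ties([goals for _, goals in players])

# Keeping "the top 2" returns three rows because B and C are tied.
top_2 = [(name, rank) for (name, _), rank in zip(players, ranks) if rank <= 2]
```

For the Rank by Position output, the same function could be applied separately to the players in each position before filtering.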

The full outputs can be downloaded here.

After you finish the challenge, make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 



