2024: Week 41 - Forbes Highest Paid Athletes

Challenge by: Robbin Vernooij 

Recently, one of the Data School Coaches, Robbin, set the following challenge. It seemed perfect for a Preppin' Data challenge, so over to Robbin:

We'd like to get historical data on the highest paid athletes so we can do temporal analysis.

Lucky us, it turns out Wikipedia has been tracking the Forbes list of the world's highest-paid athletes. Unlucky us, it is stored as HTML tables, one table per year, with human-readable symbols mixed into the values. Now it's time for you to clean it up into one single dataset, so that it's ready for analysis.

Inputs

The data for this challenge comes from this Wikipedia page. There is a table for each year that looks like this (2024 example):

As well as a source table: 

Requirements

  • Input the data
  • Bring all the year tables together into a single table
  • Merge any mismatched fields (there should not be any Null values) 
  • Create a numeric Year field
  • Clean up the fields with the monetary amounts 
    • One way of doing this could be to pivot all 3 columns into a single column, do the cleaning calculations once, and then pivot back to 3 columns
    • Make sure that any value given in millions is converted to the full numeric amount 
      • e.g. $6 million becomes 6,000,000
  • Bring in the source information so that it is associated with each row
  • Remove unnecessary fields
  • Output the data
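
The challenge is designed for a data prep tool, but the two cleaning steps in the list above (the numeric Year field and the monetary amounts) can also be sketched in code. The Python below is a minimal sketch, not the official solution. The function names parse_money and extract_year are hypothetical, and the input formats are assumptions: amounts that look like "$6 million", possibly followed by a footnote marker such as "[1]", and table labels that contain a four-digit year.

```python
import re
from decimal import Decimal

# Assumed amount format: optional "$", a number, optional "million"/"billion".
_AMOUNT = re.compile(r"\$?\s*([\d,]*\.?\d+)\s*(million|billion)?", re.IGNORECASE)
_SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def parse_money(text: str) -> int:
    """Turn a human-readable amount such as '$6 million' into 6000000."""
    # Assumption: footnote markers like "[1]" may trail the value, so drop them.
    cleaned = re.sub(r"\[[^\]]*\]", "", text)
    match = _AMOUNT.search(cleaned)
    if match is None:
        raise ValueError(f"no amount found in {text!r}")
    # Decimal avoids floating-point rounding on values like 127.9.
    number = Decimal(match.group(1).replace(",", ""))
    scale = _SCALE.get((match.group(2) or "").lower(), 1)
    return int(number * scale)


def extract_year(label: str) -> int:
    """Pull a four-digit year out of a table label (label format is assumed)."""
    match = re.search(r"\b((?:19|20)\d{2})\b", label)
    if match is None:
        raise ValueError(f"no year found in {label!r}")
    return int(match.group(1))
```

Because all three monetary columns share the same format, one function can be applied to each of them, which is the same idea as pivoting to a single column, cleaning once, and pivoting back.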

Output


  • 9 fields
    • Year
    • Rank
    • Name
    • Sport
    • Country
    • Total earnings
    • Salary/Winnings
    • Endorsements
    • Source
  • 130 rows (131 including headers)
You can view the output here.
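
If you want to sanity-check your result against the output description above, a small check like the following may help. It is only a sketch: check_output is a hypothetical helper, and it assumes your output has been loaded as a list of dictionaries keyed by field name (for example with csv.DictReader).

```python
from typing import Any, Dict, List

# Field names and row count taken from the Output section of the challenge.
EXPECTED_FIELDS = [
    "Year", "Rank", "Name", "Sport", "Country",
    "Total earnings", "Salary/Winnings", "Endorsements", "Source",
]
EXPECTED_ROWS = 130


def check_output(rows: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems: List[str] = []
    if len(rows) != EXPECTED_ROWS:
        problems.append(f"expected {EXPECTED_ROWS} rows, found {len(rows)}")
    for number, row in enumerate(rows, start=1):
        if set(row) != set(EXPECTED_FIELDS):
            problems.append(f"row {number}: unexpected field set {sorted(row)}")
        # The requirements say there should not be any Null values.
        empty = [f for f in EXPECTED_FIELDS if row.get(f) in (None, "")]
        if empty:
            problems.append(f"row {number}: empty values in {empty}")
    return problems
```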

After you finish the challenge, make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 
