2024: Week 27 - Tour de France special

Created by: Carl Allchin

The Tour de France starts this week. The world's biggest and most popular cycling race is celebrated and we wanted to mark the occasion. 

Here at Preppin' Data, we have our own cycling brand: Allchains. It's a fake company so we've never had any success. As a British cycling fan, a lot of my joy in the sport has come from Mark Cavendish's wins that have made him the real world most successful stage winner in the Tour de France's history. 

This Preppin' uses real world data to create a data source so you can explore what it takes to become the greatest sprinter in Tour de France history. In this challenge you will need to combine the Tour de Frances Cavendish has raced, the stages he's won, what types of stages they were and where he finished overall in the race that year (if he finished at all). 

Inputs

There are four data sets:

1. Stages — All the stages in the years that Cavedish raced the Tour.

2. Stage Type — The type of stages raced

3. Wins — All of Cavendish's career wins

4. Grand Tour Starts — All the Grand Tours (there's 3 major races in a year) Cavendish has made and where he's finished


Requirements

  • Input the data sets
  • Join together the Stages and Stage Type table
  • Create a field called 'Time Trial ?' to determine if the stage was a Time Trial
    • Time trials are shown as either Individual Time Trials (ITT)s or Team Time Trials (TTT)s
  • Create an 'Origin - Destination' field and a 'Stage Number' field
    • For the Stage Number, capture the Prologue stage as 0
  • Join in the 'Wins' data. Keep all of the stages and add additional details of the stages Cavendish has won
    • You'll need to prepare the data to be able to join to the data set you've built thus far
  • Remove all duplicate fields created within the Joins
  • Input the Grand Tour Starts but keep only the Tour de France starts
  • Rename:
    • GC to 'General Classification Finishing Position'
    • Points to 'Points Finishing Position' 
  • Join the races started to the overall data set to provide the overall finishing position and points finishing position
  • Create a 'Stages Won?' field from the Stage Number Won to a 'Yes' when Cavendish won the stage or NULL if not. 
  • Output the data

Output



9 data fields:
  • Stage Won?
  • Stage Number
  • Origin - Destination
  • Time Trial?
  • General Classification
  • Points Finishing
  • Year
  • KM
  • Stage Type
294 rows (295 rows incl. headers)

You can view the outputs here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text