2024: Week 42 - Strictly Come Dancing

Challenge by: Jenny Martin

Autumn is always the time of year that Strictly Come Dancing returns to our TVs in the UK. I always find it interesting which songs are chosen for the couples to dance to, particularly when there are repeats. With 22 series so far, some repetition is not surprising, so I set about gathering a dataset that would show me which songs are chosen most often.

Inputs

I used this as an opportunity to learn how to web scrape Wikipedia using Python (with a lot of help from ChatGPT!), so the resulting table combines every dance from every series.

Requirements

  • Input the data
  • The data is missing a Year field showing when each Series took place
    • Series 1 and 2 were both in 2004
    • All subsequent series took place annually
      • Series 3 in 2005 etc.
  • The web scraping isn't quite perfect, so the table headers are repeated throughout the dataset; make sure these rows are removed
  • Split the Week field into a numeric value and put any extra details into a theme week field
    • Split this theme week further so that, if it's the Final, Semi Final or Quarter Final, this detail is in a Stage field instead
  • The Score field is made up of the Total Score and the individual judges' scores. Since the number of judges can vary depending on the series/week, split the Score field into these 2 parts
  • In certain weeks there is a group dance. These can be identified by the word "group" or "marathon" in the Dance field. Update the Couple field to Group for these dances and ensure there is only 1 row per dance, so the music choice is only counted once
  • There can be more than 1 song in the Music field. Make sure there is a row for each song, with the song and artist in separate fields
  • You may notice some additional fields, such as Film and Musical. These correspond to the theme weeks. Since there is at most 1 theme per week, combine these fields into 1
  • Remove unnecessary fields
  • Output the data
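
For the Year requirement, the rule reduces to a small formula: Series 1 and 2 map to 2004, and from Series 3 onwards the year is the series number plus 2002. A minimal Python sketch, where series_year is a hypothetical helper name and the Series field is assumed to have already been converted to an integer:

```python
def series_year(series: int) -> int:
    """Return the year a Strictly Come Dancing series aired.

    Series 1 and 2 both aired in 2004; from Series 3 onwards there is one
    series per year, so the year is the series number plus 2002.
    """
    if series < 1:
        raise ValueError(f"series must be 1 or greater, got {series}")
    return 2004 if series <= 2 else 2002 + series


print([series_year(s) for s in (1, 2, 3, 22)])  # [2004, 2004, 2005, 2024]
```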
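
One way to remove the repeated table headers is to treat any row whose cells simply repeat the column names as a header row. This sketch assumes the scraped rows are dictionaries keyed by column name and that Couple, Score, Dance and Music are among the scraped headers; HEADER_COLUMNS, is_repeated_header and drop_repeated_headers are hypothetical names, and the sample rows are made-up placeholders.

```python
# Hypothetical: scraped column names used to recognise a repeated header row.
HEADER_COLUMNS = ("Couple", "Score", "Dance", "Music")


def is_repeated_header(row: dict) -> bool:
    # A header row that leaked into the data holds each column's own name as its value.
    return all((row.get(col) or "").strip() == col for col in HEADER_COLUMNS)


def drop_repeated_headers(rows: list) -> list:
    """Keep only the rows that describe real dances."""
    return [row for row in rows if not is_repeated_header(row)]


# Made-up placeholder rows, for illustration only.
rows = [
    {"Series": "1", "Couple": "Couple", "Score": "Score", "Dance": "Dance", "Music": "Music"},
    {"Series": "1", "Couple": "Couple A", "Score": "30 (8, 7, 7, 8)", "Dance": "Waltz", "Music": "Song A"},
]
print(len(drop_repeated_headers(rows)))  # 1
```

Checking several columns rather than one makes it unlikely that a genuine dance row is dropped by accident.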
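
The Week and Stage requirements can be handled with two regular expressions: one to pull out the week number and whatever text follows it, and one to find a Final, Semi Final or Quarter Final inside that text. The exact scraped format isn't shown here, so this sketch assumes values such as "Week 5: Movie Week" or "Week 12: Quarter-final"; split_week, WEEK_RE and STAGE_RE are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumed format: "Week <n>", optionally followed by ":" or a dash and extra text.
WEEK_RE = re.compile(r"^\s*Week\s+(\d+)\s*[:\-\u2013]?\s*(.*)$", re.IGNORECASE)
# Matches "Final", "Semi-final", "Quarter Final" and similar spellings.
STAGE_RE = re.compile(r"\b(?:(quarter|semi)[\s-]*)?final\b", re.IGNORECASE)


def split_week(raw: str) -> Tuple[int, Optional[str], Optional[str]]:
    """Split a raw Week value into (week number, stage, theme)."""
    match = WEEK_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognised Week value: {raw!r}")
    week = int(match.group(1))
    theme = match.group(2).strip()
    stage = None
    stage_match = STAGE_RE.search(theme)
    if stage_match:
        prefix = stage_match.group(1)
        # Normalise to "Final", "Semi Final" or "Quarter Final".
        stage = f"{prefix.capitalize()} Final" if prefix else "Final"
        # Remove the stage text and any leftover punctuation from the theme.
        theme = (theme[:stage_match.start()] + theme[stage_match.end():]).strip(" :-()\u2013")
    return week, stage, theme or None


for value in ("Week 5: Movie Week", "Week 12: Quarter-final", "Week 3"):
    print(split_week(value))
```

A value like "Week 13: Musicals Week (Semi-final)" would come back as (13, "Semi Final", "Musicals Week"), keeping both pieces of information.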
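
For the Score requirement, Wikipedia typically shows a total followed by the judges' scores in brackets, so this sketch assumes values like "35 (9, 9, 8, 9)". It keeps the judges' scores as one text field, so it works for any number of judges. split_score is a hypothetical name, and unscored or unexpected values simply come back empty.

```python
import re
from typing import Optional, Tuple

# Assumed format: total score, then the individual judges' scores in brackets.
SCORE_RE = re.compile(r"^(\d+)\s*\(([^)]*)\)$")


def split_score(raw: str) -> Tuple[Optional[int], Optional[str]]:
    """Split a raw Score value into (total score, judges' scores)."""
    text = raw.strip()
    match = SCORE_RE.match(text)
    if match:
        # Normalise spacing so "10,10,9" and "10, 10, 9" look the same.
        judges = ", ".join(part.strip() for part in match.group(2).split(","))
        return int(match.group(1)), judges
    if text.isdigit():
        return int(text), None  # a total with no breakdown
    return None, None  # e.g. an unscored dance


print(split_score("35 (9, 9, 8, 9)"))
```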
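
For the group dance requirement, one approach is to flag any dance containing "group" or "marathon", relabel the couple as Group, and keep only the first row for each combination of series, week, dance and music. This sketch assumes dictionary rows with Series, Week, Couple, Dance and raw Music fields, and it assumes deduplication runs before the songs are split out. The function names are hypothetical, the sample rows are placeholders, and any other fields (such as Score or Result) are taken from the first matching row.

```python
def is_group_dance(dance: str) -> bool:
    text = dance.lower()
    return "group" in text or "marathon" in text


def collapse_group_dances(rows: list) -> list:
    """Relabel group dances as Group and keep one row per group dance."""
    seen = set()
    result = []
    for row in rows:
        if is_group_dance(row["Dance"]):
            # Identify a group dance by where it happened and what was danced to.
            key = (row["Series"], row["Week"], row["Dance"], row["Music"])
            if key in seen:
                continue  # another couple's copy of the same group dance
            seen.add(key)
            row = {**row, "Couple": "Group"}
        result.append(row)
    return result


# Made-up placeholder rows, for illustration only.
rows = [
    {"Series": "1", "Week": "Week 8", "Couple": "Couple A", "Dance": "Group Dance", "Music": "Song X"},
    {"Series": "1", "Week": "Week 8", "Couple": "Couple B", "Dance": "Group Dance", "Music": "Song X"},
    {"Series": "1", "Week": "Week 8", "Couple": "Couple A", "Dance": "Waltz", "Music": "Song Y"},
]
print([row["Couple"] for row in collapse_group_dances(rows)])
```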
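
To split the Music field, this sketch assumes each song is written in the usual Wikipedia style of a quoted title, a dash and then the artist, with multiple songs separated by line breaks or slashes. split_music, explode_music and SONG_RE are hypothetical names. If the assumed pattern isn't found, the raw text is kept as the song so that nothing is silently lost.

```python
import re
from typing import List, Optional, Tuple

# Assumed format: "Title" followed by a dash (em dash, en dash or hyphen) and the artist.
SONG_RE = re.compile(r'"([^"]+)"\s*[\u2014\u2013-]\s*([^"]+?)(?=\s*"|\s*$)')


def split_music(music: str) -> List[Tuple[str, Optional[str]]]:
    """Return a (song, artist) pair for every song found in a Music value."""
    pairs = [
        (song.strip(), artist.strip(" /,;\n"))  # drop any separator left on the artist
        for song, artist in SONG_RE.findall(music)
    ]
    # Fall back to the raw text as the song if the assumed format isn't found.
    return pairs or [(music.strip(), None)]


def explode_music(row: dict) -> List[dict]:
    """Turn one row into one row per song, with Song and Artist fields."""
    base = {key: value for key, value in row.items() if key != "Music"}
    return [{**base, "Song": song, "Artist": artist} for song, artist in split_music(row["Music"])]


print(split_music('"Song A" \u2014 Artist A\n"Song B" \u2014 Artist B'))
```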
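
Because there is at most one theme per week, the theme-specific columns can be merged by taking whichever one is filled in. This sketch also raises an error if more than one is filled, which would mean that assumption doesn't hold for some row. THEME_DETAIL_COLUMNS and combine_theme_detail are hypothetical names, and only Film and Musical are listed because those are the two fields named in the challenge.

```python
from typing import Optional

# Theme-specific columns named in the challenge; extend if the scrape has others.
THEME_DETAIL_COLUMNS = ("Film", "Musical")


def combine_theme_detail(row: dict) -> dict:
    """Replace the theme-specific columns with a single Theme Detail field."""
    details = [(row.get(col) or "").strip() for col in THEME_DETAIL_COLUMNS]
    details = [detail for detail in details if detail]
    if len(details) > 1:
        raise ValueError(f"expected at most one theme detail, got {details}")
    combined = {key: value for key, value in row.items() if key not in THEME_DETAIL_COLUMNS}
    detail: Optional[str] = details[0] if details else None
    combined["Theme Detail"] = detail
    return combined


print(combine_theme_detail({"Couple": "Couple A", "Film": "Film X", "Musical": ""}))
```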

Output

  • 13 fields
    • Year
    • Series
    • Week
    • Stage
    • Theme
    • Theme Detail
    • Couple
    • Score
    • Judges Scores
    • Dance
    • Song
    • Artist
    • Result
  • 2,524 rows (2,525 including headers)
You can view the output here.
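
For the final two requirements (removing unnecessary fields and outputting the data), a sketch using Python's built-in csv module could look like this, assuming each processed row is a dictionary. OUTPUT_FIELDS and write_output are hypothetical names; the field order follows the Output list.

```python
import csv
import io

# The 13 output fields, in the order listed in the challenge.
OUTPUT_FIELDS = [
    "Year", "Series", "Week", "Stage", "Theme", "Theme Detail", "Couple",
    "Score", "Judges Scores", "Dance", "Song", "Artist", "Result",
]


def write_output(rows, stream) -> None:
    """Write rows as CSV, keeping only the 13 output fields.

    extrasaction="ignore" drops any working columns (such as the raw Music
    field) that are not in OUTPUT_FIELDS; missing fields are written as blanks.
    """
    writer = csv.DictWriter(stream, fieldnames=OUTPUT_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_output([{"Year": 2024, "Series": 22, "Music": "raw text to drop"}], buffer)
print(buffer.getvalue())
```

When writing to a real file instead of an in-memory buffer, open it with newline="", as the csv module documentation recommends.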

After you finish the challenge, make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


