Posts

2024: Week 30 - International Football Special

Image
Challenge by: Jenny Martin It's been a big month for International Football, with both the Euros and the Copa América. Personally, I'm not a big football fan, so the question that I'm looking to answer is: If I'm only going to watch 15 minutes of a match, which time interval will be the most exciting? How does that vary by competition and has it changed over the years? Inputs The data this week comes from an incredible data source I found on Kaggle which contains over 47,000 results of international football matches. Thanks to  Mart Jürisoo  for collecting this data!  Results table  International Competitions table  Goal Scorers table - currently only available for World Cup and Continental Championships so this is where we will focus our analysis  Segment table  Requirements Input the data Filter out Qualification rounds from the Results table Split out the Football Association and Competition from the tournament field e.g. For UEFA Euro, UEFA is the Football Associat

2024: Week 29 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - End of Q1 We want to calculate the Q1 times so first need to make sure the Start Date field is recognised as a date field type. We can do this can changing the field type from a string to a date time. Next we want to calculate the starting time by finding the earliest time across the whole data set. We can calculate this using a fixed LOD:  Starting Time    Then calculate the end time by adding 20 mins:  End of Q1   DATEADD('minute',20,[Starting Time]) Step 2 - Q1 Times We are now in a position to work out the qualifying times within Q1. For this we want to filter all of the rows where the date_start field is less than the end of Q1 -  [date_start]<[End of Q1] Then aggregate the rows so we find the minimum lap_duration for each driver: Then finally we want to rank the lap_duration to find the position: Position   Then finally filter so we only return the bottom positions lower than 16 -  [Position]>=

2024: Week 29 - Formula 1 Qualifying Special

Image
Challenge by: Dan Wade Dan is part of Data School cohort DS42 and has recently set himself a challenge to create 24 Formula 1 vizzes , one for each race week of the season. Naturally, this means he's had to do a lot of data preparation so he's been very generous to also prepare a challenge for our Sports themed July. Over to Dan to explain the challenge. In Formula 1, each driver participates in a qualifying session which sets the starting grid for the race. A good qualifying result can give a driver a better starting position, which can be crucial for a strong race performance and being in front reduces the risk of being involved in accidents. Qualifying is made up of three rounds, Q1, Q2 and Q3. It begins with Q1, where all 20 drivers compete for 18 minutes; the five slowest are eliminated and placed in positions 16-20. Q2 follows, lasting 15 minutes, with the remaining 15 drivers vying for the fastest times; the five slowest are then eliminated and placed in positions 11-15.

2024: Week 28 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Singles Championship First we focus on cleaning the Singles winners and getting the data into a place where we can union it with the other tables.  Initially we want to remove any null values from the Men field, and then create an ID field based on the year:  ID   IF LEN([year])>4 THEN RIGHT([year],1) END We can then use this to make sure the year is numeric:  Year   IF ISNULL([ID]) THEN [year] ELSE LEFT([year],LEN([year])-1) END After this we can change the Year to a whole number and then filter to keep the ID that equal 3 or null.  At this stage we can remove any extra fields so that we have the Year, Men, and Women fields. Next we need to pivot the data so that we have the men and women in a single column. For this we use a columns to rows pivot:  We can then rename any fields and rename the Men and Women rows to be Men's or Women's Singles:  Tournament   IF [Tournament]='men' THEN "Men&

2024: Week 28 - Wimbledon Special

Image
Challenge by: Jenny Martin Carl's Tour de France challenge last week has inspired a sports themed Preppin' Data month! This week the focus turns to Wimbledon. I thought it would be interesting to understand the crossover of Singles and Doubles Champions. Carl thought there wouldn't be many players who had become champions as both Singles players and Doubles players, but I thought the data could prove otherwise. Who's right? You'll have to complete the challenge to find out! Inputs The data for the Wimbledon Champions is found in 3 separate tables: Singles Champions  Taken from Britannica Doubles Champions  Taken from  Britannica Mixed Doubles Champions  Taken from Wikipedia Requirements Input the data Filter out the years where the championship did not take place Ensure the Year field is numeric Reshape the data so there is a row for each Champion, for each Year, even where there are 2 winners in the Doubles Make sure it's clear which tournament they were the Ch

2024: Week 27 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Join Stage & Stage Type First we want to input the Stages and Stage Type tables in our workflow and we can join these together using an inner join where Stage Type = Stage Type ID We can then remove the fields F4, Date, Day, Stage Type, and Stage Type ID so our table look like this:  Step 2 - Time Trial? Next we want to identify whether the stage was a time trial or not. For this we can use an IF statement alongside the Contains function to identify the ITT or TTT notation within the Stage field:  Time Trial?   IF Contains([Stage],'(ITT)') THEN 'Individual' ELSEIF Contains([Stage],'(TTT)') THEN 'Team' END  We will then have a Individual, or Team if the stage was a time trial or Null if it was not a time trial.  Step 3 - Origin, Destination & Stage Number The information for the origin, destination and stage number are contained within the Stage field. Therefore we can use a

2024: Week 27 - Tour de France special

Image
Created by: Carl Allchin The Tour de France starts this week. The world's biggest and most popular cycling race is celebrated and we wanted to mark the occasion.  Here at Preppin' Data, we have our own cycling brand: Allchains. It's a fake company so we've never had any success. As a British cycling fan, a lot of my joy in the sport has come from Mark Cavendish's wins that have made him the real world most successful stage winner in the Tour de France's history.  This Preppin' uses real world data to create a data source so you can explore what it takes to become the greatest sprinter in Tour de France history. In this challenge you will need to combine the Tour de Frances Cavendish has raced, the stages he's won, what types of stages they were and where he finished overall in the race that year (if he finished at all).  Inputs There are four data sets : 1. Stages — All the stages in the years that Cavedish raced the Tour. 2. Stage Type — The type of sta