Posts

Showing posts from July, 2024

2024: Week 31 - Olympics Special

Image
Challenge by: Zak Saucede For the final challenge in Preppin' Data's sports themed month we're turning our attention towards the Olympics. More specifically, we're looking at the Women's Heptathlon, using data from the World Athletics Championship in August 2023.   Zak, a Data School Consultant from DS39, recently created the viz below, after realising that despite ranking in the Top 3 for 5 out of the 7 events, it wasn't enough for Anna Hall to win gold. Therefore, he wanted to do further analysis to see if it's better to be a specialist in a few events, or a generalist across all the events. Click here to interact with the viz Inputs 1. Results Table - at a first glance, our data looks very clean, and analysis ready - but 3 data points in a single row is going to require some attention, as well as the position being in brackets for each event 2. Event Lookup Table - Let's add in some additional detail about the events, such as making them more human ...

2024: Week 30 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Qualification & Football Association Using the results input we first want to filter to remove any qualifiers from the tournament field. For this we can use a wildcard filter to remove any fields containing 'qualifying' Then we want to extract the Football Association from the tournament field. For this we want to extract any of the string that is in capitals and has a length of 2 or more. The regex calculation for this looks like this: Football Association REGEXP_EXTRACT([tournament],'([A-Z]{2,}).*') Then finally we can clean the competition field by removing the football association from the title or just returning the tournament name:  Competition TRIM( IF ISNULL([Football Association]) THEN [tournament] ELSE REPLACE([tournament],[Football Association],'') END ) Then finally we want to change any competitions containing the word African and replace it with 'Africa':  Competit...

2024: Week 30 - International Football Special

Image
Challenge by: Jenny Martin It's been a big month for International Football, with both the Euros and the Copa América. Personally, I'm not a big football fan, so the question that I'm looking to answer is: If I'm only going to watch 15 minutes of a match, which time interval will be the most exciting? How does that vary by competition and has it changed over the years? Inputs The data this week comes from an incredible data source I found on Kaggle which contains over 47,000 results of international football matches. Thanks to  Mart Jürisoo  for collecting this data!  Results table  International Competitions table  Goal Scorers table - currently only available for World Cup and Continental Championships so this is where we will focus our analysis  Segment table  Requirements Input the data Filter out Qualification rounds from the Results table Split out the Football Association and Competition from the tournament field e.g. For UEFA Euro, UEFA is the...

2024: Week 29 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - End of Q1 We want to calculate the Q1 times so first need to make sure the Start Date field is recognised as a date field type. We can do this can changing the field type from a string to a date time. Next we want to calculate the starting time by finding the earliest time across the whole data set. We can calculate this using a fixed LOD:  Starting Time    Then calculate the end time by adding 20 mins:  End of Q1   DATEADD('minute',20,[Starting Time]) Step 2 - Q1 Times We are now in a position to work out the qualifying times within Q1. For this we want to filter all of the rows where the date_start field is less than the end of Q1 -  [date_start]<[End of Q1] Then aggregate the rows so we find the minimum lap_duration for each driver: Then finally we want to rank the lap_duration to find the position: Position   Then finally filter so we only return the bottom positions lower ...

2024: Week 29 - Formula 1 Qualifying Special

Image
Challenge by: Dan Wade Dan is part of Data School cohort DS42 and has recently set himself a challenge to create 24 Formula 1 vizzes , one for each race week of the season. Naturally, this means he's had to do a lot of data preparation so he's been very generous to also prepare a challenge for our Sports themed July. Over to Dan to explain the challenge. In Formula 1, each driver participates in a qualifying session which sets the starting grid for the race. A good qualifying result can give a driver a better starting position, which can be crucial for a strong race performance and being in front reduces the risk of being involved in accidents. Qualifying is made up of three rounds, Q1, Q2 and Q3. It begins with Q1, where all 20 drivers compete for 18 minutes; the five slowest are eliminated and placed in positions 16-20. Q2 follows, lasting 15 minutes, with the remaining 15 drivers vying for the fastest times; the five slowest are then eliminated and placed in positions 11-15....

2024: Week 28 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Singles Championship First we focus on cleaning the Singles winners and getting the data into a place where we can union it with the other tables.  Initially we want to remove any null values from the Men field, and then create an ID field based on the year:  ID   IF LEN([year])>4 THEN RIGHT([year],1) END We can then use this to make sure the year is numeric:  Year   IF ISNULL([ID]) THEN [year] ELSE LEFT([year],LEN([year])-1) END After this we can change the Year to a whole number and then filter to keep the ID that equal 3 or null.  At this stage we can remove any extra fields so that we have the Year, Men, and Women fields. Next we need to pivot the data so that we have the men and women in a single column. For this we use a columns to rows pivot:  We can then rename any fields and rename the Men and Women rows to be Men's or Women's Singles:  Tournament   IF [Tourn...