Preppin' Data

Posts

2025: Challenge 1 - Console Game Sales

July 11, 2025

Challenge by: Rosh Khan Context This week, we are focusing on an online marketplace for buying and selling used games. The marketplace wants to identify its top performing games by their rating and sales so they can reach out to game developers for potential partnership opportunities. But first, the data needs to be cleaned up! Inputs There are two inputs: 1. Sales Jan-Feb 25 2. GBP Exchange Rates Requirements Input the 2 files The field ‘Name’ contains game titles and their developers. Fix this by creating two fields: ‘Game Title’ and ‘Developer’. The ‘User Review’ field also contains two pieces of information; a star rating out of 5 and any comments left by customers. Extract only the star rating. If no rating is present, keep it as N/A. Keep only 5 star rated games. Connect the GBP Exchange Rates file to the Sales Jan-Feb 25 table and use the ‘Exchange Rate’ field to convert all sales into GBP. Name this field ‘Sales (£)’. Sales currency based on the local currency of where the sale...

2024: Week 52 - Solution

January 02, 2025

Solution by Tom Prowse and you can download the workflow here . Step 1 - Split Nice or Naughty After inputting the data we can then start to split out the names on the Naughty or Nice list and the order in which they are on the list. First we want to split the List and Order from the File Paths field. We can do this using an automatic split or using a split on the space. We can then rename split 1 to List and split 2 to List Order. Step 2 - Index & Score Next we want to create an index field by combing the File Path number and id. Index ([List Order]*100)+[id] From here we can then calculate a Nice and Naughty Score: Nice Score {FIXED [first_name]: SUM(IF [List] = 'Nice' THEN 1 ELSE 0 END)} Naughty Score {FIXED [first_name]: SUM(IF [List] = 'Naughty' THEN 1 ELSE 0 END)} Then finally we can determine what list each name should be on based on their scores. Nice or Naughty? IF [Nice Score] > [Naughty Score] THEN 'Nice' ELSEIF [Naugh...

2024: Week 51 - Solution

January 02, 2025

Solution by Tom Prowse and you can download the workflow here . Step 1 - Clean Week First we can clean the table so that we remove any rows within the Couple field that are equal to 'Couple'. These are the headers from where the web scraping wasn't quite accurate. We can then focus on the Week field and making sure that it's a numeric field. First we can split the field so that we only have the information from before the first ':' We can then use this field to remove any punctuation and letters so that we only have a number remaining. We can then remove the other fields, rename this one to Week, and make sure it's a whole number. Step 2 - Finalists Next we want to filter so that we're only looking at the finalists from each series. First we want to make sure that the 'Runners-Up' values are grouped together and have the same spelling. Then we want to create a field to identify the finalists: Finalist IF [Result]='Winners' OR [Res...

2024: Week 52 - Naughty or Nice?

December 25, 2024

Created by: Carl Allchin Well another year of Preppin' Data comes to an end with a perfect challenge of whether you are on the naughty or nice list? Santa has received a number of files that even he has been able to union together but needs some help to work out whether each person is on the naughty or nice list. Santa has spotted there is a number of names that have been duplicated as records of niceness and naughty deeds have been submitted throughout the year so can you help him solve who should be in each list based on: If a name is only on the nice or naughty list, then that's what list they are part of. If a name is on both the naughty and nice list, the number of times they feature on each list will determine whether they are naughty or nice. (Whichever list has the most mentions determines the end result) If there is an even split between naughty and nice mentions, the latest mention will determine which list the person will end up on. …and there was you ...

2024: Week 50 - Solution

December 23, 2024

Solution by Tom Prowse and you can download the workflow here . Step 1 - Reshape Data Our first task is to combine our two tables so that we have a single table to work from. This involves a few different steps and joins to get the data in the right shape. First we want to input both of the tables and join these together using an inner join where country = entity. From here we can remove the entity field and rename the Life Expectancy at Birth to Country: Life Expectancy at Birth. We also want to include the continent information, so from the List of Countries input, we can create an aggregate step and group by Continent. This will create a list of all the continents which we can use to join back to our Life Expectancy table. With the second join this time we want to use an inner join where entity = continent: We can then remove the Entity and Code fields and rename the Life Expectancy at Birth field to Continent: Life Expectancy at Birth. Now we have both of these tab...

2024: Week 49 - Solution

December 23, 2024

Solution by Tom Prowse and you can download the workflow here . Step 1 - Input Files The first step is to input all of the sheets from the Excel file. For this we can connect up to the Excel document, connect to one of the sheets and then use the 'Union Multiple Tables' in the input step to bring all of the files into a single input. We don't need to change any of the settings as we want to bring in everything: After the sheets are all in we can make some changes to extract the jersey number and clean so other fields. To get the jersey number, we can duplicate the Name field and then use the in-built functionality to remove any letters and punctuation from the duplicated field. Once we have done this we can rename to number and make sure it's a whole number. We can also create a 10cm grouping based on the heights of players. We'll use this later in the flow so it's good to calculate this earlier so that it can be reused. Height ROUND(FLOAT(LEFT([HT],...

2024: Week 48 - Solution

December 23, 2024

Solution by Tom Prowse and you can download the workflow here . Step 1 - Ranking Field & Tie Breaker The first task is to create a Ranking field for each of the different sports. After inputting the table for each sport then we want to create a separate branch for each of them (4 in total). Within each of these branches we want to rename the corresponding field to ranking field as follows: NBA Results WWins = Ranking Field NFL Results WWins = Ranking Field Premier League PtsPoints = Ranking Field Rugby Aviva Premiership PTS = Ranking Field Once we have renamed each of these fields we can then create a tie breaker for each branch. NBA Results First we want to rename the GBGames behind field to Tie Breaker 1 and then use the first part of ConfConference record field as our Tie Breaker 2. In order to get just the first part of the field we can use a Split calculation: Tie Breaker 2 TRIM( SPLIT( [ConfConference record], "-", 1 ) ) Next we want to keep on...