Posts

2025: Challenge 1 - Solution

Image
Solution by: Rosh Khan  You can download the workbook from here Step 1- Input Data  As with all challenges, the data can be downloaded from the challenge requirements. As we are working with Excel files, we can choose Microsoft Excel as our connection type. Step 2 - Split the field ‘Name’ into Game Title and Developer We know that the field ‘Name’ contains two pieces of information: Game Title and Developer, and therefore needs to be separated. The characters separating the two in this case are “ (made by “ , therefore, we can click on More Options under the field ‘Name’ -> Split Value -> Custom Split -> and define our separator to be “ (made by “  and Split off the first 2 fields. Please note: There is a space before ‘)‘ and after ‘by’ After applying the following formatting: Change field names to be ‘Game Title’ and ‘Developer’ Removing trailing and leading spaces from both fields Removing punctuation from the ‘Developer’ field (More Options -> Clean -> R...

2025: Challenge 1 - Console Game Sales

Challenge by: Rosh Khan Context This week, we are focusing on an online marketplace for buying and selling used games. The marketplace wants to identify its top performing games by their rating and sales so they can reach out to game developers for potential partnership opportunities. But first, the data needs to be cleaned up! Inputs There are two inputs: 1. Sales Jan-Feb 25 2. GBP Exchange Rates Requirements Input the 2 files The field ‘Name’ contains game titles and their developers. Fix this by creating two fields: ‘Game Title’ and ‘Developer’. The ‘User Review’ field also contains two pieces of information; a star rating out of 5 and any comments left by customers. Extract only the star rating. If no rating is present, keep it as N/A. Keep only 5 star rated games. Connect the GBP Exchange Rates file to the Sales Jan-Feb 25 table and use the ‘Exchange Rate’ field to convert all sales into GBP. Name this field ‘Sales (£)’. Sales currency based on the local currency of where the sale...

2024: Week 52 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Split Nice or Naughty After inputting the data we can then start to split out the names on the Naughty or Nice list and the order in which they are on the list.  First we want to split the List and Order from the File Paths field. We can do this using an automatic split or using a split on the space. We can then rename split 1 to List and split 2 to List Order.  Step 2 - Index & Score Next we want to create an index field by combing the File Path number and id.  Index   ([List Order]*100)+[id] From here we can then calculate a Nice and Naughty Score:  Nice Score {FIXED [first_name]: SUM(IF [List] = 'Nice' THEN 1 ELSE 0 END)} Naughty Score   {FIXED [first_name]: SUM(IF [List] = 'Naughty' THEN 1 ELSE 0 END)} Then finally we can determine what list each name should be on based on their scores. Nice or Naughty?    IF [Nice Score] > [Naughty Score] THEN 'Nice' ELSEIF [Naugh...

2024: Week 51 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Clean Week First we can clean the table so that we remove any rows within the Couple field that are equal to 'Couple'. These are the headers from where the web scraping wasn't quite accurate. We can then focus on the Week field and making sure that it's a numeric field. First we can split the field so that we only have the information from before the first ':'  We can then use this field to remove any punctuation and letters so that we only have a number remaining. We can then remove the other fields, rename this one to Week, and make sure it's a whole number.  Step 2 - Finalists Next we want to filter so that we're only looking at the finalists from each series. First we want to make sure that the 'Runners-Up' values are grouped together and have the same spelling.  Then we want to create a field to identify the finalists:  Finalist   IF  [Result]='Winners' OR  [Res...

2024: Week 52 - Naughty or Nice?

Image
 Created by: Carl Allchin Well another year of Preppin' Data comes to an end with a perfect challenge of whether you are on the naughty or nice list? Santa has received a number of files that even he has been able to union together but needs some help to work out whether each person is on the naughty or nice list.  Santa has spotted there is a number of names that have been duplicated as records of niceness and naughty deeds have been submitted throughout the year so can you help him solve who should be in each list based on: If a name is only on the nice or naughty list, then that's what list they are part of.  If a name is on both the naughty and nice list, the number of times they feature on each list will determine whether they are naughty or nice. (Whichever list has the most mentions determines the end result) If there is an even split between naughty and nice mentions, the latest mention will determine which list the person will end up on.  …and there was you ...

2024: Week 50 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Reshape Data Our first task is to combine our two tables so that we have a single table to work from. This involves a few different steps and joins to get the data in the right shape. First we want to input both of the tables and join these together using an inner join where country = entity.  From here we can remove the entity field and rename the Life Expectancy at Birth to Country: Life Expectancy at Birth. We also want to include the continent information, so from the List of Countries input, we can create an aggregate step and group by Continent. This will create a list of all the continents which we can use to join back to our Life Expectancy table. With the second join this time we want to use an inner join where entity = continent:  We can then remove the Entity and Code fields and rename the Life Expectancy at Birth field to Continent: Life Expectancy at Birth.  Now we have both of these tab...

2024: Week 49 - Solution

Image
Solution by Tom Prowse and you can download the workflow here .  Step 1 - Input Files The first step is to input all of the sheets from the Excel file. For this we can connect up to the Excel document, connect to one of the sheets and then use the 'Union Multiple Tables' in the input step to bring all of the files into a single input. We don't need to change any of the settings as we want to bring in everything:  After the sheets are all in we can make some changes to extract the jersey number and clean so other fields.  To get the jersey number, we can duplicate the Name field and then use the in-built functionality to remove any letters and punctuation from the duplicated field.  Once we have done this we can rename to number and make sure it's a whole number.  We can also create a 10cm grouping based on the heights of players. We'll use this later in the flow so it's good to calculate this earlier so that it can be reused. Height   ROUND(FLOAT(LEFT([HT],...