Posts

Showing posts from December, 2024

2024: Week 52 - Naughty or Nice?

Image
 Created by: Carl Allchin Well another year of Preppin' Data comes to an end with a perfect challenge of whether you are on the naughty or nice list? Santa has received a number of files that even he has been able to union together but needs some help to work out whether each person is on the naughty or nice list.  Santa has spotted there is a number of names that have been duplicated as records of niceness and naughty deeds have been submitted throughout the year so can you help him solve who should be in each list based on: If a name is only on the nice or naughty list, then that's what list they are part of.  If a name is on both the naughty and nice list, the number of times they feature on each list will determine whether they are naughty or nice. (Whichever list has the most mentions determines the end result) If there is an even split between naughty and nice mentions, the latest mention will determine which list the person will end up on.  …and there was you ...

2024: Week 50 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Reshape Data Our first task is to combine our two tables so that we have a single table to work from. This involves a few different steps and joins to get the data in the right shape. First we want to input both of the tables and join these together using an inner join where country = entity.  From here we can remove the entity field and rename the Life Expectancy at Birth to Country: Life Expectancy at Birth. We also want to include the continent information, so from the List of Countries input, we can create an aggregate step and group by Continent. This will create a list of all the continents which we can use to join back to our Life Expectancy table. With the second join this time we want to use an inner join where entity = continent:  We can then remove the Entity and Code fields and rename the Life Expectancy at Birth field to Continent: Life Expectancy at Birth.  Now we have both of these tab...

2024: Week 49 - Solution

Image
Solution by Tom Prowse and you can download the workflow here .  Step 1 - Input Files The first step is to input all of the sheets from the Excel file. For this we can connect up to the Excel document, connect to one of the sheets and then use the 'Union Multiple Tables' in the input step to bring all of the files into a single input. We don't need to change any of the settings as we want to bring in everything:  After the sheets are all in we can make some changes to extract the jersey number and clean so other fields.  To get the jersey number, we can duplicate the Name field and then use the in-built functionality to remove any letters and punctuation from the duplicated field.  Once we have done this we can rename to number and make sure it's a whole number.  We can also create a 10cm grouping based on the heights of players. We'll use this later in the flow so it's good to calculate this earlier so that it can be reused. Height   ROUND(FLOAT(LEFT([HT],...

2024: Week 48 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . Step 1 - Ranking Field & Tie Breaker The first task is to create a Ranking field for each of the different sports. After inputting the table for each sport then we want to create a separate branch for each of them (4 in total). Within each of these branches we want to rename the corresponding field to ranking field as follows:  NBA Results WWins = Ranking Field NFL Results   WWins = Ranking Field Premier League PtsPoints = Ranking Field Rugby Aviva Premiership   PTS = Ranking Field Once we have renamed each of these fields we can then create a tie breaker for each branch. NBA Results   First we want to rename the GBGames behind field to Tie Breaker 1 and then use the first part of ConfConference record field as our Tie Breaker 2. In order to get just the first part of the field we can use a Split calculation:  Tie Breaker 2 TRIM( SPLIT( [ConfConference record], "-", 1 ) ) Next we want to keep on...

2024: Week 47 - Solution

Image
  Solution by Tom Prowse and you can download the workflow here .  Step 1 - Deduplicate Rows As we have used the Tableau Agent to help us write this challenge, we thought it would be good to use it to help us solve it as well! Although the identify duplicates functionality is already available in Tableau Prep, we can also use the Tableau Agent within the calculation window. To use this all we need to do is create a new calculated field and then write in what we want the task to be. In our case we want to identify duplicate values within the table, so we can use this as our prompt:  The Tableau Agent will then return a calculated field that we need and an explanation on how/why it has returned this value. We can then use the 'Replace Calculation' button and rename the calculated field to create the calculation.  Now we have this calculated field we can then exclude the True values. We can then remove the duplicate values field and then output the table. You can view t...

2024: Week 51 - Strictly Positive Improvements?

Image
Challenge by: Jenny Martin It was the final of this year's series of Strictly Come Dancing at the weekend, which made me have a few more questions I'd like to answer using the dataset we used in a challenge a few week's back . If you've yet to catch up on this year's final, don't worry this data doesn't include results from this year.  As I watched the final this year I found myself wondering, is it the contestant that makes the biggest improvement that is most likely to win? Or is it the contestant that has been consistently good since week one? Let's work out the percentage improvement of the finalists from week 1 in order to answer this question! One additional question, does getting perfect scores in the final guarantee you'll win?  Input  The input is the same as challenge 42. If you recall this data was webscraped and so it does require a little cleaning before we can start answering the questions.  Requirements Input the data The webscraping i...

2024: Week 50 - Life Expectancy

Image
 Challenge by: Thomas Duong Thomas has just finished his training at the Data School and has created a Preppin' Data challenge to celebrate! Over to Thomas to explain the challenge:

2024: Week 49 - Jersey Number Analytics

Image
Created by: Carl Allchin When watching basketball this weekend, I suddenly started to wonder what is the lowest number that isn't found on any NBA players jersey? Well there's only one way to work that out… data! But there are other questions that started flowing too: What's the average salary per 10 centimetres grown? (how much should my parents incentivised me to eat more broccoli?) What's the tallest team in the NBA? And who's the shortest team?  Do taller players wear larger numbers? Just like Week 4 2019 challenge, data preparation tools can often allow us to get to the answers we need. Can you find the answers too? Input There's one Excel workbook with 30 individual worksheets: Requirements  Input the data Combine all 30 worksheets together Answer the following questions through creating separate flows: What's the lowest number not used on a jersey in the NBA? What's the average salary per 10cm of height?  What's the shortest average height te...