Posts

Showing posts from June, 2021

2021: Week 26 Rolling Weekly Revenue

Image
Challenge by: Carl Allchin For any analyst, you are likely to get asked to use some complex calculations. For me, rolling or moving calculations are one that I've always preferred to do in the data preparation step rather than when trying to visualise the data if possible. It's saved me from some mistakes!  This week's challenge is looking at creating moving calculations. By this let's use the example below, where on 5th January (yes British date format), if we wanted to understand a rolling week's values, you can include 3 days before the 5th (ie the 2nd, 3rd and 4th) as well as 3 days after the 5th (ie the 6th, 7th and 8th).  Clearly you need to define what your rolling period should include or not. A rolling week could look backwards for 6 days inclusive of the current date or 7 days if you don't use the current date. You could look the same period forward but ultimately you have to articulate what you are covering to your audience. The nature of the data mig...

2021: Week 25 - Solution

Image
Solution by Tom Prowse and you can download the workflow here .  This week's challenge was a bit of a big one! Jenny took the inspiration from a YouTube video about how to categorise the least favourite Pokémon... and thought it would make a great Preppin' Data challenge. Let's look how to solve it.  Step 1 - Table Clean Up The first daunting part of this challenge is all of the different inputs. We have 9 in total so that's lots of different tables that we'll need to clean and combine to make our final output.  Gen 1 First we'll start by bringing in the Gen 1 table and then removing any null values and keeping only the '#' and 'Name' fields so the table looks like this:  Evolution Group Next we want to focus on the Evolution Group table, where we want to convert the '#' field to a number. We can't just change the data type as there are some leading spaces, therefore we can use the 'Trim' function to remove these, and then cha...

2021: Week 25 - The Worst Pokémon

Image
Challenge By: Jenny Martin Often our stakeholders can have very niche knowledge and their requests may baffle us. Luckily, as data preppers we have the tools to tackle any dataset, no matter how bizarre. Yes, Carl and Tom have allowed me to create another Pokémon challenge! The idea came from a  YouTube video that I stumbled across: Who Is Pokemon’s LEAST Favorite Pokémon? The logical steps applied in the video felt like they were screaming out for a Preppin' Data challenge to verify the results! But be warned, the answer to this challenge will differ from the conclusion of the video, due to differing datasets. Inputs We have multiple inputs for this challenge: Gen 1 Pokémon (from Pokémon Database ) Evolution Group (from Bulbapedia - also see Preppin' Data 2021 week 10 )  Evolutions (Bulbapedia) Mega Evolutions (Pokémon DB) Alolan Pokémon (Pokémon DB) Galarian Pokémon (Pokémon DB) Gigantamax Pokémon (from IGN ) Unattainable Pokémon in Sword & Shield (Pokémon DB) Ani...

2021: Week 24 - Solution

Image
Solution by Tom Prowse and you can download the  workflow here .  The challenge this week was to look at managing unscheduled time off at Chin & Beard Suds Co. In the challenge we are looking to analyse how many people have been off and if we have had someone off every day.  Step 1 - Build Out Dates The first part of the challenge is to build out our data set so that we cover every date listed between 1st April and 31st March. This is where the scaffold will come in useful as we can use this to fill in any dates that we might be missing in our Reasons data set.  To make sure we are starting on the correct date, we first need to find the first date across our Reasons table. We can do this by using an aggregate tool, and finding the Min Date:  We now want to join this to our scaffold. To do this we need to do a cross join so that each of the rows from table is joined with all of the rows from another. Therefore, we need to create a calculated field to do this,...

2021: Week 24 C&BS Co Absence Monitoring

Image
 Challenge by: Carl Allchin Chin & Beard Suds Co is just like any other company, people have unscheduled time off. Whilst this is expected in organisations, it can be difficult to manage. At C&BS Co, we have had a rough start to our financial year with lots of people being off for illness or sickness. How bad has it been and do we have people off every single day? This analysis can be tough in BI tools to look at the day-to-day reality when days off are recorded with just a start date and the number of days taken off. This week's challenge is producing a simple data set that will give us this view.  We are analysing the period 1st April to 31st May 2021. Input One Excel workbook , two sheets: 1. Absence Table 2. Scaffold Requirements Input data Build a data set that has each date listed out between 1st April to 31st May 2021 Build a data set containing each date someone will be off work Merge these two data sets together  Workout the number of people off each day ...

2021: Week 23 - Solution

Image
  Solution by Tom Prowse and you can download the workflow here .  We are looking to improve our service on Prep Air so want to calculate NPS from a survey and see how we compare.  Step 1 - Combine Data First we want to combine the data from both of the files. We could use the union tool for this, but I have used the Wildcard Union in the input tool. You don't need to add any matching patterns as we want to bring all of the files through: We should now have both of our inputs in a single table: Step 2 - Classify Customers Next we want to classify the responses that we want to compare, so the first step is to count the total number of customers for each Airline. To calculate this we can use a fixed LOD:  Number of Customers After counting the customers, we can then filter to only airlines with more than 50 customers. You can filter this by using a range filter on Number of Customers:  Finally, to classify the customers responses we can use the following IF statem...