Posts

2022: Week 27 - C&BSCo Clean and Aggregate

Image
 Created by: Carl Allchin Each January the Preppin' crew like to offer you the chance to learn the fundamentals. This year we are going further by using July and August to not just learn the basics but also learn the intermediate skill level data prep skills too. Some of these skills maybe familiar to you but hopefully the practice will make data prep perfections.  If you haven't taken part in one of these challenges before we offer up a weekly challenge set on a Wednesday. A solution post will be published the following Tuesday with a written and video solution. To take on the challenge, download the data, follow the requirements and check your output against our own. We will tell you field names and row counts to provide a quick overview. The order of the data fields and rows does not matter unless stated in the requirements.  All the challenges for the next two months will focus on Chin & Beard Suds Co (C&BSCo) a soap retailer that struggles with data cleanliness.  T

2022: Week 26 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . The challenge this week comes from Algirdas Grajauskas who has provided us with his Spotify data so we can tidy it up and then analyse it. Step 1 - Minutes Played Our first task is to create a new field that breaks down the milliseconds into minutes. To do this we can use the following calculation:  Mins Played   ROUND( [ms_played] / (1000*60) ,2) Multiplying by 60 allows us to convert into minutes, with a decimal based on the seconds. Note, after the decimal doesn't show how many seconds. This will need to be converted again if we need to calculate this. After this our table should look like this:  Step 2 - Extract Year Next we need to extract the year from the timestamp field. We can use the inbuilt functionality within Tableau Prep for this. First we want to duplicate the timestamp field and rename it to Year, then press on the menu (three dots) icon to go to Convert Dates -> Year Number. Step 3 - Rank Artists No

2022: Week 26 - Making Spotify Data Spotless

Image
Challenge By: Algirdas Grajauskas It's always great when a Data Schooler in training comes up with an idea for a Preppin' Data Challenge! Al is part of DS31 and has been in training for 3 months now. For this challenge we're going to clean up and analyse his Spotify data.  Input Unclean Spotify data – from Spotify  Requirements Input the data Create a new field which would break down milliseconds into seconds and minutes e.g. 208,168 turned into minutes would be 3.47min Extract the year from the timestamp field Rank the artists by total minutes played overall For each year, find the ranking of the artists by total minutes played Reshape the data so we can compare how artist position changes year to year Filter to the overall top 100 artists Output the data Output 10 fields Overall Rank Artist Name 2015 2016 2017 2018 2019 2020 2021 2022 100 rows (101 including headers) You can download the  full output here .  After you finish the challenge make sure to fill in the  partici

2022: Week 25 - Solution

Image
Solution by Tom Prowse and you can download the workflow here .  This week is all about using Prep to assign hotel rooms so we can find a suitable room for each of our guests. Step 1 - Additional Requests First we want to focus on the Guests table of our input, where we need to replace any of the 'N/A' additional requests with a Null value. We can do this by right-clicking on the 'N/A' row in the Profile Pane, and then selecting 'Replace with Null'. After this we want to split apart the Additional Requests field so that we have a field for each extra request. We can split these apart by using a ',' as a separator: As a result we now have 3 extra fields, each with an individual request in them:  Now we can reshape the data so that we have all of the requests in a single field. To do this we can use a Columns to Rows pivot, with each of the Split fields being used. After the pivot we can replace any of the blank ('') fields with a Null, then filter

2022: Week 25 - Housing Happy Hotel Guests

Image
Challenge by: Jenny Martin Imagine you're working at a hotel and the process of assigning guests to hotel rooms is very manual and often based off the expertise of those that have worked in the hotel for a long time. You've managed to pick the experts' brains to get a comprehensive list of the features of each hotel room and you want to build a simple proof of concept workflow that would allow users to immediately see which rooms would be suitable for each guest. We're not worrying about assigning guests to rooms just yet, this is just the first step to proving how an automated workflow could help to make life easier! For a room to be suitable for a guest it must: Have enough capacity for all adults and children in the party (no one's allowed to sleep on the floor!) Have either double or twin beds as per the guest's preference Be accessible if the guest has accessibility requirements Satisfy as many of the additional requirements as possible Inputs Hotel Rooms 

2022: Week 24 - Solution

Image
Solution by Tom Prowse and you can download the workflow here.   The challenge this week came from  Jorge Supelano  who wanted to know about the longest flights ever taken. Let's see how we can prepare the data, ready to build his viz! Step 1 - Remove Airport Names First we want to remove all of the airport names from the To and From fields so we are left with just the city names.  All of the additional airport names come at the end of the string, therefore we want to create a common separator that we can use to split these values off. Currently, there is a mix of '-', '–' and '/', therefore we can use the Replace function to make these the same. From REPLACE(REPLACE([From],'-','/'),'–','/') This replaces both of the '-' and '–' with a '/', that we can then use to split off the first part of the string: Then tidy the From field by removing the original, then renaming the newly split field to From.  We n

2022: Week 24 - Longest Flights

Image
Challenge by:  Jorge Supelano Have you ever wondered about the longest flights ever taken? Let's prep all the data required to answer that question and build a viz like Jorge's! Find Jorge's viz here Inputs Non-stop flights (top 30, by great-circle distance) - from Wikipedia   World City Latitudes and Longitudes - from SimpleMaps Requirements Input the data Remove the airport names from the From and To fields e.g. New York-JFK should just read New York Create a Route field which concatenates the From and To fields with a hyphen e.g. Dubai - Dallas Split out the Distance field so that we have one field for the Distance in km and one field for the Distance in miles Ensure these fields are numeric Rank the flights based on Distance Use a dense rank in order to match the wikipedia page The Scheduled duration is a Date/Time data type. Change this to a string so that we only keep the time element Update the First flight field to be a date Join on the lat & longs for the From