Preppin' Data

Posts

Showing posts from March, 2021

2021: Week 13 - Premier League Statistics

March 31, 2021

Challenge by: Simon Evans Before Simon joined The Data School in the UK, he was a professional sporting performance analyst. Simon has reached into his previous professional life to come up with a football (read soccer) based challenge for this week. Simon is channelling his inner fanalyst to use data to understand more about the game that he enjoys. This week we want to create a data set that allows us to analyse 'Open Play Goals' scored. We will rank the players overall and by their position. Input 5 csv files , all with a similar structure. There are a lot of columns in these data sets. Small part of one of the five files Requirements Open play goal scoring prowess in the Premier League 2015-2020 Input all the files Remove all goalkeepers from the data set Remove all records where appearances = 0 In this challenge we are interested in the goals scored from open play Create a new “Open Play Goals” field (the goals scored from open play is the number of goals scored...

2021: Week 12 - Solution

March 30, 2021

Solution by Tom Prowse and you can download the full workflow here. The challenge this week took a look at a data set showed to us by Lorna Brown , and was all about different economic indicators related to tourism in the Maldives. This had a bit of a quirky structure so the challenge was transform and clean this up so that it'll be more useful to create a visualisation. Step 1 - Pivot Months The first step is to bring all of our month fields into a single column using the pivot tool. Using the Columns to Rows pivot, we have dragged all of the month fields into the pivot section so we only have our non month fields left: After the pivot we want to rename the fields, remove any nulls, and change the data types for the Number of Tourists and Month fields. We also only want to focus on the Tourists information from each country, therefore we can also Keep Only the 'Tourists' from Unit-Detail, and we can Exclude 'Total Tourist Arrivals' from the Series-Measure fie...

2021: Week 12 - Maldives Tourism

March 24, 2021

Challenge By: Jenny Martin One of the best things about being a Dr Prepper is that people are always bringing interesting datasets to your attention. A little while ago, Tableau Zen Master Lorna Brown showed me a dataset with all kinds of information on tourism in the Maldives . This database has a lot of data on different Key Economic Indicators, but as you can imagine, it also has a bit of a quirky structure! For inspiration as to why we might want to clean this data up, check out Lorna's viz below: Link to viz Input Our input is very wide this week, with 136 fields and only 28 rows. It covers tourism in the Maldives from 2010 to 2020. The source of this data is here but you can download it in the usual way from here . Requirements Input the data Pivot all of the month fields into a single column ( help ) Rename the fields and ensure that each field has the correct data type Filter out the nulls ( help ) Filter our dataset so our Values are referring to Number of Tourists Our g...

2021: Week 11 - Solution

March 23, 2021

Solution by Tom Prowse and you can download the full workflow here . This week we were taking a look at cocktail recipes with a challenge put together by Viven Ho . This covered a lot of fundamental data prep skills so hopefully you found it useful to bring your Tableau Prep skills together. Step 1 - Split Ingredients & Measurements The first step is to input the Cocktails table, and then we want to split apart the 'Recipe (ml)' field so that each of the different ingredients have their own column. We can split these using a custom split on a ';' separator: As a result we have 4 columns that we can pivot into a single column using a Columns to Rows pivot. In the pivot setup we can use a Wildcard on 'Recipe' to bring through all of the fields containing our recipe ingredients: After the pivot our recipe ingredients are all have their own row and the table looks like this: Now the ingredients are in a single column, we can then extract the measurements fr...

2021: Week 11 - Cocktail Profit Margins

March 17, 2021

Challenge by Vivien Ho (@vivienho22) This week's challenge has been put together by Vivien to challenge a lot of the fundamental prep skills you have built up so far this year. Vivien's challenge is looking at cocktail pricing (a common trend over the years at Preppin' HQ) and whether you can determine how much profit you can make from certain cocktails because who doesn't talk about data preparation when you are in a bar? Input Cocktails: names, prices and their recipe with measurements Sourcing: ingredient prices, quantity per bottle, currency of price Conversion rates: currencies and their conversion rates (e.g. 1.14 euros = 1 pound) Cocktails Sourcing Conversion rates Requirements: Input the dataset Split out the recipes into the different ingredients and their measurements Calculate the price in pounds, for the required measurement of each ingredient Join the ingredient costs to their relative cocktails Find the total cost of each cocktail Include...

2021: Week 10 - Solution

March 16, 2021

Solution by Tom Prowse and you can download our workflow here . This week we looked at the Pokémon evolution hierarchies and group these into the correct evolution groups. Although we focused on Pokémon this week, it could be transferrable to pretty much any hierarchical data so hopefully it will help you out for those challenges at work as well. Step 1 - Filter Relevant Pokémon From our Pokémon data set, we are first going to give it a bit of a clean and then filter out the Pokémon that we aren't going to look at in the challenge. Therefore the first step is to duplicate the '#' field and then trim any spaces from the duplicated field ('#-1'). We can then change the '#-1' field to a decimal number and use the following Wildcard Match filter to exclude any rows that contain a '.' from the '#' field. This is going to exclude any of the 'Mega' evolutions that we aren't interested in. We then need a second filter on the '#-1...