Posts

2021: Week 14 - Solution

Image
  Solution by Tom Prowse and you can download the full workflow here . Prep Air was back this week and we were taking a look at the in-flight purchases within a few of the selected flights. This included combining multiple data sets , parsing strings, and identifying what seat purchases the most on our flights.  Hopefully this challenge allowed you to combine various core data prep techniques and put this in a real world scenario to help make everything easier to put together. Let's have a look at how we solved it!  Step 1 - Where is each seat located?  The first step this week was to identify where each seat was located on the plane (Window, Middle, or Aisle). Using the Seat List table we want to first pivot each seat letter so that we have all of the Seat Letters within a single column with the corresponding Passenger Number and Row:  We can now add a label for each of the seats positions:  Seat Position   CASE [Row Letter]       WHEN 'A' THEN 'Window'      WHEN &

2021: Week 14 - Prep Air In-Flight Purchases

Image
Challenge by Tom Prowse We are revisiting our Prep Air Airline this week by looking at some flight details and trying to provide some data driven answers. As part of running an airline we are always interested in how successful the in-flight service is and if we can make any improvements to try to boost sales.  For this week's challenge we have been provided with a selection of different data sources and we want to combine them to answer some questions that will help us to understand some purchasing patterns on the flights.  We have the following inputs :  Passenger List A list of all the passengers from a selection of flights. This includes their name, a passenger number, the flight number and the total they purchased whilst on the flight. Note, not all flights are full and can hold a maximum of 120 passengers.  Seat List A mapping of where each passenger sits within each flight. This is the same for all flights across our fleet and includes the row number and seat letter within e

2021: Week 13 - Solution

Image
Solution by Tom Prowse and you can download the full workflow here . This week's challenge was brought to us from Simon Evans. He used to work as an analyst at a professional sports team so we looked at a similar data set that a football analyst will have to make their way through. There were a lot of fields and lots of the information weren't important to the questions that we wanted to answer, therefore this week was good practice about how to deal with these larger data sets. Step 1 - Input All Files The first step is to input all of the files for the challenge. We are given data from 5 different seasons and all of these csv files have the same structure meaning that we can use the wildcard union input so that all of the tables are stacked on top of each other.  Notice how we have left the matching pattern section blank as we want to include all of the files. Step 2 - Filter & Calculate Open Play Goals Now we have all of the files, we can start to filter and clean our da

2021: Week 13 - Premier League Statistics

Image
Challenge by: Simon Evans Before Simon joined The Data School in the UK, he was a professional sporting performance analyst. Simon has reached into his previous professional life to come up with a football (read soccer) based challenge for this week.  Simon is channelling his inner fanalyst to use data to understand more about the game that he enjoys.  This week we want to create a data set that allows us to analyse 'Open Play Goals' scored. We will rank the players overall and by their position.  Input 5 csv files , all with a similar structure. There are a lot of columns in these data sets. Small part of one of the five files Requirements Open play goal scoring prowess in the Premier League 2015-2020 Input all the files Remove all goalkeepers from the data set Remove all records where appearances = 0 In this challenge we are interested in the goals scored from open play Create a new “Open Play Goals” field (the goals scored from open play is the number of goals scored that

2021: Week 12 - Solution

Image
  Solution by Tom Prowse and you can download the full workflow here.  The challenge this week took a look at a data set showed to us by Lorna Brown , and was all about different economic indicators related to tourism in the Maldives. This had a bit of a quirky structure so the challenge was transform and clean this up so that it'll be more useful to create a visualisation. Step 1 - Pivot Months The first step is to bring all of our month fields into a single column using the pivot tool. Using the Columns to Rows pivot, we have dragged all of the month fields into the pivot section so we only have our non month fields left:  After the pivot we want to rename the fields, remove any nulls, and change the data types for the Number of Tourists and Month fields. We also only want to focus on the Tourists information from each country, therefore we can also Keep Only the 'Tourists' from Unit-Detail, and we can Exclude 'Total Tourist Arrivals' from the Series-Measure fie