Posts

Showing posts from March, 2022

2022: Week 13 - Pareto Parameters

Image
 Challenge by: Jenny Martin If you haven't had a chance to try out Parameters in Prep yet then here's the perfect opportunity! Have a read of how to create parameters in this blog post . We're going to create a classic use case of the Pareto chart. How many customers make up 80% of our Sales? Input There is one input this week detailing customer sales: Requirements Input the data Aggregate the data to the total sales for each customer Calculate the percent of total sales each customer represents Calculate the running total of sales across customers Order by the percent of total in a descending order Round to 2 decimal places Create a parameter that will allow the user to decide the percentage of sales they wish to filter to Output the data , including the parameter in the output name Create a second output that describes the result in plain English e.g. 50% of Customers account for 80% of Sales  Outputs Output 1: 6 fields Customer ID First Name Last Name Sales % of Total Ru

2022 Week 12 - Solution

Image
Solution by Tom Prowse and you can download the workbook here . This week take inspiration from the Gender Pay Gap Bot on Twitter, and take a look at historical data to get this into a nice, easy to understand structure. Step 1 - Combine Files The first step is to combine each of the different files from the different years. We have 5 different CSV files so we need to use the Wildcard Union within the Input step to stack these on top on one another. After inputting all of the data files, we can then remove a lot of the fields that are included. We can remove fields straight from the input step by using the tick boxes on the right-hand side.  We want to keep the following fields (these will be ticked):  EmployerID EmployerSize DiffMedianHourlyPercent File Paths EmployerName Our table should now look like this:  Step 2 - Clean Report Year Next we want to parse the year from each of the file names. After the wildcard union, Tableau Prep automatically creates the File Paths field which pro

2022: Week 12 - Gender Pay Gap

Image
 Challenge by: Jenny Martin I spent my International Women's Day being really fascinated by the Gender Pay Gap Bot on Twitter. I decided to look at the publicly available data that it was based on and found it a little confusing. It made me realise that the thing I enjoyed most about the Pay Gap Bot, was that it made it the insight from the data clear in a succinct manner. So let's do that for the historical data too!  Inputs We're using the data currently available on the Gender Pay Gap Service from 2017 to 2022:  There are 5 input files. Requirements Input the data Combine the files Keep only relevant fields Extract the Report years from the file paths Create a Year field based on the the first year in the Report name Some companies have changed names over the years. For each EmployerId, find the most recent report they submitted and apply this EmployerName across all reports they've submitted Create a Pay Gap field to explain the pay gap in plain English You may en

2022: Week 11 - Solution

Image
Solution by Tom Prowse and you can download the workflow here . This week we looked at one of the frequently occurring problems in data preparation - missing data and dealing with the gaps! We returned to the Prep School for this weeks challenge so let's see how to solve it! Step 1 - Input Data First we need to input our table of data which is a single CSV file, and looks like this when input into Tableau Prep: Notice the null values for Lesson Name and Subject, these are the areas that we want to identify and fill in with the correct information from above.  Step 2 - Weekday Aggregate  After identifying the null values, we can then create an aggregate field to show us what Teacher and Lesson Time combinations happen on each day, whilst retaining the Min value for the others. The aggregate tool is set up as follows:  As the same lesson repeats on each week, we can remove the Week field at this stage. After the aggregate our table should now look like this:  Notice how we no longer

2022: Week 11 - The Prep School - Filling the Blanks

Image
 Challenge by: Carl Allchin Dealing with gaps in data sets is something any data prepper will have to deal with frequently. At the Prep School, some of our attendance recording isn't filled in well.  Each lesson is taught at the same time, on the same day.  Input One csv file Requirements Input the data set Fill in the blanks  Find the average Attendance per Lesson and Subject per weekday Output the data Output 8 Fields Weekday Week Teacher Lesson Time Lesson Name Subject Attendance  Avg. Attendance per Subject & Lesson 40 rows (41 incl. headers) You can download the   full output here .  After you finish the challenge make sure to fill in the  participation tracker , then share your solution on Twitter using  #PreppinData  and tagging  @Datajedininja ,  @JennyMartinDS14  &  @TomProwse1 You can also post your solution on the  Tableau Forum  where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

2022: Week 10 - Solution

Image
Solution by Tom Prowse and you can download the workflow here.  This week's challenge was celebrating International Women's Day by looking at the Bechdel Test. This looks at the interactions between women in films and if those conversations were relating to something other than a man! Step 1 - Parse Download Data Table First we want to focus on the Webscraping table, and split out each of the fields containing a movie title. To do this we want to split the Download Data field using a custom split on '>'.  This creates around 57 extra fields, but we can remove the majority of them so that we are left with the fields: DownloadData Split 2 - (Movie Title) DownloadData Split 5 - (Test Pass/Fail) Year Then finally we can clean the Movie title field with the following calculations:  Movie REPLACE([DownloadData - Split 5],'</a',"") Movie   REPLACE([Movie],'&amp;','&') Then we can remove the DownloadData Split 2 field so we are l