Preppin' Data

Posts

Showing posts from July, 2023

2023: Week 30 - % Difference From

July 26, 2023

Challenge by: Jenny Martin For this challenge we're continuing to play around with the new features in the latest 2023.2 version of Tableau Prep. These will be: % Difference From Lookup function Both these calculations will help us work across multiple rows of data in slightly different ways. We'll be using them to help AllChains analyse the performance of their Stores over the last financial year. Input We have one Excel file that contains a sheet for each Month of Sales. Requirements Input the data Bring all the Months together into a single table Merge together misaligned fields Create an Order Date field The financial year for Allchains starts in July. Convert the dates to Fiscal Quarters Aggregate the data to a Quarterly level for each Store, ignoring the product details Calculate the % Difference in Sales from the Previous Quarter Multiply this value by 100 and round to 1 decimal place Use the lookup function so that we can compare the % Difference for all Quarters in a

2023: Week 29 - Solution

July 25, 2023

Solution by Tom Prowse and you can download the workflow here . This week we take our first look at some new features within Tableau Prep! These include the Lookup Function, Moving Averages, and adding a TimeStamp to a file name when we output the data. Step 1 - Filter Dates First we want to input our data table, then we need to ensure that the date field is in the correct format. We only want to look at dates on a monthly basis, therefore we can use the DateTrunc function to ensure that all dates are rolled back to the 1st of the month. Date DATE( DATETRUNC('month',[Date])) We can then use a filter on this field to remove any dates after July Now our table should look like this: Step 2 - Monthly Sales We can now aggregate the table so that we have our sales at a monthly level. Within the aggregation we group by Date, Store, and Bike Type, then Sum Sales & Profit. Now we have the data at a monthly level, we can now fill in any months that might be missing. First we nee

2023: Week 29 - Moving Averages

July 19, 2023

Challenge by: Jenny Martin Since the 2023.2 version of Prep has been out for a while now, we thought it was high time we played around with a few features! These will be: Lookup function Moving Average Adding a time stamp to the file name We'll be using them to help AllChains perform a monthly analysis across their Stores. One of the key things to watch out for when using a Moving Average is that you have a complete dataset i.e. when you're averaging across 3 months, those 3 months should be consecutive. If not, you're going to want to make sure to scaffold your data, so you have a row for each month, even if there were no sales in that month. Input One input this week containing AllChains sales data: Requirements Input the data We only want to consider complete months so filter values up to June 2023 The values after July 2023 are forecasted values Aggregated the data to a monthly level for each Store and Bike Type Notice how not every Bike Type has a sale in every Store

2023: Week 28 - Solution

July 18, 2023

Solution by Tom Prowse and you can download the workflow here . We have another challenge focussed on the Prep School this week but this time we are looking at the athletic abilities of our students. The challenge is from Habeeb Gayle so lets look how we can solve it. Step 1 - Combine Tables First we want to combine all of our tables together so we have a single table to work from. First we want to join the Students and the Track Times using an inner join on the ID field: Then we can join the Benchmarks table using an inner join where Gender = Gender, Age = Age, Track_Event = Event: When we've joined these tables we can remove the duplicated fields and the table should look like this: Step 2 - Filter Results We can now start to filter our results to meet the requirements. First we want to remove anyone who hasn't met the benchmark by using a filter calculation [time]<=[Benchmark]. Next, we want to filter to remove any errors. We can highlight any rows which are seen to b

2023: Week 28 - Prep School Track Team

July 12, 2023

Created by: Habeeb Gayle This is our next challenge from one of the members of the 38th cohort of the Data School UK. The Prep School is keen on developing athletes from amongst their students. They want to identify and rank the top 100m and 200m athletes. Each student listed has taken part in either the 100m or 200m sprint. Some work needs to be done to find, rank and clean the dataset. Inputs 1. Each student has some basic information about themselves such as age and gender and they have an id number to be identified by. 2. Each student has a track event and corresponding time. 3. These events have a set benchmark that identifies the top students in each age group. Requirements Input the data Join the students basic information with their event and track time Join the benchmark table with the right students Note: the number of rows should still be 300 after the join Filter out the students that did not fall within the benchmark There has been an error with the collection of the

2023: Week 27 - Solution

July 11, 2023

Solution by Tom Prowse and you can download the workflow here . The challenge this week comes from Tanbir Jalil and is looking at helping the executives of the School District Board to analyse how their overhead costs vary from school to school. Step 1 - Input Data First we want to input all of our data. This is spread across multiple different sheets for each of the schools and we want to combine these into a single table. As the tables have the same structure we can use a wildcard union to 'stack' the tables on top of each other straight from the input step. Within the input step we can select the 'Union Multiple Tables' option and then ensure that we have a matching pattern of 'School*' within the worksheet name. Using the '*' acts as a wildcard, therefore it will pick up all of the tables with 'School' in the title and not worry about the letters afterwards. We can then remove the Table Names and File Path fields and the table should look lik

2023: Week 27 - The Cost of Running the Prep School

July 05, 2023

Challenge by: Tanbir Jalil This is our next challenge from one of the members of the 38th cohort of the Data School UK. You have been tasked with helping the executives of the School District Board analyse how their overhead costs vary from school to school, and over time. Inputs The data is provided to you in excel format, with 4 sheets for 4 different Schools. Requirements Input the data Consolidate the data from the 4 different excel sheets into 1 data set The executives would like the data transformed so each row is the monthly costs broken down into each category, for each school Calculate the total monthly cost for each school Make it so that the dataset is ordered by month, for each school Output the data Output 8 fields Sort School Name Date Total Cost Gas Cost Electricity Cost Water Cost Maintenance Cost 48 rows (49 including headers) You can download all the output from here . After you finish the challenge make sure to fill in the participation tracker , then share your

2023: Week 26 - Solution

July 04, 2023

Solution by Tom Prowse and you can download the workflow here . This is a follow on challenge from last week where we take a further look at analysing the admissions data for our schools. If you haven't completed Week 25 then go back and do that one first as we use the output as on of the inputs in this challenge. Step 1 - Combine Additional Information We want to combine both of our tables together so that we're working from a single table. To do this we need to extract the initials of the first and last names from the full name field. You can do this by splitting the name into separate fields and then use the Left() function, or you can do this in a single calculation: Initials LEFT([Full Name],1) + LEFT(SPLIT([Full Name],' ',2),1) We can then join the tables using an inner join on Initials, Date of Birth, School Name, English, Maths, Science fields. Step 2 - Ranking Next we want to rank the students based on their grades, subject selections, and region. For this