2022: Week 9 - Solution


Solution by Tom Prowse and you can download the workbook here.


This week the challenge was all about creating a cohort analysis that can be used to visualise in Tableau Desktop. Normally, the calculations for this type of analysis are done within Tableau Desktop, but if you want to take it a bit further then doing the calculations in Tableau Prep first can be hugely beneficial.

This challenge came off the back of one of Jenny's training sessions, so we would love it if you have any alternative solutions!

Step 1 - Year of Order

The first step is to calculate the year that each customer made an order. We can make use of Tableau's date conversions by duplicating our Order Date field, and then converting the duplicated date to a Year.


Once we have the year for each order, we can aggregate this to calculate whether a customer had made an order in that year. Within the aggregation tool, we need to group by Year of Order, Customer ID, & Customer Name. This will give us all customers who have ordered on different years.


Step 2 - First Purchase

Next, we want to calculate when the first purchase was made. We can calculate this by using a Fixed LOD to find the Min Year for each customer ID:

First Purchase 

Now we can scaffold the data so that we have a row for each year after the first purchase, even if they didn't make a purchase in that year. This will allow us to create the categories from the requirements.

First we need to go back to our Aggregation step, create a new branch, and create an aggregation tool that brings through each of the years in our data set (group by Year of Order). This will give us all of the years to create the scaffold.

Using the year aggregation, we need to join this back to where we calculated the first purchase using the condition Year of Order <= Year.


After the join we then need to aggregate the data so that we have a single row for each year, customer and order year that they made. The aggregation is setup with by grouping Year, First Purchase, Customer ID, & Customer Name, and then finding the Max of Year of Order.


As a result, our table is now ready for us to identify whether or not an order was made in each year: 

Order? 

IF [Year of Order]=[Year]
THEN 1
ELSE 0
END

After creating the flag our table should now look like this: 


Step 3 - Cohorts YoY Difference

Now that we have identified the first year of purchase, and whether or not a purchase was made in a given year, we can now start to create cohorts for each of our customers. 

First we want to calculate the Year on Year difference in the number of customers from each cohort (first purchase year). To do this, we want to calculate how many customers are in each 'cohort' so we can use an aggregation tool to group the Year & First Purchase, and then Sum the Order? field, this can become our Number of Customers field: 


Now we can compare the number in the current year, with the next year by creating a new branch, then adding 1 to the Year field.

Year + 1 
[Year]+1

Then finally, rename the Number of Customers, to Last Years Customers.

Next, we want to compare the Year + 1 field with the Year field from our original data set, so we can join back to our previous clean step, and join where First Purchase = First Purchase and Year = Year + 1: 


Then finally the YoY difference is calculated by subtracting the Last Year Customers from Number of Customers: 

YoY Difference 
[Number of Customers]-[Last Year Customers]

This part of the workflow should look like this:

At this stage our table should look like this: 


Step 4 - Cohorts Ordered in Previous Year

Now we have the YoY difference, we can now calculate the whether or not the customer ordered in the previous year. 

To do this we need to join back in the customer name & ID information by joining on First Purchase and Year 

Join conditions:


We can now use an aggregate tool, to identify whether or not a customer purchased in a given year. We need to group by Year & Customer ID, and then find the Max of Order?: 


We can then run a similar process as we did previously, by identifying the next year and then joining this back to our customer data.

First, we need a new branch and then we need to add 1 to the Year field.

Year + 1 
[Year]+1

Now we want to rename the Order? field, to LY Order so that we can easily identify whether an order was made in the previous year. 

Then we can join this back to our data set by using Customer ID = Customer ID and Year = Year +1


Our table should now look like this: 


We are now ready to classify our customers into different cohorts with the following calculation:

Customer Classification 
IF [Year]=[First Purchase]
THEN "New"
ELSEIF [Order?]=0
THEN "Sleeping"
ELSEIF [LY Order?]=0 and [Order?]=1
THEN "Returning"
ELSEIF [LY Order?]=1 and [Order?]=1
THEN "Consistent"
END

After the classification our table should now look like this: 


Step 5 - Join Original Data

The final step this week is to join back the original data set that we started with right at the beginning. This will allow us to maintain the row structure and detail that we had at the start, whilst also retaining the classifications and YoY difference that we previously calculated.

For this join we want to join on Customer ID = Customer ID and Year of Order = Year: 


After a final bit of cleaning, our table should now look like this and is ready to output:


You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text