2022: Week 33 - Solution


Solution by Tom Prowse and you can download the workflow here.


The challenge this week is to try and identify how long there is between each of the orders for each product and location. There were some different answers across social media with different tools and different solutions... but this is how we solved it!

Step 1 - Input Tables

The first step is to input all of our data sources and combine them so we have a single table. First we can focus on the two orders tables and combine these using a Union.

Once we union the two tables together, we need to ensure that the mismatched date fields (Sales Timestamp & Sales Date) are merged together into the same field. We can do this by selecting Sales Timestamp and then using the + icon on the Sales Date row to merge these together.


After merging the fields, we then want to replace any null values with Online. This will be in the Store field and nulls are created as this field isn't present within the Online orders table. All we need to do is double click on Null in the profile pane and rename to Online. 

At this stage the table should look like this:


We can then join the Product Lookup table to our workflow by using an inner join where Product = Product ID:


After the join our table should now have all three data sources combined and should look like this:


Step 2 - Product Type

Using the Product Name field, we can extract the first word to get our Product Type field. To do this we can use a custom split functionality where we take the first part before the space:


We can then rename this to Product Type and remove the Product Name field. 

Our table should now look like this: 


Step 3 - Next Sale 

Now we want to create a data set that allows us to identify the next sale from the same store and product. To do this we want to use a self join to identify any orders that were after the given order.

First we can create a clean step, then use this to join back to our original workflow using an inner join where ID (Self Join) > ID (Original Workflow), Product Type = Product Type, and Store = Store. 


Our workflow should look something like this: 


At this stage we can clean our table by doing the following steps: 

  • Remove Product Type-1 and Store-1
  • Rename ID-1 to Original ID
  • Rename ID to Next ID
  • Rename Sales Timestamp-1 to Original Time
  • Rename Sales Timestamp to Next Sale
From here we can then calculate the difference in time between the original and next orders: 

Time to Next Sale of Same Product
DATEDIFF('minute',[Original Time],[Next Sale])

Then after this calculation, we want to remove any Null values from the calculated time field (this is due to refunds or data errors). 

Our table should now look like this: 


Step 4 - Avg Time

We can now find the average time between each order. First we need to use an aggregation to reduce the number of rows in our table. After the self-join, all of our orders and now joined with every other order, therefore we want to identify the Min length of time (eg, the next order) for each Product Type, Store, and ID combination:


Now that we have a row for each sale and the time that it took to make the next order. We can then use this to find the average time for each Product and Store using another Aggregation tool.


The final step is to rename the field to Average Mins to Next Sale, and then round this to 1 decimal place:

Average mins to next sale 
ROUND([Average mins to next sale],1)

After this we are ready to output our table that should look like this: 


You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text