2020: Week 26 - Solution




You can find our full solution workflow below and download it here


This week the challenge was all about working with unique IDs and how to deal with things when they don't quite all match up!

Step 1 - Join Inputs

The first step this week is to bring in both the 'Internal Data' and '3rd Party Data' tables and then identify which IDs match up perfectly, and the ones that don't. To do this we can use the following joins: 

Perfect Match
This is an inner join on ID = 3rd Party ID


We can then add a step after the join where we remove the [Scent-1] field and add a string calculation called 'Status' that labels all of the rows as 'Matched'.

Unmatched Internal
Right Outer Join to include all values from the Internal table, joining on 3rd Party ID = ID


Similar to before, we are then going to add a clean step, then remove any 3rd Party fields (ID & Sales) and also rename [Scent-1] to [Scent]

Unmatched 3rd Party
Left Outer Join to include all values from the 3rd Party table, joining on 3rd Party ID = ID


Again we will add a clean step, and this time just removing the ID, Scent-1, & Sales fields. 

We should now have three branches to our workflow that looks something like this: 

Step 2 - Prepare Unmatched Fields

Now we have joined our data together, it's now time to focus on the unmatched fields. Therfore, we are going to use another join to bring these together, however this time we will be joining on Scent from both sides: 


Now we have both our Internal and 3rd Party unmatched IDs together, we need to calculate the difference in sales between the two. We can use the following calculation: 

Sales Difference (Absolute Value)
abs([Sales]-[3rd Party Sales])
We now want to calculate the minimum sales difference across our data set for the 3rd Party IDs, therefore we use the following LOD to calculate this: 

Min Sales Difference



We can then use this field to filter our any values that are equal to the sales difference. So the filter calculation is [Min Sales Difference] = [Sales Difference (Absolute Value)] then we can also remove the Min Sales Difference field. 

Next, we want to do the same process but for our Internal IDs so we follow these steps: 

1. Calculate Min Sales Difference for IDs


2. Filter on this condition - [Min Sales Difference (Internal)]=[Sales Difference (Absolute Value)]

3. Remove [Min Sales Difference (Internal)] & [Sales Difference (Absolute Value)]

We have now identified the products that can be matched by their scent, therefore we need a calculation to keep this status. We can use a string like before, to create Status as 'Matched on Scent' for all of the rows. 

The final step of this part of the challenge is to recombine the two workflows that we currently have, so we can use a union tool to stack the 'Perfect Matches' onto top of the matches that we have just identified by Scent. Our table should look something like this: 


Step 3 - Classify Unmatched

Now we have identified the IDs that match from the different tables, it's now time to identify the ones that don't. To do this we are going to bring in the two inputs again, you can use the original ones but bringing in new ones allows us to keep the workflow nice and clean!

We are going to use the two inputs to again join with our original workflow, creating two new streams for Internal and 3rd Party IDs. 

Internal Unmatched 
Outer Right Join including all information from the Internal Data table. Join on ID = ID


We can then clean the table with the following steps: 
  • Remove Status, ID, Scent, Sales, 3rd Party ID, 3rd Party Sales
  • Rename ID-1 to ID, Scent-1 to Scent, Sales-1 to Sales
  • String calculation Status - 'Unmatched - Internal'
As a result of this we should have the following table: 


3rd Party Unmatched
We are going to use a similar process with the 3rd Party unmatched branch, so we will made the following steps: 

1. Join 3rd Party Data to Original workflow. This is an Outer Right Join with all of the 3rd Party rows returned, and joined on 3rd Party ID = 3rd Party ID


2. Clean the following fields: 
  • Remove Fields - Status, ID, Scent, Sales, 3rd Party ID, 3rd Party Sales
  • Rename 3rd Party ID-1 to 3rd Party ID, Scent-1 to Scent, 3rd Party Sales-1 to 3rd Party Sales
  • String calculation for Status - 'Unmatched - 3rd Party' 
Now we have prepared both the Internal and 3rd Party IDs we can bring these together into one table by using the union tool. These tools look like this in our workflow: 

Our table should look something like this: 


The final step is to bring all of this back together with our original workflow. As the data is structured in the same way throughout the workflow, we can take our current workflow (Unmatched Union) and the workflow before the unmatched work, and union these together. 


As a result of this final union, we now have the desired output with the Matched and both Unmatched IDs in one table: 


Make sure to fill in the participation tracker, share using #PreppinData on Twitter and post you solutions onto our Tableau Forums community page so that we can compare our workflows! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text