2023: Week 12 - Solution

Solution by Tom Prowse and you can download the workflow here


This weeks challenge is a long one, so hopefully you stayed with it and chipped away at the different techniques that were required to solve the challenge.

Step 1 - UK Bank Holidays

First we want to input the UK Bank Holidays table where we can remove the Date and Bank Holiday field, and remove any null values from the Year. After this we can join back to the original table using the Source Row Number (note the source row number is a new feature within Tableau Prep 2023.1). 

Within the join we want to create an inner join where Source Row Number >= Source Row Number: 



This will allow us to fill down the year for each row and bank holiday date


From here we can tidy the table by keeping the relevant information. First we want to find the max source row number for each year: 

Max


Then we can filter to keep only where the Max = the Source Row Number-1 and finally remove the Max, Source Row Number-1, Source Row Number, and Year field then rename Year-1 to Year. Our table should look like this: 


Then finally we can remove the nulls from the Date field and create a date based on the Date and Year fields.

UK Bank Holiday

MAKEDATE([Year],MONTH([Date]),DAY([Date]))

Then after removing the unneeded fields we should have a list of all UK Bank Holiday dates: 


Step 2 - Combine UK New Customers

Next we want to combine the new customers data set from the UK. Before joining with our existing data set we can extract the day of the week from the Date field. 

To do this we want to duplicate the Date, and then on the duplicated date convert this to 'day of the week': 



We can then join this table onto our existing workflow ensure that all of the rows from the UK Bank Holiday table is included. This will mean that we need to use a left (or right) join where UK Bank Holiday = Date: 


After the join our table should look like this: 


Step 3 - Reporting Days

Next we want to identify the reporting days as per the requirements: 

Reporting Day 
IF STARTSWITH([Day],'S') THEN 'N' 
ELSEIF ISNULL([UK Bank Holiday]) THEN 'Y'
ELSE 'N'
END

This allows us to identify the weekends (Sat & Sun) and the values with no date within the UK Bank Holiday field are reporting days.

From here we can then split the workflow to assign customers based on the next reporting day that they joined. We can have separate branches for this: 

Joined on a Non-Reporting Day
All we need to do here is filter the Reporting Day field to Keep Only 'N'.

We then need to make sure we have all of the reporting days in this list so that we can reassign to the closest reporting day. For this we need a separate branch where we filter the Reporting Day field to exclude 'N'. 

Then rename Date to Reporting Date and remove all other fields. Once we have the list of reporting dates, we can then join this back to the branch of non-reporting days. This join needs to be an inner join where the Date (from the Non-Reporting branch) < Reporting Date (from the 2nd branch). 

This allows us to then have all the reporting dates after the given date, so we can then use a Fixed LOD to identify the minimum - this will identify the next reporting date: 

Min 

Then we can filter using this calculation:

[Min]=[Reporting Date]
OR
ISNULL([Reporting Date])

Once we remove the Min field our table should look like this: 


Reporting Days

We need a final branch which will include all of the reporting days. Within this branch we want to filter the Reporting Day field to keep only 'Y' and then rename Date to Reporting Date. 

From here we can combine both branches by using a Union so that we have a single table that looks like this: 


We can then aggregate the view so we have a total number of customers for each reporting date: 


Then we can calculate the reporting month by using a DateTrunc: 

Month
DATETRUNC('month',[Reporting Date])

Then find the last day of each month: 

Last Day 

Then we can format the Reporting Month by using the following calculation: 

Reporting Month 
IF [Reporting Date]<[Last Day]
THEN DATENAME('month',[Reporting Date])+"-"+STR(YEAR([Reporting Date]))
ELSE 
DATENAME('month',DATEADD('month',1,[Reporting Date]))+"-"+STR(YEAR(DATEADD('month',1,[Reporting Date])))
END

Then as per the requirements, we want to remove the January 2024 months by excluding this from Reporting Month field. 

Then finally calculate the Reporting Day by using this calculation: 

Reporting Day

{ PARTITION [Reporting Month]: { ORDERBY [Reporting Date] asc : ROW_NUMBER()}}

After these calculations our table should look like this: 


Step 4 - ROI Data

We're now ready to include the ROI New Customer table. Within this input we can remove the reporting day and rename the Reporting Month and New Customer fields with an ROI prefix using the Rename Fields functionality: 


We can now align the ROI data to the UK Reporting Days. First we can use a left join to on Reporting Date to ensure that we're matching all the dates from the UK branch and the ones that match in ROI: 


Then we can just keep the Reporting Month, Reporting Day, and Reporting Date fields. 

We can also do the inverse by using an outer right join on the same fields on a separate branch: 


After this join we can rename the Reporting Date-1 field to Reporting Date and then keep only ROI Reporting Month, ROI New Customers, and Reporting Date.

We can then bring both branches together by joining where the UK Reporting Date > ROI Reporting Date. 


We can then find the next day by finding the minimum:

Min


Then filter the table where [Min]=[Reporting Date]

We can then combine this back with the other Reporting Days via a Union and the table should look like this: 


Step 5 - Final Cleaning

Finally we need to tidy the table a little more. We can replace any nulls in the New Customer & ROI New Customers field with a 0. 

Then finally we want to create a flag to identify where the reporting months differ between the two systems. 

For this we want to aggregate where we group by Reporting Month, Reporting Day, and Reporting Date, then Sum New Customers and ROI New Customers, and also return the Max ROI Reporting Month.

Then finally we can use this to create a misalignment flag: 

Misalignment Flag 

IF LEFT([Reporting Month],3) != LEFT([ROI Reporting Month],3)
THEN 'X'
ELSEIF ISNULL([ROI Reporting Month])
THEN 'X'
END

Then we are ready to output our table that should look like this: 


You can download all the output from here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank