2020: Week 53 - Solution

 

Solution by Tom Prowse and you can download the full workflow here

The final challenge of 2020 was based on the introduction of a new star sign, Ophiuchus. We compared how the dates have now changed and if your star sign has changed based on your date of birth.

Step 1 - Input Data

The first step this week is to ensure that we have input our data correctly. As the initial inputs don't have headers and the data starts on the very first row, we need to ensure that we have selected the 'Generate Field Names Automatically' option within the input settings: 


Once we have selected this, our headers are now F1, F2 etc therefore we need to update the names so that it's easier for us to understand going forwards. The naming changes look like this: 

Old Star Signs  

New Star Signs 

Step 2 - Pivot Old Star Signs

The first step is to remove any null values that have been picked up from the input, we can just exclude these. 

Next, we want to pivot the data so that all of the Star Signs are in a single column. We use a Columns to Rows pivot by pivoting the three star sign fields: 


As a result of this pivot we now have three dates for each star sign which is incorrect, therefore we need to make sure that we create a new date field with the correct corresponding dates. We use the following calculation: 

Start End Date Range 
CASE [Pivot1 Names]
    WHEN 'Star Sign' THEN [Start End Date]
    WHEN 'Star Sign 1' THEN [Start End Date 1]
    WHEN 'Star Sign 2' THEN [Start End Date 2]
END

We can then remove the three initial date fields so that our data looks like this: 


Step 3 - Clean Old Star Sign Dates

Now we have a single column of star signs we can clean the date range into a start and end date column. The first step is to split the date range into separate numbers using the automatic split functionality. We can then rename the column headers and remove the Date range field so our data looks like this: 


We can then start to create a start and end date from each of the split parts using the following calculations: 

Start Date 
MAKEDATE(2020,[Start Month],[Start Day])

End Date 
MAKEDATE(2020,[End Month],[End Day])

With the end date, we need to readjust any dates for January because they roll over into 2021, therefore we also need this calculation: 

End Date 
IF [End Month] = 1 
THEN DATE(DATEADD('year',1,[End Date]))
ELSE [End Date]
END 

As a result we now have the Old Star Signs table formatted like this: 


Step 4 - Clean New Star Signs Dates

Next, we want to focus on the New Star Signs table and clean the dates so that we can combine this with our Old Star Signs table. 

As these are in slightly different format after we have input the table we can't use the same technique as with the old table. 

First, we want to convert the Start Month into a month number. Unfortunately, not all of the dates are in the same format therefore we want to focus on just the first three letters of each month and then convert this to a month number:

Start Month 
DATEPART('month', DATEPARSE('MMM',left([Start Month],3)))

We can use the same technique for the End Month as well: 

End Month 
DATEPART('month', DATEPARSE('MMM',LEFT([End Month],3)))

We are now ready to create our Start and End date fields using the same calculation as the Old Star Signs table: 

Start Date
MAKEDATE(2020,[Start Month],[Start Day])

End Date 
MAKEDATE(2020,[End Month],[End Day])

Then finally readjusting the January months for 2021: 

End Date 
IF [End Month] = 1 
THEN DATE(DATEADD('year',1,[End Date]))
ELSE [End Date]
END

Our New Star Signs table should now look like this: 



Step 5 - Scaffold Dates & Combine Tables

Now both of our tables are in the same format, we can use the scaffold table to fill in all of the days of the year. To do this we want to join this to both of our workflows using the following join conditions: 

Old Star Signs


New Star Signs 


Once we have joined the scaffold to both tables, we can then join these together to make a single table using the following join condition: 


Our workflow now looks like this: 


And our data table looks like this: 


Step 6 - Prepare Output

We can now start preparing the required output file, first by creating the date range for each star sign. We created the range by using the following calculation: 

Date Range 
STR(DATEPART('day',[Start Date]))
+
' '
+
LEFT(DATENAME('month',[Start Date]),3)
+
' - '
+
STR(DATEPART('day',[End Date]))
+
' '
+
LEFT(DATENAME('month',[End Date]),3)

Next, we want to identify if the star sign has changed so we use this calculation to identify changes:

Star Sign Change? 
INT([Star Sign]=[Star Sign-1])

Using the INT function will transform a Boolean True/False into a 1/0. 

Then we want to identify the dates that are on cusp, and if at least one of the new dates matches the old date by using this Fixed LOD: 


If the LOD calculation returns a 1 then this means that there is an overlap and considered not to have changed there we want to use a filter to remove these values. Our table should now look like this: 


The final step is to format the Date field and convert this to our Birthday field. We can do this by using the following calculation: 

 Birthday 
LEFT(DATENAME('month',[Date]),3)
+
' '
+
STR(DATEPART('day',[Date]))

Once we have removed and renamed we have our desired output: 


The full output can be downloaded here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text