2020: Week 28 - Solution


Week 28 Solution by Tom Prowse




Download our full solution here

For this week's challenge we revisited a technique from Week 25 where we converted some Roman Numerals into numeric values. This week is similar but with a twist as we will be identifying which year each of the Olympic games took place.  

Step 1 - Join Scaffold

For the challenge we are given two data sources, the first is details about each of the Olympic games and the other is a scaffold of numbers from 1 to 10. Our first task is to join these two tables together so that the scaffold number matches the number of characters in our Games (Roman Numeral) field. 

To do we can bring the Host Cities data source into the workflow, and then use the following calculated field: 

Numeral Characters
len([Games])
This will provide us with how many characters are in each of the Roman Numeral strings. From here we can then use this field to join the scaffold table. The join calculation will look like this: 


Notice that we are using Numeral Character >= Scaffold, as this will allow our data to make multiple joins therefore increasing the number of rows massively.

Step 2 - Convert Numerals to Numeric

Our next task is to convert the numeral values into numeric ones. To do this we need to use the Mid function to identify which character matches with the scaffold value. The calculation looks like this: 

Numeral Character
mid([Games],[Scaffold],1)
This calculation allows us to extract each of the characters from the string and have them all in one column, like the table below: 



The next step is to convert the Numeral Characters into a numeric value. We can do this with a simple IF statement: 

Numeral Value
IF [Numeral Character] = 'X' THEN 10
ELSEIF [Numeral Character] = 'V' THEN 5
ELSEIF [Numeral Character] = 'i' THEN 1
END
Now we have replaced the Characters with Numbers, we can remove the Numeral Character field, and the table should look like this: 


Next, it's time to pivot our data. We want to pivot so that each of our Scaffold values is a separate column and not a row. Therefore, we use a Rows to Column pivot, with this setup:


Now we have a separate column for each character/scaffold number, we need to identify whether or not the value is positive or negative. This is quite a manual process and uses the following calculations:

1
IIF(zn([2])>[1],-[1],[1])

2
IIF(zn([3])>[2],-[2],[2])

3
IIF(zn([4])>[3],-[3],[3])

4
IIF(zn([5])>[4],-[4],[4])

5
IIF(zn([6])>[5],-[5],[5])

For each of these calculations, we are looking at the next column, if it is greater than the current column then we make it negative, if it isn't then it remains positive. 

As a result our table should look like this:


Finally, it's time to re-pivot our data, so that we have our scaffold numbers from columns to rows. Therefore, a Columns to Rows pivot is needed with this setup:


After these steps, we still have a lot of additional rows that we no longer need and we also want to total up the values that we have just converted. So the final task in this step is to use an aggregation tool to tidy the table: 


We want to Group By Games, Host, Dates, Nations, Sports, & Events then Sum the Pivot1 Values. 

Our table should now look like this:


Step 3 - Calculate Start & End Dates

The final step for this week's challenge is to calculate the start and end dates for each of the Games. To do this want to calculate the year which the Games were held, our Pivot Values field can be used here, along with some Olympic knowledge that the first Games were held in 1896 and that they happen every 4 years.

As a result we can use the following calculation to find the year of each of the games: 

Year
1896+(4*[Pivot1 Values])-4
Now we have the year, we need to work out the starting and end dates. To do this, we can split the current date field by using a '-' as a separator. The automatic split should work nicely here! We have renamed the split fields so they are easier to understand.


The next step is to again use a split function, but this time to split the day and month from the End Date field, again an automatic split does the trick nicely! 

Now we have our day/month is separate columns for the end date, we can use the following calculation to identify whether or not the Start Date contains a day & month, or just the day. 

Start Date
IF len([Start Date]) >=1 
AND len([Start Date]) < 3
THEN [Start Date]+ ' ' + [End Date - Split 2]
ELSE [Start Date]
END
For this calculation,we are identifying how many characters are in the string. If it is greater than 1 but less than 3 (eg, just the day) then we want to bring the Start Date, and the End Date Split 2 (End Month) together as a string.

Our table should now look like this:


The final task when creating the dates, is to convert them to a date field. We can do this with the following calculations: 

Start Date
DATE([Start Date]+' '+str([Year]))

End Date
DATE([End Date]+' '+str([Year]))
As a result of these calculations we have now converted our dates from words to a traditional date format: 


The final step for this week's challenge is to tidy the table so that it is ready for the output. This required the following steps: 
  • Remove Fields
    • End Date Split 1
    • End Date Split 2
    • Dates
    • Year
  • Remove Nulls
    • Exclude Null values from the Start Date field
After the final tidying we are now ready to output our table which should look like this: 


The full table is available here for you to check your answers.  

Make sure to fill in the participation tracker, share using #PreppinData on Twitter and post you solutions onto our Tableau Forums community page so that we can compare our workflows! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank