2020: Week 52 - Solution
Solution by Tom Prowse and you can download the full workflow here.
This week's challenge was from community member Kate Brown who created this workflow so that she could clean and prepare some data so that it was ready for her to make a viz that included Polygons. Let's take a look at how she solved it!
Step 1 - Join Tables
Our first task is to clean some of the field names from the US Open Winners table so that we have a better understand of what the fields refer to going forwards. We want to make the following field name changes:
- Pos to Position
- player to Player
- to par to Total to Par
- round 1 to Round 1
- round 2 to Round 2
- round 3 to Round 3
- round 4 to Round 4
- total to Total
- year to Year
Also, from the Location table, we don't need to include the Score or To par fields so these can be removed.
Once we have made these changes to both tables, we are ready to join these together using the following join conditions:
Step 2 - Calculate Par
Now we have combined both tables, we can start to clean some of the fields and calculate the total par scores and par score for each round.
First we can remove the following fields for Year-1, Champion, Position as these are now longer required. There are also some extra spaces within some of the fields, therefore we want to use the Remove Extra Spaces functionality within the clean options. We can do this on Country, Player, Venue, & Location fields.
After removing unwanted spaces, next we need to replace the 'E' within the Total to Par field with a 0, and then change this to a Whole Number.
Now this is a number field we can start to calculate the par score with the following calculation:
Par
IF [Total to Par] < 0
THEN [Total] + (-[Total to Par])
ELSEIF [Total to Par] > 0
THEN [Total] - [Total to Par]
ELSE [Total]
END
This will become our Total Par for the complete tournament. There are 4 rounds in each tournament, each with the same Par score, therefore we can calculate this with the following:
Round Par
CEILING([Par]/4)
Our table now looks like this:
Step 3 - Create Polygons
In the next part of this challenge, we have used the Yes Polygons! blog post from Kevin Flerlage to help us along the way.
The first step is to reshape the data using a Pivot step so that each of our Rounds are within the same column:
Next to create the square we need to duplicate our data 4 times (once for each of the corners) and then union these back together so that we have densified our data:
This extracts which branch each row comes from:
Then we want to remove the Table Names field, then once removed, we can rename the newly split field to Table Names.
Round to Par
[Round Score] - [Round Par]
Now we are getting into the Polygon creation steps. We need to create the following calculations:
Side Len
SQRT([Round Score])
Point
CASE [Table Names ]
WHEN 'womensusopenwinners' THEN 'Point1'
WHEN 'womensusopenwinners-1' THEN 'Point2'
WHEN 'womensusopenwinners-2' THEN 'Point3'
WHEN 'womensusopenwinners-3' THEN 'Point4'
END
Round Colours
CASE [Round Num]
WHEN 'Round 1' THEN 'A'
WHEN 'Round 2' THEN 'B'
WHEN 'Round 3' THEN 'C'
WHEN 'Round 4' THEN 'D'
END
X Coordinate Polygon
CASE [Round Colors]
WHEN 'A' THEN
(CASE [Point]
WHEN 'Point1' THEN [Side Len]
WHEN 'Point2' THEN 0
WHEN 'Point3' THEN 0
WHEN 'Point4' THEN [Side Len]
END)
WHEN 'D' THEN
(CASE [Point]
WHEN 'Point1' THEN 0
WHEN 'Point2' THEN -[Side Len]
WHEN 'Point3' THEN -[Side Len]
WHEN 'Point4' THEN 0
END)
WHEN 'B' THEN
(CASE [Point]
WHEN 'Point1' THEN [Side Len]
WHEN 'Point2' THEN 0
WHEN 'Point3' THEN 0
WHEN 'Point4' THEN [Side Len]
END)
WHEN 'C' THEN
(CASE [Point]
WHEN 'Point1' THEN 0
WHEN 'Point2' THEN -[Side Len]
WHEN 'Point3' THEN -[Side Len]
WHEN 'Point4' THEN 0
END)
END
Y Coordinate Polygon
CASE [Round Colors]
WHEN 'A' THEN
(CASE [Point]
WHEN 'Point1' THEN 0
WHEN 'Point2' THEN 0
WHEN 'Point3' THEN [Side Len]
WHEN 'Point4' THEN [Side Len]
END)
WHEN 'D' THEN
(CASE [Point]
WHEN 'Point1' THEN 0
WHEN 'Point2' THEN 0
WHEN 'Point3' THEN [Side Len]
WHEN 'Point4' THEN [Side Len]
END)
WHEN 'B' THEN
(CASE [Point]
WHEN 'Point1' THEN -[Side Len]
WHEN 'Point2' THEN -[Side Len]
WHEN 'Point3' THEN 0
WHEN 'Point4' THEN 0
END)
WHEN 'C' THEN
(CASE [Point]
WHEN 'Point1' THEN -[Side Len]
WHEN 'Point2' THEN -[Side Len]
WHEN 'Point3' THEN 0
WHEN 'Point4' THEN 0
END)
END
Over Under Par Round
IF [Round Score] > [Round Par]
THEN 'Over'
ELSEIF [Round Score] < [Round Par]
THEN 'Under'
ELSEIF [Round Score] = [Round Par]
THEN 'Par'
END
After these set of calculations we also need to determine which decade each of the years is in. Therefore, we need to duplicate the Year field, and then group the numbers into each decade. We have done this manually, but you can use a calculation if preferred:
Next we continue to complete all of the calculations needed to create the viz/polygons.
Row
CASE [Decade]
WHEN 1940 THEN 1
WHEN 1950 THEN 2
WHEN 1960 THEN 3
WHEN 1970 THEN 4
WHEN 1980 THEN 5
WHEN 1990 THEN 6
WHEN 2000 THEN 7
WHEN 2010 THEN 8
END
Column
INT(RIGHT(STR([Year]),1)) + 1
The output should look like this:
Step 4 - Min/Max Scores
For the second part of the challenge, we want to find out what the min/max scores were for each decade. First, we need to find what the Min and Max scores were for each decade using a Fixed LOD, we are doing this for both total score and round score:
Min Round Score
Max Round Score
Min Total Score
Max Total Score
Finally, to remove any duplicated rows due to the data densification, we use an aggregate tool with the following setup:
This is now ready for our second output which should look like this:
Both full outputs can be downloaded here.
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14, @JonathanAllenby & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!