2020: Week 52 - Solution

 


Solution by Tom Prowse and you can download the full workflow here


This week's challenge was from community member Kate Brown who created this workflow so that she could clean and prepare some data so that it was ready for her to make a viz that included Polygons. Let's take a look at how she solved it!

Step 1 - Join Tables

Our first task is to clean some of the field names from the US Open Winners table so that we have a better understand of what the fields refer to going forwards. We want to make the following field name changes: 
  • Pos to Position
  • player to Player
  • to par to Total to Par
  • round 1 to Round 1
  • round 2 to Round 2
  • round 3 to Round 3
  • round 4 to Round 4
  • total to Total 
  • year to Year 
Also, from the Location table, we don't need to include the Score or To par fields so these can be removed. 

Once we have made these changes to both tables, we are ready to join these together using the following join conditions: 


Step 2 - Calculate Par

Now we have combined both tables, we can start to clean some of the fields and calculate the total par scores and par score for each round. 

First we can remove the following fields for Year-1, Champion, Position as these are now longer required. There are also some extra spaces within some of the fields, therefore we want to use the Remove Extra Spaces functionality within the clean options. We can do this on Country, Player, Venue, & Location fields. 

After removing unwanted spaces, next we need to replace the 'E' within the Total to Par field with a 0, and then change this to a Whole Number

Now this is a number field we can start to calculate the par score with the following calculation: 

Par 
IF [Total to Par] < 0
THEN [Total] + (-[Total to Par])
ELSEIF [Total to Par] > 0
THEN [Total] - [Total to Par]
ELSE [Total]
END

This will become our Total Par for the complete tournament. There are 4 rounds in each tournament, each with the same Par score, therefore we can calculate this with the following:

Round Par 
CEILING([Par]/4)

Our table now looks like this: 



Step 3 - Create Polygons

In the next part of this challenge, we have used the Yes Polygons! blog post from Kevin Flerlage to help us along the way. 

The first step is to reshape the data using a Pivot step so that each of our Rounds are within the same column: 


Next to create the square we need to duplicate our data 4 times (once for each of the corners) and then union these back together so that we have densified our data:

Now we have densified our data, it's time to complete the rest of the calculations. First, we want to be able to identify which branch each row comes from by using the Table Names field. We use a custom split to extract the part after the final '/': 


This extracts which branch each row comes from:


Then we want to remove the Table Names field, then once removed, we can rename the newly split field to Table Names. 

Next we want to calculate how much under/over par each round score was by using the following calculation: 

Round to Par 
[Round Score] - [Round Par]

Now we are getting into the Polygon creation steps. We need to create the following calculations:

Side Len 
SQRT([Round Score])

Point 
CASE [Table Names ]
    WHEN 'womensusopenwinners'  THEN 'Point1'
    WHEN 'womensusopenwinners-1' THEN 'Point2'
    WHEN 'womensusopenwinners-2' THEN 'Point3'
    WHEN 'womensusopenwinners-3' THEN 'Point4'
END

Round Colours 
CASE [Round Num]
    WHEN 'Round 1' THEN 'A'
    WHEN 'Round 2' THEN 'B'
    WHEN 'Round 3' THEN 'C'
    WHEN 'Round 4' THEN 'D'
END

X Coordinate Polygon 

CASE [Round Colors]

    WHEN 'A' THEN
        (CASE [Point]
            WHEN 'Point1' THEN [Side Len]
            WHEN 'Point2' THEN 0
            WHEN 'Point3' THEN 0
            WHEN 'Point4' THEN [Side Len]
            END)
 
    WHEN 'D' THEN
        (CASE [Point]
            WHEN 'Point1' THEN 0
            WHEN 'Point2' THEN -[Side Len]
            WHEN 'Point3' THEN -[Side Len]
            WHEN 'Point4' THEN 0
            END)
 
    WHEN 'B' THEN
        (CASE [Point]
            WHEN 'Point1' THEN [Side Len]
            WHEN 'Point2' THEN 0
            WHEN 'Point3' THEN 0
            WHEN 'Point4' THEN [Side Len]
            END) 
  
    WHEN 'C' THEN
        (CASE [Point]
            WHEN 'Point1' THEN 0
            WHEN 'Point2' THEN -[Side Len]
            WHEN 'Point3' THEN -[Side Len]
            WHEN 'Point4' THEN 0
            END) 
END

Y Coordinate Polygon
CASE [Round Colors]

    WHEN 'A' THEN
        (CASE [Point]
            WHEN 'Point1' THEN 0
            WHEN 'Point2' THEN 0
            WHEN 'Point3' THEN [Side Len]
            WHEN 'Point4' THEN [Side Len]
            END)
 
    WHEN 'D' THEN
        (CASE [Point]
            WHEN 'Point1' THEN 0
            WHEN 'Point2' THEN 0
            WHEN 'Point3' THEN [Side Len]
            WHEN 'Point4' THEN [Side Len]
            END)
 
    WHEN 'B' THEN
        (CASE [Point]
            WHEN 'Point1' THEN -[Side Len]
            WHEN 'Point2' THEN -[Side Len]
            WHEN 'Point3' THEN 0
            WHEN 'Point4' THEN 0
            END) 
  
    WHEN 'C' THEN
        (CASE [Point]
            WHEN 'Point1' THEN -[Side Len]
            WHEN 'Point2' THEN -[Side Len]
            WHEN 'Point3' THEN 0
            WHEN 'Point4' THEN 0
            END) 
END

Over Under Par Round 
IF [Round Score] > [Round Par]
THEN 'Over'
ELSEIF [Round Score] < [Round Par]
THEN 'Under'
ELSEIF [Round Score] = [Round Par]
THEN 'Par'
END

After these set of calculations we also need to determine which decade each of the years is in. Therefore, we need to duplicate the Year field, and then group the numbers into each decade. We have done this manually, but you can use a calculation if preferred:

 
Next we continue to complete all of the calculations needed to create the viz/polygons. 

Row 
CASE [Decade]
    WHEN 1940 THEN 1
    WHEN 1950 THEN 2
    WHEN 1960 THEN 3
    WHEN 1970 THEN 4
    WHEN 1980 THEN 5
    WHEN 1990 THEN 6
    WHEN 2000 THEN 7
    WHEN 2010 THEN 8
END

Column
INT(RIGHT(STR([Year]),1)) + 1

After removing some of the fields that are no longer required, we are now ready to output our data which is ready to create Kate's viz in Tableau Desktop!

The output should look like this: 



Step 4 - Min/Max Scores 

For the second part of the challenge, we want to find out what the min/max scores were for each decade. First, we need to find what the Min and Max scores were for each decade using a Fixed LOD, we are doing this for both total score and round score: 

Min Round Score 

Max Round Score



Min Total Score


Max Total Score

Finally, to remove any duplicated rows due to the data densification, we use an aggregate tool with the following setup:


This is now ready for our second output which should look like this: 


Both full outputs can be downloaded here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14@JonathanAllenby & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text