2021: Week 6 - Solution

 


Solution by Tom Prowse and you can download our full workflow here

This week's challenge was all about exploring the data within Tableau Prep before bringing it into Desktop. We took the data set from last month's #SportsVizSunday and looked at comparing the prize money for Women and Men's golf.

As always there are multiple ways of completing the challenge, I've given you a couple of options here. Both use similar techniques but are just broken out in a couple of ways that might help others understand or when handing over a workflow. 

For alternative solutions make sure you check out the submissions on Twitter and the Tableau Community forum.

Solution 1

Step 1 - Total Prize Money

The first question that we need to answer is 'What's the Total Prize Money earned by players for each tour?'. We can answer this with a simple aggregate where we group by Tour and Sum Money:



Step 2 - Number of Players

Question number two was - 'How many players are in this dataset for each tour?'. Again we can use an aggregate to calculate this, but this time we want to group by Tour and Count Distinct Player Name:


Step 3 - Number of Events

The next question is 'How many events in total did players participate in for each tour?' and following from the previous two we can again answer this with an aggregate tool. In this aggregate we group by Tour and Sum Events:



Step 4 - Avg Money per Event

The final two calculations are a little more tricky compared to a simple aggregate. To answer 'How much do players win per event? What's the average of this for each tour?' we first need to calculate the per event winnings and then average this per tour. 

Therefore, on the first step that we created after the input tool, we want to calculate the money won per event using the following calculation: 

Money per Event 
[MONEY]/[EVENTS]

After we've calculated this we can then use an aggregate tool, by grouping on Tour and Avg Money per Event:



After the aggregate, we can clean this field up slightly by rounding the avg value to a whole number:

Money per Event 
Round(Money per Event)

Note the difference, in results, between this calculation and changing the data type to a Number (Whole).

Step 5 - Avg Rank Difference

The final question is to calculate the avg difference between the players tour rank and overall rank. So to break this down, first we need to rank each player within each tour by how much money they have won. Then we need to do a second rank, where we rank the players overall (not taking Tour into account). We can do both of these ranks on the initial clean step after the input. 

Rank per Tour 


Overall Rank 

From here we can then calculate the difference between the overall and per tour ranking using the following calculation: 

Difference in Rank 
[Rank Overall]-[Rank per Tour]

Now we have calculated the difference between the two ranks, we can use this in an aggregate tool to find the avg difference for each tour. In the aggregate we group by Tour and Avg Difference in Rank:



Step 6 - Union Question Answers

Now we have answered all of the questions, it's time to bring these all together. First, I have created a new calculated field, after each of the aggregate tools, containing a simple string label for each answer. For example, for question 1 I have the label 'Total Prize Money' and so on. 

Once each branch is labelled, we have used the union tool to bring them all together. Initially, the union will match with the Tour and Measure (label) field, however all of the measure numbers will be in a separate column:


To overcome this we can use the union functionality to match the mismatched fields together. Using the options on the left-hand side, we can select a field then press the + when hovering over another field: 


Once we have done this for all of our measures, then we can remove the Table Names field and our table should look like this: 


Step 7 - Calculate Difference

The final part of this challenge is to calculate to difference between the PGA & LPGA tours. To do this we need each of the tours to have their own column, therefore we need to pivot our data. We can use a Rows to Columns pivot to create a new column for each of the tour values: 



Then finally we can calculate the difference between the tours using the following calculated field: 

Difference Between Tours 
[LPGA]-[PGA]

Then we have our final output that looks like this: 



This solution is a little bit longer than the alternative below, but it breaks out each of the aggregate functions so could potentially be easier to follow if you were handing over the workflow. 

The following is an alternative which is a bit more compact.

Solution 2

Step 1 - Calculations

The first step with the solution is to create a clean step after the input, and repeat the same calculations as in the previous solution. These include Money per Event, Rank per Tour, Rank Overall, and Difference in Rank. 

Step 2- Aggregate

We are now in a position to aggregate our results. However, unlike last time we are going to combine these into a single aggregate step with the following setup:


At this stage, we can round the Avg Money per Event field just like we did in the previous example. 

Our data should now look like this:



Step 3 - Pivot

Now we have all our answers, we need to restructure the data so that we can calculate the difference between each tour. 

First we want to pivot so that all of the Measures are in a single column: 


Then we want to use a second pivot step to create a column for each tour by using a Rows to Columns pivot: 



Step 4 - Calculate Difference

The final step is to calculate the difference between each Tour by using the same calculation as in the previous solution: 

Difference between Tours 
[LPGA]-[PGA]

As a result we again have the same output that looks like this: 


Hopefully this has shown that there are multiple different ways to solve a challenge, and that sometimes you can revisit your initial thought process to make the workflow more compact. 

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text