2024: Week 28 - Solution


Solution by Tom Prowse and you can download the workflow here.



Step 1 - Singles Championship

First we focus on cleaning the Singles winners and getting the data into a place where we can union it with the other tables. 

Initially we want to remove any null values from the Men field, and then create an ID field based on the year: 

ID 
IF LEN([year])>4
THEN RIGHT([year],1)
END

We can then use this to make sure the year is numeric: 

Year 
IF ISNULL([ID])
THEN [year]
ELSE LEFT([year],LEN([year])-1)
END

After this we can change the Year to a whole number and then filter to keep the ID that equal 3 or null. 

At this stage we can remove any extra fields so that we have the Year, Men, and Women fields.

Next we need to pivot the data so that we have the men and women in a single column. For this we use a columns to rows pivot: 



We can then rename any fields and rename the Men and Women rows to be Men's or Women's Singles: 

Tournament 
IF [Tournament]='men'
THEN "Men's Singles"
ELSE "Women's Singles"
END

Then finally we want to split the Champion field so that we separate the name and the country. For this we can use the automatic split and then rename the fields so the table looks like this: 



Step 2 - Doubles

Now we can turn our focus on the Doubles table. For this we want to repeat the steps for the Singles workflow so that we have removed the nulls, created an ID field, and then made sure the Year is numeric: 


We then want to do something slightly different as we want to remove any years where the tournament was cancelled. For this we need a new branch where we keep only the null values in the men field. We can then split off the first number from each row using a left function: 

ID 
LEFT([year],1)

We can then join this back onto our doubles branch using an outer join on the ID field, so that it removes any rows that were highlight as cancelled.


After this we can pivot the table so we have a single column for men and women, then instead of splitting out the country we need to use a custom split on a ',' to split out each doubles partner name. 

Now we have both partners in a separate field we can use a columns to rows pivot on the two split fields to create a row for each partner.



The table should then look like this: 



Step 3 - Mixed Doubles

Finally we can focus on the mixed doubles table. For this we want to remove any nulls from the runners-up field and then create a Tournament fields where every row is equal to 'Mixed Doubles'.

We can then split out the two winners by splitting the Champions field. However, there is a slightly tricker situation as this contains a '\n' which is a newline function.

First we need to replace the '\n' with something that we can split on: 

Champions 
REPLACE([Champions],CHAR(10),',')

Then we can split the Champions field by the ',' so that we have a field for each partner. The table should look like this:




We are now ready to union each table together where all the tables should 'stack' on top of each other and look like this: 


Step 4 - Prepare Output

Now we have all of the inputs in a single table we can calculate the most recent win for each Champion:

Most Recent Win 

We then want to transform the shape of the data so that each of the different types of play are in a separate field. For this we want to use a rows to columns pivot on Tournament and Year: 



Then we can filter on people who have won both singles and doubles by using this calculation within a filter: 

([Men's Singles]+[Women's Singles]>0)
AND
([Men's Doubles]+[Mixed Doubles]+[Women's Doubles]>0)

Then finally we can rank the players based on total championships: 

Total Championships 
[Men's Doubles]+[Men's Singles]+[Women's Doubles]+[Women's Singles]+[Mixed Doubles]

Rank 


Finally we can create a Gender field based on their wins: 

Gender
IF [Men's Singles]>0
THEN 'Man'
ELSE 'Woman'
END

We can then output our data that should look like this: 



You can view the outputs here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text