2024: Week 49 - Solution



Solution by Tom Prowse and you can download the workflow here




Step 1 - Input Files

The first step is to input all of the sheets from the Excel file. For this we can connect up to the Excel document, connect to one of the sheets and then use the 'Union Multiple Tables' in the input step to bring all of the files into a single input. We don't need to change any of the settings as we want to bring in everything: 



After the sheets are all in we can make some changes to extract the jersey number and clean so other fields. 

To get the jersey number, we can duplicate the Name field and then use the in-built functionality to remove any letters and punctuation from the duplicated field. 



Once we have done this we can rename to number and make sure it's a whole number. 

We can also create a 10cm grouping based on the heights of players. We'll use this later in the flow so it's good to calculate this earlier so that it can be reused.

Height 
ROUND(FLOAT(LEFT([HT],4)),1)

We can also remove the letters from the HT field and make that a decimal number. 

At this stage the table should look like this and we will use this as our starting step for each of the questions that we'll answer below: 



Question 1 -  What's the lowest number not used on a jersey in the NBA?

For this we can create a new clean step and remove any players who don't have a number (exclude Nulls). We can also remove any other fields so we just have the Name, Number, and Height fields remaining.

We now want to pad out the numbers so that we can identify what numbers are missing from the ones that have been allocated. For this we can create a New Rows step based on the Number field: 


As the new rows step will identify any numbers that are missing and create a 'Null' row within the Name field. We can then keep only the null values and then use an aggregation step to find the minimum jersey number: 


This will answer our first question. 


Question 2 - What's the average salary per 10cm of height? 

Next up we want to create a new branch from the first clean step. This time we can clean the Salary field so that we remove the $ symbol and make it a number field. 

Salary 
FLOAT(REPLACE([SALARY], '$',''))

We can then use an aggregation to group by the Height and find the Avg Salary: 


After rounding this value we will have the answer to question 2: 

Salary
ROUND(([SALARY]/1000000),1)



Question 3 - What's the shortest average height team in the NBA? What's the tallest team?

Again we want to start at the first clean step and this time we want to create an aggregation step where we group by Team and find the Avg HT. This time we want to use the actual height of players and not the 10cm groups.



We can then use a Rank calculation to make sure the teams are in height order:

Height Rank 

We can then use this table to answer question 3: 



Question 4 - Do taller players choose larger jersey numbers?

Finally, the last question that we want to answer is around jersey numbers and height. Again, we want to create a new branch from the first clean step, and this time we want to aggregate by grouping by Height and averaging the Number: 



We can then round the avg numbers: 

Number 
ROUND([Number],1)

Then we can use this table as the answer to our final question: 



After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text