2024: Week 50 - Solution


Solution by Tom Prowse and you can download the workflow here.


Step 1 - Reshape Data

Our first task is to combine our two tables so that we have a single table to work from. This involves a few different steps and joins to get the data in the right shape.

First we want to input both of the tables and join these together using an inner join where country = entity. 


From here we can remove the entity field and rename the Life Expectancy at Birth to Country: Life Expectancy at Birth.

We also want to include the continent information, so from the List of Countries input, we can create an aggregate step and group by Continent. This will create a list of all the continents which we can use to join back to our Life Expectancy table.



With the second join this time we want to use an inner join where entity = continent: 


We can then remove the Entity and Code fields and rename the Life Expectancy at Birth field to Continent: Life Expectancy at Birth. 

Now we have both of these tables ready, we can join them together using a left join where we include all of the values from the Country table. We want this join to occur when on the Year and Continent fields. 



Step 2 - Life Expectancy Higher than Continent Average

First we want to filter the data so we only have values for 1950 - 2020. We can do this by using a range of values filter on the Year field: 



Then we can calculate whether it's above average or not: 

Above Continent Avg?
IF [Country: Life expectancy at birth]>[Continent: Life expectancy at birth]
THEN 1
ELSE 0
END

Then we can calculate the % change by using a lookup calculation: 

% Change 
 


We can then make sure this is rounded to 1dp: 

% Change 
ROUND(100*[% Change],1)

At this stage the table should look like this: 



Step 3 - Top Countries per Continent

The last step is to identify the top three countries per continent with the highest percentage increase in life expectancy.

For this we first need to aggregate to a country and continent level so we need to use an aggregate step where we group by Country and Continent, and then find the Max % Change, Sum of Number of Rows and Sum of Above Continent Avg? 


From here we can rename Number of Rows to Number of Years, and Above Continent Avg? to Number of Years above Avg. 

We can then use these fields to calculate the % of Years Above Continent Avg: 

% Years Above Continent Avg 
ROUND(
100*[Number of Years Above Avg]
/
[Number of Years]
,1)


Then finally we can rank the customers based on the % change: 

Rank 

Then from here we want to keep numbers 1, 2, & 3 from the rank field to identify our top 3 countries from each continent. 

Our final table should look like this: 


You can view the output here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text