2023: Week 17 - Solution

 


Solution by Tom Prowse and you can download the workflow here


Step 1 - Population Data

First we want to input the Population table and make sure that we have trimmed all of the spaces from the Country Name field so we have no leading or trailing spaces.


We can then pivot using a columns to rows pivot using all of the years as our pivoted fields. The pivot names become Year, and pivot values becomes Population.

After the pivot we can clean the Country Name by ensuring that the data role is a Country, and tidying any countries that aren't recognised or spelt incorrectly.


Step 2 - Country Size Data

Next we want to input the Country Size table and then split out the km2 value in the Land Size field. We can do this by splitting the first value before the '(', then renaming the split field to Land in km2 and making sure it's a decimal number: 


We can then clean the Country / Dependency field by trimming any spaces and then removing any countries included in brackets by splitting off just the first occurrence. There is also a row that contains Canada[Note1] so we also need to split off the first part before the '[' as well. 

Again, we want to make sure the country data role is correct and then make any changes based on the suggestions. After these changes there are multiple rows for some countries, therefore we can aggregate so we have a total per country.


Step 3 - Combine Data Sets

We can now join both data sets using an inner join on the Country field. 


Then filter the Country field to exclude the 'World' values and also calculate the Population Density:

Population Density
[Population]
/
[Land in km2]


Step 4 - Population Growth

First we want to create two parameters so that we can compare the growth between two years. These are going to be 2 whole number parameters that are set to 2000 and 2021



We can then use those parameters in our workflow to filter the years involved where:

[Year]=[Parameters.Select Second Year] 
OR
[Year]=[Parameters.Select First Year]

Then we can use a rows to columns pivot to have a field for each year and the total population density


After renaming the fields the table should look like this


Step 5 - Outputs

We are now ready to create our two outputs. 

Top 10 in 2021

To find the top 10 countries with the highest population density in 2021, we can remove the 2020 field then rank the 2021 field in descending order: 


Then filter the rank field so we only have the top 10 values so the output looks like this: 


Top 10 % Change

On a separate branch we can calculate the % change between 2000 and 2021: 

% Change in Population Density

100*([Population Density 2021]-[Population Density 2000])
/
[Population Density 2000]

Then rank the % change field:


Then we can filter for the top 10 and the output should look like this: 


You can download all the output from here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @SerenaPurslow, @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text