2023: Week 32 - Solution



Solution by Tom Prowse and you can download the workflow here



For this week's challenge we continue with our HR themed month, and this time we want to categorise people based on their age so we can run some reporting on this data set.

Step 1 - Generation

First we want to input the generations data set where we can create a new field that contains information about the generation name. 

Generation Name 
[generation] + " " + 

(IF ISNULL([start_year]) THEN "(born in or before " + STR([end_year]) + ")"

ELSEIF  isnull([end_year])
THEN "(born in or after " + STR([start_year]) + ")"

ELSE "(" + STR([start_year]) + "-" + STR([end_year]) + ")"

END)

This brings together the information about the generation and the start/end years into a single label. 

We can then fill in the null start and end years with a year that won't change our calculations too much. We've used 150 years ago and the latest date, but you can use a static date if preferred.

Start Year
IFNULL([start_year],YEAR(TODAY())-150)

End Year
IFNULL([end_year],YEAR(TODAY()))

At this stage our data should look like this: 


Step 2 - Output 1

To complete our first output we want to combine the dim_v2 table with our existing workflow. To do this we first need to calculate the birth year for each of our employees.

Birth Year
YEAR([date_of_birth])

We can then join both branches together by using an outer join where we keep all of the fields from the dim_v2 table. We want the join conditions to be where birth year >= start year and birth year is <= end year, so in other words where the birth year is between the start and end date: 


Then we can replace any null values in the generation name field with 'Not Provided' and remove any fields that aren't required for the output.

The 1st output should look like this: 


Step 3 - Output 2 

Now we can start working on our 2nd output where we want to include the monthly_v2 data table as well. Here we want to join the monthly table with our dim table using an inner join and the employee id field: 



We can then calculate the age of each employee: 

Age
FLOOR(
DATEDIFF('month',[date_of_birth],[month_end_date])
/
12)

Then once we have calculated the age of the employee we can categorise their age: 

Age Range 
IF [age] < 20
THEN 'Under 20 years'

ELSEIF [age] >= 70
THEN '70+ years'

ELSEIF ISNULL([age])
THEN 'Not Provided'

ELSE 
  STR(floor([age]/5)*5) + '-' 
  + STR(floor([age]/5)*5 + 4) + ' years'
END

We can then remove any additional fields that aren't required for the output and our final table should look like this: 



You can download the output from here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @kelly_gilbert@_GPattinson@Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text