2023: Week 32 - HR Month - Reshaping Generations

August 09, 2023

Now that all of the missing IDs have been filled in, HR would like us to add several age groupings to the dataset for reporting, so the employee’s exact date of birth can be removed from the dataset.

For this challenge, we will be using the Pew definitions for each generation, which can be found in the generations.csv input.

Inputs

1. The list of generations, with start and end date (generations.csv)

2. The employee dimension table from last week (ee_dim_v2.csv)

3. The DC/month assignments from last week (ee_monthly_v2.csv)

Requirements

Input the data
Add a new column, generation_name, which includes the generation name and the start/end years, example: “Generation X (1965-1980)”

If the generation doesn’t have a start year, the text should be “(born in or before XXXX)”
If the generation doesn’t have an end year, the text should be “(born in or after XXXX)”

Calculate the employee’s birth year from the date_of_birth
Join the employee data and generation data to get the generation name for each employee

If the employee’s birth date is missing, the generation_name should be “Not provided”

Join the monthly data to the employee data on employee_id.
Calculate the employee’s age (in full years) as of the month_end_date.
Calculate the employee’s age range, in 5-year increments, and name that column age_range:

Employees under 20 should be grouped into “Under 20 years”
Employees between 20 and 69 should be grouped into 5-year increments (“20-24 years”, “25-29 years”, etc.)
Employees 70 and over should be grouped into “70+ years”
If the employee’s birth date is missing, the age_range should be “Not provided"

Output the data (two datasets):

For the the employee data, keep only the original columns + the new generation_name column, and remove the date_of_birth and birth year columns
For the monthly data, keep only the original columns + the new age_range column.

Outputs

1. The employee dimension table, with generation_name added (ee_dim_v3.csv)

10 fields

employee_id
guid
first_name
last_name
generation_name
nationality
gender
email
hire_date
leave_date

537 rows (538 including headers)

2. The monthly table, with age_range added (ee_monthly_v3.csv)

7 fields

employee_id
age_range
guid
dc_nbr
month_end_date
hire_date
leave_date

7,440 rows (7,441 including headers)

You can download the output from here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @kelly_gilbert, @_GPattinson, @Datajedininja, @JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!

Search This Blog

Preppin' Data