2023: Week 32 - HR Month - Reshaping Generations

Challenge by: Kelly Gilbert

Validated by: George Pattinson

Now that all of the missing IDs have been filled in, HR would like us to add several age groupings to the dataset for reporting, so the employee’s exact date of birth can be removed from the dataset.

For this challenge, we will be using the Pew definitions for each generation, which can be found in the generations.csv input.

Inputs

1. The list of generations, with start and end date (generations.csv)

2. The employee dimension table from last week (ee_dim_v2.csv)

3. The DC/month assignments from last week (ee_monthly_v2.csv)

Requirements

  • Input the data
  • Add a new column, generation_name, which includes the generation name and the start/end years, example: “Generation X (1965-1980)”
      • If the generation doesn’t have a start year, the text should be “(born in or before XXXX)”
      • If the generation doesn’t have an end year, the text should be “(born in or after XXXX)”
  • Calculate the employee’s birth year from the date_of_birth
  • Join the employee data and generation data to get the generation name for each employee
    • If the employee’s birth date is missing, the generation_name should be “Not provided”
  • Join the monthly data to the employee data on employee_id.
  • Calculate the employee’s age (in full years) as of the month_end_date.
  • Calculate the employee’s age range, in 5-year increments, and name that column age_range:
    • Employees under 20 should be grouped into “Under 20 years”
    • Employees between 20 and 69 should be grouped into 5-year increments (“20-24 years”, “25-29 years”, etc.)
    • Employees 70 and over should be grouped into “70+ years”
    • If the employee’s birth date is missing, the age_range should be “Not provided"
  • Output the data (two datasets):
    • For the the employee data, keep only the original columns + the new generation_name column, and remove the date_of_birth and birth year columns
    • For the monthly data, keep only the original columns + the new age_range column.

Outputs

1. The employee dimension table, with generation_name added (ee_dim_v3.csv)

  • 10 fields
    • employee_id
    • guid
    • first_name
    • last_name
    • generation_name
    • nationality
    • gender
    • email
    • hire_date
    • leave_date
  • 537 rows (538 including headers)
2. The monthly table, with age_range added (ee_monthly_v3.csv)
  • 7 fields
    • employee_id
    • age_range
    • guid
    • dc_nbr
    • month_end_date
    • hire_date
    • leave_date
  • 7,440 rows (7,441 including headers)

You can download the output from here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @kelly_gilbert@_GPattinson@Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank