2023: Week 33 - HR Month - Combinations

Challenge by: Kelly Gilbert

Validated by: Viktoriia Slaikovskaia

The HR analyst used the data from last week to build a dashboard. The DC managers found it very useful, and they requested some new features.

First, we need to add the employee’s tenure (how many months and years they have worked at that particular DC) to the dataset. 

Second, the HR analyst would like to keep the reports consistent from DC to DC, so they requested an aggregated dataset that fills in zeroes if a DC does not have any employees in a specific demographic group each month. For example, DC #1 did not have any employees in the 60-64 years age group for the month of February 2019, so we need to add a row for that combination, with 0 employees.

Inputs

1. The employee dimension table from last week (ee_dim_v3.csv)

2. The DC/month assignments from last week (ee_monthly_v3.csv)

Requirements

  • Input the data 
  • Using the monthly table, calculate each employee’s tenure
    • For the tenure_months, we want to know the number of full months between the employee's hire_date and either the month_end_date or leave_date - whichever is soonest
    • For the tenure_years, we want to translate the tenure_months into full years
      • i.e. 32 month would be 2 years of tenure
  • Output the updated monthly table with the two new tenure columns
  • Join the ee_dim table to the monthly data on employee_id to get the employee attributes, gender, nationality, and generation
  • Create a summary record for each DC/month/demographic:
    • For each DC, month, and generation name, count the number of employees 
    • Name the employee count “ee_count”
    • Rename the generation_name column to “demographic_detail”
    • Add a new column, demographic_type, which will have the same string in every row, “Generation Name”
    • Repeat above steps for gender, nationality, age_range, and tenure_years
    • Union all of the demographic summaries into one dataset
  • Expand this new dataset so there is a row for every combination of DC/month_end_date, and demographic_type/demographic_detail
    • Only include the relevant months for each DC (for example, DC 6 did not open until November 2019, so don’t include records earlier than that for DC 6.
    • Only include relevant details for each demographic. For example, you shouldn’t have any records where the demographic_type is “Tenure” and the demographic_detail is a generation name. 
  • If there were no employees in a category for the month, fill the ee_count with zero
  • Output the demographic summary

Outputs

1. ee_monthly_v4.csv 
  • 9 fields
    • dc_nbr
    • month_end_date
    • employee_id
    • guid
    • hire_date
    • leave_date
    • tenure_months
    • tenure_years
    • age_range
  • 7,440 rows (7,441 rows, including the header)
2. ee_demographic_summary.csv 
  • 5 fields
    • dc_nbr
    • month_end_date
    • demographic_type
    • demographic_detail
    • ee_count
  • 14,758 rows (14,759 rows, including the header)
You can download the output from here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @kelly_gilbert@ViktoriiaSSS@Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text