Posts

Showing posts from August, 2023

2023: Week 35 - HR Month - Transfers

Challenge by: Kelly Gilbert
Validated by: Charalambos Pavlou

Occasionally, employees may want to transfer from one distribution center to another (for example, if the employee wants to move to a different city). The company doesn’t have a consistent policy or process for that. The HR manager has asked us to figure out how often it happens, so the HR team knows if they should prioritize working on that policy. Our HR recording system has a few issues, so we’ll need to make some assumptions for this analysis.

Input
The DC/month assignments from Week 33 (ee_monthly_v4.csv)

Requirements
Input the data
Flag when transfers occurred: a transfer occurs when the employee’s dc_nbr changes from month to month, and the difference in months is <= 2. Our HR system doesn’t differentiate between an employee transferring from one DC to another vs. the employee quitting and then being rehired at another DC later, so we are going to make the assumption that if the gap is no more than two months, it
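The transfer rule can be sketched in plain Python. This is a hypothetical illustration, not the official solution: the record layout `(employee_id, month, dc_nbr)` is an assumption standing in for ee_monthly_v4.csv, and rows are assumed sorted by employee and month.

```python
from datetime import date

def month_index(d: date) -> int:
    """Months since year 0, so month gaps compare as simple integers."""
    return d.year * 12 + (d.month - 1)

def flag_transfers(rows):
    """rows: list of (employee_id, month, dc_nbr), sorted by employee then month.

    Flags a transfer when the DC number changes and the gap between
    consecutive assignment months is at most 2 (the challenge's rule).
    """
    transfers = []
    prev = {}  # employee_id -> (month, dc_nbr)
    for emp, month, dc in rows:
        if emp in prev:
            prev_month, prev_dc = prev[emp]
            gap = month_index(month) - month_index(prev_month)
            if dc != prev_dc and gap <= 2:
                transfers.append((emp, prev_dc, dc, month))
        prev[emp] = (month, dc)
    return transfers

# Illustrative data, not from the real dataset
rows = [
    (1, date(2023, 1, 1), 100),
    (1, date(2023, 2, 1), 100),
    (1, date(2023, 4, 1), 200),  # 2-month gap, DC changed -> transfer
    (2, date(2023, 1, 1), 100),
    (2, date(2023, 5, 1), 300),  # 4-month gap -> treated as quit/rehire
]
print(flag_transfers(rows))  # [(1, 100, 200, datetime.date(2023, 4, 1))]
```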

2023: Week 34 - Solution

Video Solution
Solution by Tom Prowse and you can download the workflow here.

We are onto the penultimate challenge of our HR month, and we are looking at turnover rates this week.

Step 1 - Every employee & every month
After inputting our data source, we first want to aggregate the data so we have a single row for each employee, their hire date and their latest leave date. Within the aggregation we can group by dc_nbr, employee_id and hire_date, then find the max leave_date.

We then want to find the start of the month when an employee has left the company. This will be used when we are creating the new rows for the months between their hire and leave dates:

leave_date_start_of_month
DATE(DATETRUNC('month',[leave_date]))

Then we can fill in any null values with a standard date of 2023-06-01:

leave_date_start_of_month
DATE(IFNULL([leave_date_start_of_month],#2023-06-01#))

We are now ready to create a row for each of the months between the hire and leave dates for each
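The two calculations above can be mirrored in stdlib Python as a rough sketch: `DATETRUNC('month', …)` truncates a date to the first of its month, and `IFNULL` supplies the fallback #2023-06-01# when the employee has no leave date. The function name here is illustrative.

```python
from datetime import date

def leave_date_start_of_month(leave_date):
    """Rough equivalent of the two Prep calculations above."""
    if leave_date is None:
        # IFNULL(..., #2023-06-01#): employee still active, use the fallback
        return date(2023, 6, 1)
    # DATE(DATETRUNC('month', [leave_date])): first day of the leave month
    return leave_date.replace(day=1)

print(leave_date_start_of_month(date(2023, 3, 17)))  # 2023-03-01
print(leave_date_start_of_month(None))               # 2023-06-01
```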

2023: Week 34 - HR Month - Turnover Rate

Challenge by: Kelly Gilbert
Validated by: Emma Jones

Turnover is a commonly-used HR metric that measures employee churn, or how quickly employees move in and out of the organization. Using the same dataset from last week, the HR manager has asked us to calculate the monthly and prior-12-month turnover for each DC and month.

There are many ways to calculate turnover, but for this exercise, we’ll use:

[# employees leaving during the period] / [avg # employees during the period] * 100%

where [avg # employees] = ([# employees at the end of the prior period] + [# employees at the end of the current period]) / 2

Example:
Employees at the end of Dec 2022 = 90
Employees at the end of Jan 2023 = 110
Employees leaving during Jan 2023 = 5
Avg employees in Jan 2023 = (90 + 110) / 2 = 100
Turnover in Jan 2023 = 5 / 100 * 100% = 5%

However, before we can calculate the turnover, we need to fix a data quality issue.

Input
The DC/month assignments from last week (ee_monthly_v4.csv)

Requirements
In
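The formula and the worked example above translate directly into a few lines of Python; the function name is our own, but the arithmetic is exactly the definition given in the challenge.

```python
def turnover_pct(leavers, headcount_prior, headcount_current):
    """Turnover % = leavers / average headcount over the period * 100."""
    avg_employees = (headcount_prior + headcount_current) / 2
    return leavers / avg_employees * 100

# The worked example: 90 at end of Dec 2022, 110 at end of Jan 2023, 5 leavers
print(turnover_pct(5, 90, 110))  # 5.0
```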

2023: Week 33 - Solution

Solution by Tom Prowse and you can download the workflow here.

We are onto the 3rd challenge of our HR month, where we look at employees’ tenure and also at keeping reports consistent even when data is missing.

Step 1 - Tenure
First we can input the data set ee_monthly_v3; from here we can calculate the tenure in months by finding the difference between the hire and leave dates:

Tenure_Months
IF ISNULL([leave_date]) THEN DATEDIFF('month',[hire_date],[month_end_date])
ELSE DATEDIFF('month',[hire_date],[leave_date])
END

Then we want to identify whether they were there for a full or partial month:

Tenure_Months
IF ISNULL([leave_date]) THEN [tenure_months]
ELSE (IF DAY([leave_date])<DAY([hire_date]) THEN [tenure_months]-1 ELSE [tenure_months] END)
END

Then finally we can round down to the nearest whole number so that we are only looking at full months, and calculate the years of service:

Tenure_Years
FLOOR([tenure_months]/12)

We can then output our data set relating the tenur
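The three tenure calculations above can be sketched in stdlib Python. One assumption to flag: Prep's `DATEDIFF('month', a, b)` counts calendar-month boundaries crossed, which is the `year*12 + month` difference below; the partial-month adjustment and the `FLOOR` then follow the formulas directly.

```python
from datetime import date

def datediff_months(start: date, end: date) -> int:
    """DATEDIFF('month', start, end): calendar-month boundaries crossed."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def tenure_months(hire_date, leave_date, month_end_date):
    end = month_end_date if leave_date is None else leave_date
    months = datediff_months(hire_date, end)
    # Partial-month rule: don't count the final month if the employee left
    # before reaching their hire-day anniversary in that month.
    if leave_date is not None and leave_date.day < hire_date.day:
        months -= 1
    return months

def tenure_years(months: int) -> int:
    return months // 12  # FLOOR([tenure_months] / 12)

# Left on the 10th, hired on the 15th -> the 36th month is partial
print(tenure_months(date(2020, 3, 15), date(2023, 3, 10), None))  # 35
print(tenure_years(35))  # 2
```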

2023: Week 33 - HR Month - Combinations

Challenge by: Kelly Gilbert
Validated by: Viktoriia Slaikovskaia

The HR analyst used the data from last week to build a dashboard. The DC managers found it very useful, and they requested some new features.

First, we need to add the employee’s tenure (how many months and years they have worked at that particular DC) to the dataset.

Second, the HR analyst would like to keep the reports consistent from DC to DC, so they requested an aggregated dataset that fills in zeroes if a DC does not have any employees in a specific demographic group in a given month. For example, DC #1 did not have any employees in the 60-64 years age group for the month of February 2019, so we need to add a row for that combination, with 0 employees.

Inputs
1. The employee dimension table from last week (ee_dim_v3.csv)
2. The DC/month assignments from last week (ee_monthly_v3.csv)

Requirements
Input the data
Using the monthly table, calculate each employee’s tenure
For the tenure_months, we want to know the number
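The "fill in zeroes" requirement amounts to building every DC x age-group x month combination and supplying a 0 where the aggregated data has no row. A minimal sketch, with made-up values standing in for the real aggregates:

```python
from itertools import product

# Aggregated counts keyed by (dc_nbr, age_group, month); illustrative only.
# Note there is no row for DC 1 / 60-64 / 2019-02, mirroring the example above.
counts = {
    (1, "25-29", "2019-02"): 4,
    (1, "60-64", "2019-03"): 2,
}

dcs = [1]
age_groups = ["25-29", "60-64"]
months = ["2019-02", "2019-03"]

# Cross-join every combination and default missing ones to 0
filled = {combo: counts.get(combo, 0) for combo in product(dcs, age_groups, months)}
print(filled[(1, "60-64", "2019-02")])  # 0 -- the missing combination now exists
```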

2023: Week 32 - Solution

Solution by Tom Prowse and you can download the workflow here.

For this week's challenge we continue with our HR-themed month, and this time we want to categorise people based on their age so we can run some reporting on this data set.

Step 1 - Generation
First we want to input the generations data set, where we can create a new field that contains information about the generation name:

Generation Name
[generation] + " " +
(IF ISNULL([start_year]) THEN "(born in or before " + STR([end_year]) + ")"
ELSEIF ISNULL([end_year]) THEN "(born in or after " + STR([start_year]) + ")"
ELSE "(" + STR([start_year]) + "-" + STR([end_year]) + ")"
END)

This brings together the information about the generation and the start/end years into a single label.

We can then fill in the null start and end years with a year that won't change our calculations too much. We've used 150 years ago and the latest date, b
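The Generation Name calculation can be rewritten as a small Python function; the branching mirrors the IF/ELSEIF/ELSE exactly (only an end year, only a start year, or both). The "Example Gen" row is a placeholder, not real Pew data.

```python
def generation_name(generation, start_year, end_year):
    """Build the 'Generation X (1965-1980)' style label from the calc above."""
    if start_year is None:
        years = f"(born in or before {end_year})"
    elif end_year is None:
        years = f"(born in or after {start_year})"
    else:
        years = f"({start_year}-{end_year})"
    return f"{generation} {years}"

print(generation_name("Generation X", 1965, 1980))   # Generation X (1965-1980)
print(generation_name("Example Gen", None, 1927))    # Example Gen (born in or before 1927)
```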

2023: Week 32 - HR Month - Reshaping Generations

Challenge by: Kelly Gilbert
Validated by: George Pattinson

Now that all of the missing IDs have been filled in, HR would like us to add several age groupings to the dataset for reporting, so the employee’s exact date of birth can be removed from the dataset. For this challenge, we will be using the Pew definitions for each generation, which can be found in the generations.csv input.

Inputs
1. The list of generations, with start and end date (generations.csv)
2. The employee dimension table from last week (ee_dim_v2.csv)
3. The DC/month assignments from last week (ee_monthly_v2.csv)

Requirements
Input the data
Add a new column, generation_name, which includes the generation name and the start/end years, for example: “Generation X (1965-1980)”
If the generation doesn’t have a start year, the text should be “(born in or before XXXX)”
If the generation doesn’t have an end year, the text should be “(born in or after XXXX)”
Calculate the employee’s birth year from the date_of_birth
Join th
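The later requirements, deriving a birth year from date_of_birth and matching it to a generation by its start/end year range, can be sketched as below. The generation rows are hypothetical placeholders standing in for generations.csv, with None where a start or end year is missing.

```python
from datetime import date

# Hypothetical generation rows: (name, start_year, end_year)
generations = [
    ("Generation A", None, 1964),
    ("Generation B", 1965, 1980),
    ("Generation C", 1981, None),
]

def birth_year(date_of_birth: date) -> int:
    """Calculate the employee's birth year from date_of_birth."""
    return date_of_birth.year

def generation_for(year: int):
    """Return the generation whose [start_year, end_year] range covers year;
    an open end (None) matches everything on that side."""
    for name, start, end in generations:
        if (start is None or year >= start) and (end is None or year <= end):
            return name
    return None

print(generation_for(birth_year(date(1972, 5, 4))))  # Generation B
```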

2023: Week 31 - Solution

Solution by Tom Prowse and you can download the workflow here.

This is our first week where we revisit another HR-focused month, with some challenges provided by Kelly Gilbert. For the first challenge we look at filling in missing IDs across various different tables.

Step 1 - Lookup Table
First we want to create a lookup table to identify the unique IDs from each table. We can input both tables and create an aggregate step after each one. Within these aggregations we want to make a unique list of the employee IDs and the GUIDs, so we can group by both of these fields.

We can then use a union step to bring both branches together and create a long table with all the IDs.

From there we can remove the Table Names field and then filter the table to remove any rows that are null in either the employee_id or guid field. We can filter using this calculation:

NOT ISNULL([employee_id]) AND NOT ISNULL([guid])

Then finally we can use another aggregation step to remove any duplic
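The lookup-table steps can be sketched with stdlib Python: union the distinct (employee_id, guid) pairs from each table, keep only rows where both fields are present (the NOT ISNULL filter), then deduplicate (the final aggregation). The table contents are made up for illustration.

```python
# Distinct (employee_id, guid) pairs from each input table; None marks a null
table_a = [(101, "g-aaa"), (102, None), (101, "g-aaa")]
table_b = [(101, "g-aaa"), (None, "g-bbb"), (103, "g-ccc")]

# Union step: stack both branches into one long table
unioned = table_a + table_b

# Filter: NOT ISNULL([employee_id]) AND NOT ISNULL([guid])
kept = [row for row in unioned if row[0] is not None and row[1] is not None]

# Final aggregation: remove duplicates
lookup = sorted(set(kept))
print(lookup)  # [(101, 'g-aaa'), (103, 'g-ccc')]
```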