2020: Week 7 Solution



The Challenge


This week's challenge is all about the common issue of counting something that no longer exists in a data set when processing data. The challenge is focused on looking at employees that have left the business over time as well as understanding which employees still remain.


Step 1: Combine Tables

The first step is to create a full list of current employees and employees who have left the business within the time period. We have a table for each set of employees (current and leavers), both of which have similar fields, apart from the current employees table doesn't have a 'Leave Date' field. Therefore, for ease, we are going to use a calculated field to create a leave date of 1st March 2020 for all current employees:

Leave Date
MAKEDATE(2020,3,1)

Now we have the same fields on each table we are able to 'stack' these on top of each other using the union tool, and therefore create a full list of employees. The union tool creates a 'Table Name' field which we can use to identify if the employee is currently working or is a leaver (we recommend updating the field names to Current & Leaver so they are easy to identify).

Step one is complete and the flow should look like this:



Step 2: Join Reporting Dates

After using the union to create a full list of employees, we now need to combine the 3rd table which contains the reporting dates. The reporting dates table contains a monthly date, meaning we will need to alter our employee join/leave dates so that we can join the tables together nicely.

We can use the DATETRUNC() function to truncate the date to the month-year level, allowing us to use this to join to the reporting table. We also need to convert the 'Month' field in the reporting table to a Date field.

Join Month
DATE(DATETRUNC('Month',[Join Date]))

Leave Month
DATE(DATETRUNC('Month',[Leave Date]))

Now we have a common field for the join, we can use a scaffold technique to join the employees and reporting tables. More on the scaffolding technique can be found on our 'How to... Scaffold your Data' post.

The join conditions need to be set up so that we join any reporting months that are greater than or equal to our joining month, and also any reporting month that is less than our leaving month. The join should look like this:


We have now completed step 2, so our flow should look something like this:



Step 3: Calculate Measures

The final step of this challenge is to create the following measures for our analysts to use:
  • Number of employees per month
  • Avg salary per employee per month
  • Total salary paid by the company per month
Our first task is to clean the salary field and convert this to a number. We can't simply change the data type due to the currency symbol and the commas, therefore we need to use the 'Clean' functionality and 'Remove Punctuation' before converting to a decimal.

In order to calculate the different measures we are going to use an aggregation tool. We are going to be calculating both Avg and Total salary per month, this means we will need to use the Salary field twice. However, you can only use a field once within an aggregation tool, meaning we will need to duplicate the salary field before bringing in the aggregate. 

We have three measures to calculate, all of which are going to be 'per month'. We need to group by the 'Month' field, and then use the following to calculate each of the measures: 
  • Number of Employees - Count Employee ID field
  • Avg Salary - AVG Salary field
  • Total Salary - SUM Salary-1 field
The aggregate will look like this: 



The final task for step 3 is to tidy up any fields by renaming so that they are suitable for the required output.



Output

We've now completed the task and our output should look like this:


Make sure you share your solutions with us on Twitter using #PreppinData and complete our participation tracker!

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text