2022: Week 49 - Solution


Solution by Tom Prowse and you can download the workflow here.


We are starting our very first HR month this week with the challenges coming from long term participant Kelly Gilbert. We are going to be working through the real world challenges over the next few weeks and hopefully build up some useful examples from the HR space.

The first challenge this week is all about the hiring process and identifying the latest entry in a process.

Step 1 - Most Recent File

After inputting our Status History file, we want to extract the date from each of the filename fields. Unfortunately this doesn't have a common naming convention, however the date is always the 19 characters before the .csv part at the end.

There are a couple of ways that you could extract this... You could use a Right() function to return all 19 characters and the .csv, and then remove the .csv part.

Alternatively, you could use a regex calculation that looks like this:

filedate 
REGEXP_EXTRACT([filename],'(\d{4}-\d{2}-\d{2})')

This calculation identifies the numbers in the correct format and extracts this from the string. We can then change this field to a Date and we have our extracted date. 



We then want to remove any fields that aren't the latest date. Again there are a couple of different ways to do this...

1. LODs 
Using a Fixed LOD to calculate the Max FileDate for each Position and Candidate, we can then use this within a filter to only return values where the LOD = Filedate.


2. Aggregate & Join
Alternatively, we can use an aggregation to create the Max Date for each (group by) candidate and position: 

Then from this aggregation we can join this back to the original table using the three fields from the aggregation (candidate id, position id, & filedate). This join will act as a filter and only return the rows with the latest filedate.


As a result, we can now output our first table for Output 1: 


Step 2 - Most Recent Status

Now we have identified the latest file we can now also identify the latest status for each candidate and position combination.

We are going to do a similar technique to identify the latest position. Again, you could use the LOD method, but we have created an aggregation to find the Max ts (we renamed filedate to ts) by grouping by Candidate id and Position id: 


Then joining back to the previous clean step using candidate id, position id, and ts: 



After the join we only need to keep the Candidate id, Position id, and status fields. Then rename status to current status and we are ready to output our table: 

You can download the full outputs here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14@TomProwse1 & @kelly_gilbert

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank