Preppin' Data

Posts

Showing posts from December, 2022

2022: Week 52 - Find Sub-Departments from an employee hierarchy (HR Month)

December 28, 2022

Challenge By: Kelly Gilbert It's the final challenge of HR month already and the final challenge of 2022! Thanks again to Kelly for helping us round out the year with some really interesting real-life challenges. We'll begin 2023 with a beginner month as is tradition, followed by an intermediate month and then our first ever advanced month! Thanks to all who participated this year, we hope you're all having a wonderful festive season. ___________________________ The HRIS (HR information system) keeps track of an employee’s department, but sometimes it is helpful to know the sub-department for reporting and analytics. Inputs Employees: Teams: Requirements Input the data From the employee data, calculate each person’s hierarchy_level The employee_id_hierarchy field contains a person’s supervisory hierarchy flattened into a string (e.g. | CEO’s ID | Vice President’s ID | Director’s ID | Manager’s ID | Employee’s ID |) To calculate the hierarchy level, count the number of ...

2022: Week 51 - Solution

December 27, 2022

Solution by Tom Prowse and you can download the workflow here . We continue with our HR challenges for this week with a focus on parse data that is all within a long string value. This can be a useful technique to transfer across to other problems and not just one that someone working within HR will encounter. Step 1 - Parse Fields First we want to remove the rows with no data in them (Supervised & Industry Experience). We can do this straight from the tick boxes in the input step. Next we want to parse the single string into 5 separate fields. We can use a split for this as most of the are split by a ',', therefore we can use this as a separator to get some of the fields. This results in 4 separated fields and we can remove the original Work Experience field, then filter to exclude any null values from the Split 2 field. This filter removes any fields that just contain the year of application. To get the 5th field, we can again split off the number in the parentheses by u...

2022: Week 51 - Parsing an Unfortunate Report Format (HR Month)

December 21, 2022

Challenge by: Kelly Gilbert This is the third challenge in our special HR month. Thanks to long time participant Kelly Gilbert for sharing her real world challenges she faces in her role with data prep. ______________________________ The Talent Acquisition team would like to understand if we are attracting qualified candidates to apply for open roles. They are able to extract a report from the applicant tracking system, but it needs some preparation before they can complete their analysis. Input All of the data is in one long text string : Requirements Input the data Remove the header rows (row 1 and year headers) Parse five fields from each row: Application Month Work Experience Number Supervised Industry Experience Candidate Count (the number in parentheses) Remove blank columns and the original data column Convert the application month to the month-ending date For each month, calculate the % of applicants who meet the preferred qualifications. Preferred qualificati...

2022: Week 50 - Solution

December 20, 2022

Solution by Tom Prowse and you can download the workflow here . We continue with our HR themed challenges this week and take a look at when a candidate is withdrawing from the hiring process. Hopefully this analysis can allow the Talent Acquisition team understand if there are any trends or areas that they need to address. Step 1 - Rank Status First we want to rank the status based on the Timestamp field. We can do this easily in Tableau Prep by creating an analytical calculation to create the rank: Rank We group by candidate_id and position_id then rank the Timestamp in a descending order. This allows us to have the status that is ranked #1 (candidate withdrew), but we also want rank #2 which will be the status before the candidate withdrew - this is the key piece of information that we are looking to analyse. The get the previous status we first need to isolate the Candidate Withdrew status rows. To do this we want to filter to keep only 1 in the Rank field, then k...

2022: Week 50 - Withdrawals (HR month)

December 14, 2022

Created by: Kelly Gilbert This is the second challenge in our special HR month. Thanks too long time participant Kelly Gilbert for sharing her real world challenges she faces in her role with data prep. ______________________________ Occasionally, candidates may choose to withdraw from the hiring process. The Talent Acquisition team would like to understand where in the hiring process candidates are choosing to drop out. Input One csv file : Requirements Input the data Rank the statuses based on the timestamp (ts) field, where the most recent status is 1 If the most recent status (rank #1) is “Candidate Withdrew”, find the previous status (rank #2) Count the number of withdrawals, grouped by the previous status Count the total number of candidate/positions that have ever been in each status Join the total counts by status to the withdrawals by status Make sure that statuses with no withdrawals are included in the output Calculate the % of candidates who withdrew after each s...

2022: Week 49 - Solution

December 13, 2022

Solution by Tom Prowse and you can download the workflow here . We are starting our very first HR month this week with the challenges coming from long term participant Kelly Gilbert . We are going to be working through the real world challenges over the next few weeks and hopefully build up some useful examples from the HR space. The first challenge this week is all about the hiring process and identifying the latest entry in a process. Step 1 - Most Recent File After inputting our Status History file, we want to extract the date from each of the filename fields. Unfortunately this doesn't have a common naming convention, however the date is always the 19 characters before the .csv part at the end. There are a couple of ways that you could extract this... You could use a Right() function to return all 19 characters and the .csv, and then remove the .csv part. Alternatively, you could use a regex calculation that looks like this: filedate REGEXP_EXTRACT([filename],'(\d{4}-\d...

2022: Week 49 - Cleaning Status History (HR month)

December 07, 2022

Challenge by: Kelly Gilbert Welcome to HR month (well five weeks actually but one of those will come in 2023!). Long term participant Kelly Gilbert has supplied the Dr Preppers some challenges to finish the year. Firstly, thank you to Kelly for these challenges and we're all looking forward to learning more about some of the challenges you face in your day-to-day role. Secondly, we love receiving real world challenges so we hope you enjoy them as much as we do. The challenge The applicant tracking system records the statuses and timestamps as a candidate moves through in the hiring process. Each morning, any candidates/positions that changed during the previous day are loaded to the raw table. However, each time a candidate’s status changes, we receive all of their previous statuses again, in addition to the new one. We call this dataset “The 12 Days of Christmas,” because, like the song, each time a status is added, we get all of the history again. As a result, our dataset has a ...