2022: Week 51 - Solution


Solution by Tom Prowse and you can download the workflow here


We continue with our HR challenges for this week with a focus on parse data that is all within a long string value. This can be a useful technique to transfer across to other problems and not just one that someone working within HR will encounter.

Step 1 - Parse Fields

First we want to remove the rows with no data in them (Supervised & Industry Experience). We can do this straight from the tick boxes in the input step.

Next we want to parse the single string into 5 separate fields. We can use a split for this as most of the are split by a ',', therefore we can use this as a separator to get some of the fields.


This results in 4 separated fields and we can remove the original Work Experience field, then filter to exclude any null values from the Split 2 field. This filter removes any fields that just contain the year of application.

To get the 5th field, we can again split off the number in the parentheses by using a split on the Split 4 field. This time using a '(' as the split separator will allow us to have both the industry experience and candidate count numbers in separate fields. We can then tidy up the candidate count field by removing any punctuation and making it a whole number. 

At this stage our table should now look like this: 


Step 2 - Clean Parsed Fields

We now want to remove the field title from each of the rows of data. We could do this on a field by field basis, but we can also pivot the data to make this a bit easier and future proof the solution.

First, we need to create a row ID so we can identify each candidate at a later stage. We can do this by using an analytical calculation to create a row number: 

Row 
{ ORDERBY [Work Experience - Split 1]: ROW_NUMBER()}

Now we have a row number we can pivot the data so that each of the fields is in a single column with a count of the candidates in another. We need a columns to rows pivot, with a wildcard on '-' to bring through all of the split fields: 


We can then use another split to break apart the field header from the value by using a split on the ':'



As a result we should now have a field containing the headers and another with the values:



The last step is to pivot this back the other way so that each of the rows becomes a field again. This time using a rows to column pivot: 



Finally we can clean the Application Date field so that it is a date field type. To do this we can first change it to a Date field type, then we want to identify the month end of the given month. To find this we want to use the following calculation to identify the previous day then add a month: 

Application Month
DATEADD(
'day',-1,
DATEADD('month',1,[Application Date]))

Our table should now look like this: 


Step 3 - Preferred Qualifications

The next step is to calculate whether the candidates have met the preferred qualifications set out in the requirements.

Work Experience
For this we want to duplicate the Work Experience field and rename to Work Experience > 4. We can then regroup the values with a 1 or 0. If the years of experience is greater that 4, then we rename to 1, all of the others become 0. 


Supervised
Again we want to start by duplicating the Supervised field and rename it to Supervised > 10. We want to use a similar technique but this time everything over 10 is grouped as 1, then anything else is 0.



Industry Experience
Finally we want to duplicate the Industry Experience field and rename to Industry Experience = Yes. We can then regroup the Yes = 1, and No = 0.


We can then use these three fields to flag the rows that meet all of the preferred requirements: 

Candidates with Preferred Qualifications
MIN(MIN([Supervised > 10],[Industry Experience = Yes]),[Work Experience > 4 Years])

This calculation will provide the minimum numbers from each of the 3 fields. If any of these are 0 then this means they don't meet one of the requirements. 

Step 4 - % of Totals

The last task this week is to create the % of total calculation based on the number who meet the preferred qualifications. 

First we want to calculate the monthly totals by using an aggregation to group by month and sum the candidate count and Candidates with Preferred Qualifications: 


Then we can rename the candidate count to Total Candidates and calculate the % of total: 

% of Candidates 
ROUND(
[Candidates with Preferred Qualifications]*100
/
[Total Candidates]
,1)

We are then ready to output the table that looks like this: 


After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14@TomProwse1 & @kelly_gilbert

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text