2022 Week 12 - Solution
Solution by Tom Prowse and you can download the workbook here.
This week take inspiration from the Gender Pay Gap Bot on Twitter, and take a look at historical data to get this into a nice, easy to understand structure.
Step 1 - Combine Files
The first step is to combine each of the different files from the different years. We have 5 different CSV files so we need to use the Wildcard Union within the Input step to stack these on top on one another.
After inputting all of the data files, we can then remove a lot of the fields that are included. We can remove fields straight from the input step by using the tick boxes on the right-hand side.
We want to keep the following fields (these will be ticked):
- EmployerID
- EmployerSize
- DiffMedianHourlyPercent
- File Paths
- EmployerName
Step 2 - Clean Report Year
Step 3 - Most Recent Report
And then remove the duplicated EmployerID and the Employer Name from the original workflow - this will leave us with the updated company names.
Step 4 - Pay Gap Explainer
As noted in the requirements there are a few different problems that might occur here. Therefore we follow these steps to create the string for each company:
1. Float Point Inaccuracies
String
LEFT(STR((CEILING(round(abs([DiffMedianHourlyPercent]),1)*100))/100),4)
ELSEIF ABS([DiffMedianHourlyPercent])<100 THEN
LEFT(STR((CEILING(round(abs([DiffMedianHourlyPercent]),1)*100))/100),5)
ELSE
LEFT(STR((CEILING(round(abs([DiffMedianHourlyPercent]),1)*100))/100),6)
END
This will ensure that the numbers are correct rounded to 2 decimals places.
2. Positive & Negatives
String
THEN LEFT([String],LEN([String])-1)
ELSE [String]
END
This will ensure that the positive and negatives are as we expect.
3. Correct Phrasing
"In this organisation, women's median hourly pay is "+[String]+"% lower than men's."
ELSEIF [DiffMedianHourlyPercent]<0 THEN
"In this organisation, women's median hourly pay is "+[String]+"% higher than men's."
ELSE "In this organisation, men's and women's median hourly pay is equal."
END
This creates the string for each of the three different options.
The final step of the challenge this week is to reorder the fields and then we are ready to output our data: