2019: Week 46 Solution




This week we got all festive for a Christmas edition of Preppin' Data! In this task we had to help Santa out with understanding how much time he has allocated to his Elves to build presents that have been wasted due to people being on the naughty list!

The first task is to bring the Naughty and Nice lists together, and make sure that we could identify who was on each list. To do this we bring both tables into the view, and then use the union tool to 'stack' both tables on top of one another.

As a result of this, we have a FilePath field that contains the information about what list the data had come from. Therefore, we can parse out this information by using a custom split on the last word after the '/'.


The next step is to match the address on the Naughty/Nice list with the address on the Present list. The problem we have here is that on the Present's list there is only part of the address, but not the same part on each row. Sometimes it is the Road name, other times it is the City. As a result of this, we can't do a traditional parse which is the same for each row.

To overcome this we use the Split functionality once again. If we split the address field every time there is a ',', we then get every part of the address in a different column.



We can then pivot these newly split fields, so that we have a single row for each part of the address. The pivot is setup like below:



Result after the pivot:



As you can see there are now multiple lines for each person, but we now have a column (Santa Address) that we can join to the Presents list. Within the join we want to be joining on both address and name, therefore we need to do some preparation to the Present list before we can do this.

Within the Present list table, we can use the LEFT() function to extract the first three letters of each name. The calculation would look like this:

Short Name

LEFT([Name],3)

This field is now suitable to join with the Naughty/Nice list table, therefore the join can be setup in the following way:



Now we have joined all of the tables together, all that is left is create the two output files, a summary and the detail.


Summary Output

To create the summary output, we need to calculate the total amount of time spent on the Naughty and Nice lists. Therefore we can use an aggregation tool to find the total time for each list option. The aggregation is as below, and we can then exclude any nulls from the list field:


After the aggregation the final step is to transform the mins into hours, we can do this by dividing the figure by 60, then rounding to 2 decimal places. This is the calculation used:

Elves Build Time (Hours)

ROUND([Elves Build Time (min)]/60,0)

We can then remove any unwanted fields and prepare the data for the Output.

Summary Output:



Detail Output

The detailed output is a little easier to complete, all we need to do is create a branch from the step before the aggregation. On this new clean step, we can remove any nulls, and any unwanted fields then we should be ready to go with the second required output!

Detailed Output:



Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank