2019: Week 8 Solution

This week on Preppin’ Data we once again had you all serve as labour for Chin & Beard Suds Co. It was quite a tough one! In order to overcome this challenge, our method uses a number of techniques from previous challenges. We mentioned at the bottom of the requirements that we used the Rows-to-Column pivot and another key hint provided was: although you had five questions to answer they can all be answered using the output data-set provided. What is this hinting at? Using the Profile Pane to answer questions!

Before we dig into a few of the main challenges in the workflow itself, let’s take a look at the answers to the 5 questions themselves.

1) Whilst we can see both bar and liquid soap has been stolen in similar frequencies, the liquid soap has been stolen in much greater quantities.


2) & 3) We can see that 19 items of stock haven’t been updated yet: 5 in 'Wimbledon 2' and 14 in 'Oxford Street'.


4) The only store to update its stock in 1 day or less is 'Wimbledon 1', making it the fastest to update inventory stock post-theft.


5) From 2) & 3), we know the '5' & '14' [Stock Variance] values correspond to stock values that haven’t been updated yet. Clicking on the remaining '1' & '2' values reveals that 'Oxford Street' and 'Wimbledon 1' have both updated their stock incorrectly.


You can view the full workflow below and download it here.

The full solution.

The meat of this solution takes place in these four steps, which I’ll break down below.

The Meat.

Joining Theft Audit with Branch ID

After splitting the [Branch ID] field in [Branch ID] to get [Store ID] and [Branch Name], we can now join it with the Theft Audit sheet on [Store ID] = [Store ID].

The Join


Duplicating Quantity

In the Theft Audit sheet, the quantity represents both the amount of stock stolen AND the amount of stock adjusted. We know by the end that we want these values in two different fields. This step prepares for that by simply duplicating [Quantity]; the original gets renamed [Stock Variance] and the copy gets renamed [Stolen Volume]. The following Row-to-Column pivot will take care of separating the actual values in both fields, so they only contain values that actually relate to the field name.

The Duplicate.


Pivoting from Rows-to-Columns

As mentioned before, right now the theft quantity and the stock adjusted quantity are all muddled together in the [Stock Variance] and [Stolen Volume] fields. However, the [Action] field defines whether each value is a theft quantity, or a stock adjusted quantity. If we row-to-columns pivot the [Action] field we can aggregate on MIN of [Date]. The MIN is arbitrary since there’s only one date per row, but what it means is that the table now contains a [Stock Adjusted] field and a [Theft] field, each which contains the date that the action occurred and null otherwise. We can use these dates in the next to step help obtain a single row for each crime of each soap type at each store.

The Pivot.


Aggregating

Again, as mentioned above, we need a single row for each [Branch Name][Crime Ref Number][Type] combination. A nice aggregation step will take of this. We know that there is a single date field now for each row, so we can take MAX of [Theft] and MAX of [Stock Adjustment] – this gets the actual date and not the NULL value.

We also know all the actual theft quantities are positive, so we can take MAX of [Stolen Volume] to get a single value for each row.

Finally, we need to get the actual variance between the theft quantities and the stock adjustment quantities. This means we can take the SUM of [Stock Variance] which produce a value for each row that indicates any disparity between the stock levels before the theft and after the theft!

The Aggregation.

From this point we can continue our flow to clean up any spelling issues and calculate the days between [Theft] and [Stock Adjustment].

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text