2019: Week 41 Solution
Preppin’ Data Week 41 Solution
This week’s challenge was brought to you by Jack Parry, and after looking through various Twitter posts there were plenty of different options to get to the correct answer!
For this challenge we needed to have two outputs. Firstly, we needed the refund totals, including how much stock was lost and how many batches/items were recalled. Secondly, we needed a table informing us about the remaining stock that we had as a result of the sales and refunds.
Output 1
The first step is to parse out the Order Number, Customer Number and Item Number from the complaint string. To do this we first input the Complaints sheet, and use a REGEXP_EXTRACT calculation from each of the numbers as follows:
Order Number
REGEXP_EXTRACT([Complaint],'(\d{5})')
The regex expression here is saying… Extract any number (\d) that is exactly 5 digits in length ({5}).
Customer ID
REGEXP_EXTRACT([Complaint],'(\b\d{4}\b)')
This time we will extract from the Complaint field, any number (\d) that is exactly 4 digits in length ({4}), but also must be a separate word (\b). The separate word part of this expression is important as we will pick up other unwanted results without it.
Item Number
REGEXP_EXTRACT([Complaint],'(\b\d{1}\b)')
Finally, to extract the item number, we want any number value (\d) which is one digit in length ({1}) and again, we want this to be it’s own word (\b).
Step one complete!
Next up we are going to bring in the other two sheets, Batches and Orders, and join them together using the Batch Number field. After we have removed any duplicate fields, we can then use this newly formed table to join to the Complaints data which we were previously working on.
For this second join we are going to use a left-inner join, to bring through all of the orders data, and join with the complaints table. We will be joining on Order Number, Item Number and Customer Number.
*Note this may be a Right join for you depending on which table you are joining first. We just need to make sure that all of the fields from the Orders/Batches table are retained.
Once we have successfully joined the tables together, we want to remove any duplicate fields then calculate the number of complaints and the refunded amount. Firstly, we need to mark whether the order had a complaint against it. To do this we use the following calculated field:
No. of Complaints
IF NOT ISNULL([Complaint]) THEN 1
ELSE 0
END
This will return a 1 every there is data within the complaint field, this will help us in a later step when we come to aggregate.
Refund Amount
IF NOT ISNULL([Complaint])
THEN [Price]
ELSE 0
END
This is similar to the previous calculation, but instead of returning a 1, we will return the price of the item. This will again help us to calculate the total amount of money that we have had to refund.
Now that we have created the last two calculated fields, we can use the aggregate tool to sum up the totals. We want to group by Size of Order, Batch Number, Product, Price, & Scent. Then we need to sum Refund Amount, No. Complaints, and also Number of Rows. It should look like this:
We will then rename the Number of Rows field to No Items Sold.
From the totals for Complaints, Refunds, & Orders we can now complete some more calculated fields to calculate the amount we have lost from refunds and refunded batches. These are the calculations needed:
Batch Recall?
([No. Complaints]/[No. Items Sold])*100>20
This calculates if over 20% of the items sold resulted in a complaint.
Total Amount Lost
IF [Batch Recall?] = FALSE
THEN
[Refund Amount]
ELSE
[Size of Order]*[Price]
END
This calculates whether what refund amount is a result of the complaint. If the batch doesn’t need to be refunded, then just return the individual refund amount, else return the amount we ordered multiplied by the price we purchased for.
Whole Batch Refund
IF [Batch Recall?] = TRUE
THEN
[Size of Order]*[Price]
else
0
END
Next if we need to recall the whole batch, then we need to calculate how many we ordered multiplied by the price, else we can just return a 0.
Item Only Refund
IF [Batch Recall?] = FALSE
THEN
[Refund Amount]
ELSE 0
END
Finally we need to calculate the amount of the refunded amount for all of the items that we need to refund, but only those that aren’t included in the whole batch refunds.
After calculating the refunded amounts for each item, then we can again aggregate to get the totals. This time we want to just aggregate Whole Batch Refund, Item Only Refund, & Total Amount Lost fields.
The next stage is to rename the fields so that they are the same as the desired output, and to use the ROUND() function to round the Item Only Refund, and Total Amount Lost fields to 2 decimal places.
The final step of the first output is to use the pivot tool to so that we have each value on a separate line. The pivot should look like this:
Output 2
Now we have calculated the refund totals, we now need to calculate how much stock we have remaining as a result. The first step is to calculate the total stock that we initially ordered, and we can calculate this by using the aggregation tool to group by Product and Scent, then sum Size of Order. We then renamed the Size of Order field to Total Stock.
Next we want to calculate the total stock that we have ‘used’. That could be what we have sold, what has been refunded, and also any batches that have been fully refunded. Therefore we can use a branch from the step where we calculated the refunded amount, then use this to calculate Total Sales & Refunds.
Total Sales & Refunds
IF [Batch Recall?] THEN [Size of Order]
ELSE [No. Items Sold]
END
This calculation returns the number of items we have ‘used’ as a result of being sold, or returning the whole batch.
Similarly to the Total Stock, we can use the aggregation tool to calculate the total used stock. To do this we group by Product and Scent, then sum Total Sales & Refunds
Now we have calculated the totals from the starting stock and used stock, we can join these together using an inner join on Product and Scent.
After these are joined, there is one more step to calculate the remaining amount of stock, by using the following:
Remaining Stock
[Total Stock] - [Total Sales & Refunds]
All that is left is to remove any unwanted fields and we are left with the output needed for the remaining stock levels for each different product.