2019: Week 14 Solution

Each week I normally like to outline some key points and main learning objectives, however today I’m offering a full walk-through of the solution, so everyone can understand the logic involved at each step.

You can see our solution below and download it here.



Before I go into the full walk-through, I’d first like to provide some reassurance that this is a difficult challenge and our trim and streamlined solution was the result of iteration and feedback! If anyone got a flow that looks anything like ours on their first try then we are seriously impressed. To add context, here are some of the original solution flows we came up with reaching the final flow seen above.

--------------------
--------------------
.

When trying to solve a data prep obtaining any solution is a great start as you can begin inspecting, refining, and improving it. As with many problems, iteration is a key to success! Now onto the walk-through.

Cleaning the Data and Adding Flags

First we need to clean the data. This involves replacing null [MemberID]s with zeros and replacing null prices with 1.5. We can replace the null [MemberID]s without creating a new field by changing the type to string, then using Group & Replace to replace 'null' with ‘0’, then changing back to a integer number.

We also need to add a [Flag] field of some kind. This should be an integer field and just contain ‘1’ for every row. This flag field is going to be used for various purposes. Later we’ll pivot and aggregate these to flag what item type is in each row as well as use them to work out things such as number of each item type per ticket and the average price for these item types in each ticket.

Pivoting the Data to Count Items

Next, we need to Row-to-Column pivot the [Type] field and count the [Flag] field. This will give us 3 new fields:

[Main]
[Drink]
[Snack]

Each of these will contain a number that indicates how many of that specific item, indicated by the [Desc] field, are in the ticket. The reason we use COUNT instead of SUM here is that COUNT will return a ‘0’ when the new field doesn’t match the item type in that row, whereas SUM will return NULLs.


Calculating Price per Row

Now we need prepare three new total price field for each item type as well as prepare a total ticket price field. We can aggregate these later to get the total prices for each item type per ticket as well as the overall ticket price. To do this we can three new fields where we multiply each item field by the [Price] field. For example, for drinks the new field will be called [Total Drink Price] and contain “[Drink] * [Price]”. Using these we can prepare the [Total Ticket Price] field by adding [Total Drink Price] + [Total Main Price] + [Total Snack Price].



Aggregating to Count Items and Prices per Ticket

After we’ve prepared these fields, we can use an aggregate step to sum up all our numerical fields. This mean for each [TicketID] & [MemberID] we’ll have 7 aggregated fields:

SUM of [Total Drink Price] : The total price of all drinks in the ticket.
SUM of [Total Snack Price] : The total price of all snacks in the ticket.
SUM of [Total Main Price] : The total price of all mains in the ticket.
SUM of [Total Ticket Price] : The total price of the ticket.
SUM of [Drink] : The number of drinks in the ticket.
SUM of [Snack] : The number of snacks in the ticket.
SUM of [Main] : The number of mains in the ticket.

Counting Meal Deals and Meal Deal Earnings per Ticket

With this information to hand we can calculate how many meal deals would have been in each ticket. This number will be the minimum number of any item type in each ticket: if we have 2 drinks, 4 mains, and 3 snacks then there are 2 possible meal deals as we only have 2 drinks and a meal deal needs one of every item type.

To calculate this we can create a new field called [Number of Meal Deals] which contains:

MIN( [Main, MIN( [Snack], [Drink] ) )

The MIN() function returns the minimum of two values, so we have to put a MIN() inside another MIN() in order to compare 3 or more values.

We can also calculate the [Total Meal Deal Earnings] in each ticket by multiplying the [Number of Meal Deals] by 5 (as the cost of a meal is £5).

Calculating the Average Price of each Item Type in each Ticket

In order to estimate how much enacting meal deals will cost the business we need to know roughly how much each item type in each ticket costs, so we can figure out the difference between actual cost and cost as part of a meal deal. We can figure out the average cost for each item type in each ticket by creating three new fields, one for each item type, which divides the total price for the item type by the number of that item type in the ticket. This function looks like this:

ZN( [Total Drink Price] / [Drink] )

This is wrapped in a ZN() function which replaces any NULLS with ‘0’ (as [Drink] could be zero; an impossible calculation).

Calculating the Excess Price per Ticket

Armed with the number of each item type, [Number of Meal Deals], and the average price for each item type in each ticket we can estimate the difference between price with meal deal and price without meal deal, i.e. the excess price. This will be the cost to our business for implementing meal deals. We’ll calculate this for each item type first and then sum these to get the [Total Excess] per ticket.

To calculate the excess for each item we can figure out how many of that item aren’t in a meal deal by subtracting the [Number of Meal Deals] from the item count for that item type. We can then estimate the cost of these items by multiplying the result by the average price for that item type in that ticket.

For example, our [Excess Snack Value] looks like this:

( [Snack] – [Number of Meal Deals]  ) * [Avg Snack Price]

After creating these fields for each item type we can simply create a new field called [Total Excess] which adds them together.

Calculating the Variance in Price

Finally, we need to calculate the difference between what the original ticket price was and what the ticket price would have been if they’d paid meal deal prices instead. We can figure out what they would have paid by adding the [Total Meal Deal Earnings] and the [Total Excess] together. By subtracting this from the [Total Ticket Price] we are left with the [Ticket Price Variance to Meal Deal Earnings].

Outputting the Results

All that’s left at his point is to output our results! We have one output that simply outputs the data as is currently is, with one row per ticket, and a second result with an aggregated view. To create the aggregate view, simply add an Aggregate step and sum the [Ticket Price Variance to Meal Deal Earnings] field, and the [Total Ticket Price] fields. We don’t actually need to group on anything.
And we’re done!

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text