2019: Week 16 Solution
You can find our full solution workflow below and download it here!
The DATEADD()function can help us here, as the following function will retain only values within our desired 6 month period:
You can manually type in dates in this fashion with the format #YYYY-MM-DD#. You can update this calculation to work for live data using the NOW() function. The NOW() function returns the current date and time, so you could instead use:
1. Create an Aggregate step which groups by [Email] & SUMS [Order Total].
This gives us the total spent by each user within the last 6 months.
2. Create a clean step to duplicate the data.
If we add a clean step but don’t actually make any changes this duplicates the data which allows us to:
3. Join the data to itself with the condition [Order Total] <= [Order Total].
This will generate a whole bunch of rows. The greater the [Order Total], the fewer the rows. In fact, the [Email] with the greatest [Order Total] only has one row.
4. Create another Aggregate step which groups by [Email] & [Order Total] and SUMS the ‘Number of Rows’.
Every aggregate step offers the option of using the ‘Number of rows’ instead of one of the existing fields. By summing the number of rows for each [Email] we’re actually adding ranks! As mentioned before, the highest [Order Total] now has a rank of 1 as there was only one row, whilst the lowest [Order Total] has a rank of 909.
As you may have noticed, this produces a modified competition rank – an equal [Order Total] means an equal [Last 6 Months Rank]. There are ways of fixing this to make the ranking unique, however that’s a challenge for another day.
1. Add an Aggregate step which finds the SUM of 'Number of Rows'.
This gives us the maximum rank. You can confirm this by also add MAX of [Last 6 Months Rank] to the Aggregate.
2. Multiply the [Number of Rows (Aggregated)] field by 0.08 to create a new field called [Top 8% Filter].
This lets us know where the 8% cut-off is. In our case this should be 72.72, so we want any rows with a [Last 6 Month Rank] of 72 or less.
3. Use an INNER join between the step containing the [Top 8% Filter] and the aggregate step where we calculated the [Last 6 Months Rank]. Join on [Top 8% Filter] >= [Last 6 Months Rank].
Setting the join up like this means only rows with a [Last 6 Month Rank] of 72 or less get kept, thus filtering out the rest of the data. Originally I intended this to be another chance to practice using a Filter Calculation, but this inner join prevents the need to write one.
The full solution workflow. |
Learning Objectives:
- Practice wildcard unions to manage & speed up inputs.
- Practice using filter calculations for more precise and versatile filtering.
- How to workaround the lack of table-calcs in Prep to:
- Add ranks to your data.
- Create a top N% filter.
Unioning all the relevant data
In last week’s solution post we mentioned how using the Wildcard Union could be used to not only quickly import and combine your data but also ensure you only import the data you actually want to import.The Wildcard Union. |
- Add one of sales files to the canvas.
- Switch to the Multiple Files tab.
- Select Wildcard Union.
- Define a “Matching Pattern” of “Sales_*”. This means only files that start with “Sales_” will be including in the Wildcard Union.
- Click Apply.
Obtaining data from just the last 6 months
Here we’re also reinforcing lessons from last week by introducing a more complex filter calculation. In our data set, 24/05/2019 is what we’re considering as “today”, so we need only sales data for order that occurred in the 6 months leading up to this date. The quickest and most versatile way to do with is using a Filter Calculation using the “Filter Values” option.The DATEADD()function can help us here, as the following function will retain only values within our desired 6 month period:
[Order Date] >= DATEADD('month', -6, #2019-05-24#)
Using the 'Filter Values' option with a Filter Calculation. |
You can manually type in dates in this fashion with the format #YYYY-MM-DD#. You can update this calculation to work for live data using the NOW() function. The NOW() function returns the current date and time, so you could instead use:
[Order Date] >= DATEADD('month',-6, NOW())
Adding ranks to your data
There are four steps required to creating a simple rank field in our data.Aggregate -> Duplicate - > Self-Join -> Aggregate. |
1. Create an Aggregate step which groups by [Email] & SUMS [Order Total].
This gives us the total spent by each user within the last 6 months.
2. Create a clean step to duplicate the data.
If we add a clean step but don’t actually make any changes this duplicates the data which allows us to:
3. Join the data to itself with the condition [Order Total] <= [Order Total].
This will generate a whole bunch of rows. The greater the [Order Total], the fewer the rows. In fact, the [Email] with the greatest [Order Total] only has one row.
4. Create another Aggregate step which groups by [Email] & [Order Total] and SUMS the ‘Number of Rows’.
Every aggregate step offers the option of using the ‘Number of rows’ instead of one of the existing fields. By summing the number of rows for each [Email] we’re actually adding ranks! As mentioned before, the highest [Order Total] now has a rank of 1 as there was only one row, whilst the lowest [Order Total] has a rank of 909.
As you may have noticed, this produces a modified competition rank – an equal [Order Total] means an equal [Last 6 Months Rank]. There are ways of fixing this to make the ranking unique, however that’s a challenge for another day.
Creating a top 8% filter
We now have the [Last 6 Months Rank] for every [Email] by [Order Total]. All that remains is to filter this now to just the top 8% of [Email]s. This can be done in 3 steps.Aggregate - Calculate - Inner Join |
1. Add an Aggregate step which finds the SUM of 'Number of Rows'.
This gives us the maximum rank. You can confirm this by also add MAX of [Last 6 Months Rank] to the Aggregate.
2. Multiply the [Number of Rows (Aggregated)] field by 0.08 to create a new field called [Top 8% Filter].
This lets us know where the 8% cut-off is. In our case this should be 72.72, so we want any rows with a [Last 6 Month Rank] of 72 or less.
3. Use an INNER join between the step containing the [Top 8% Filter] and the aggregate step where we calculated the [Last 6 Months Rank]. Join on [Top 8% Filter] >= [Last 6 Months Rank].
Setting the join up like this means only rows with a [Last 6 Month Rank] of 72 or less get kept, thus filtering out the rest of the data. Originally I intended this to be another chance to practice using a Filter Calculation, but this inner join prevents the need to write one.