2021: Week 9 - Solution
Solution by Tom Prowse and you can download our workflow here.
Step 1 - Reshape Data
The first part of the challenge is to input the customer Info table and then split apart each of the different values. Each of these values are a series of different characters, however they have a repeating pattern and can be split apart by using a ' ' (space).
Our custom split looks like this:
As a result we now have 5 separate column all containing an individual string that we want to parse later in the challenge.
The next step is to reshape the data so that all of these different strings are in a single column and have an individual row. To do this we need to use a columns to rows pivot, and we can use a wildcard pivot on 'IDs' (or similar). The wildcard will bring through all of the rows where this pattern is matched therefore if we reload the data in the future and there are more than 5 columns, then it will automatically pick this up.
Step 2 - Parse String
Now we have all of the strings into a single column we can start to extract the key information as per the challenge instructions. In this solution we have used regex however there are alternative ways using splits and other techniques. If you aren't comfortable with the regex syntax then I would recommend using a website such as regex101.com to help you along the way.
First we want to remove any blank or null values from the IDs Split, then we can use the following calculations to parse the string.
Phone Number
REGEXP_EXTRACT([IDs Split], '(\d{6})\,')
This returns the 6 digits (\d) before the comma (\,) from the IDs Split field.
Area Code & 1st Letter
REGEXP_EXTRACT([IDs Split], '\,(\d\d[A-Z])')
This gets us the 2 digits (\d \d) after the comma (\,) and then any letter after those two digits ([A-Z]). The digits and letters are in brackets as this is what we want to be included in the output and not the comma.
Quantity
REGEXP_EXTRACT([IDs Split], '\,\d\d[A-Z](\d+)')
This time we have the same pattern as before, 2 digits and letter are the comma, however this time we don't want to include it (outside of brackets) and instead we want to include any one or more (+) of the numbers (\d).
Product Code
The final part is the product code, which is the last letters after the '-'. These are at the end of the string therefore we've used a split to return any characters from the right of the last '-':
After all of the parsing our data looks like this:Step 3 - Join Area Code Locations
Now we have parsed the key information from the Customer Info table, we can input the Area Code Lookup table and combine them.
First, we want to create a join calculation for the Area Code Lookup,
Join Calc
RIGHT(STR([Code]),2) + LEFT([Area],1)
This combine the two characters from the right side of the code with the first character from the area and this matches the area code in our customer info table.
We can then join this to the customer info table using these join conditions:
Then after the join we can clean the table by removing any duplicated fields (Join Calc & Area Code + 1st Letter) then filtering the Area field by removing 'Clevedon', 'Fakenham', 'Stornoway'.
Our table should now look like this:
Step 4 - Join Product Lookup
We can now join our workflow with the Product Lookup table using the following condition:
- Price - Remove the £ sign - REPLACE([Price], '£', '')
- Change the Price data type to a decimal number
- Change the Quantity to a whole number
- Calculate Revenue - [Price * Quantity]
Step 5 - Rank Areas
The final task for this week is to rank the different areas based on their revenues. This is using the visual calculation editor where we group by Area and rank revenue in a descending order.
Rank
Next we want to calculate the % of total per product and area. So the first step is to calculate the total revenue for each area using an analytical calculation (Fixed LOD):
Total Revenue per Product, Area
We can now calculate the % of total for each area:
% of Total - Product
ROUND(([Revenue] / [Total Revenue per Product, Area]) * 100, 2)
Then finally we can round the Revenue to a whole number:
Revenue
ROUND([Revenue],0)
And that's the challenge complete! The output should now look like this:
The full output can be downloaded here.