2019: Week 45 Solution



In this week’s Preppin’ Data challenge, Chin & Beard Suds Co is trying to piece together data from lots of small files so that they can begin to analyse it. We want to analyse the sales volumes and values of our two South-West London stores, in Clapham and Wimbledon.

The first step is to bring all of the data files together into one table. Luckily for us, the data files share the same naming convention and structure, meaning that we can use the wildcard union input tool to bring in all of the tables at the same time. However, there is also a Dates table that we don’t want to bring in quite yet, so we can use the ‘exclude’ functionality to bring in every other table apart from this one. This is the setup we have used:
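This post uses Tableau Prep, but the same "wildcard union with an exclusion" logic can be sketched in Python. The file names below are hypothetical stand-ins for the weekly data files:

```python
from fnmatch import fnmatch

# Hypothetical file list, as a wildcard match on the data folder might return.
files = ["Clapham Monday.csv", "Wimbledon Monday.csv",
         "Clapham Tuesday.csv", "Dates.csv"]

# Union every matching file except the Dates table, mirroring the
# 'exclude' option on the wildcard union input.
to_union = [f for f in files if fnmatch(f, "*.csv") and f != "Dates.csv"]
```

In a real script, each file in `to_union` would then be read and stacked into one table.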



Now that we have all of the data tables in the flow, the next step is to determine the store and day. As mentioned earlier, the files share a naming convention that contains the day and store, so we can parse this information out of the FilePath field created as a result of the union.

We parse the day and store using the split functionality. There are multiple ways you could approach this, but we have broken it down into two steps. First, we use a custom split to split all values on a blank space (‘ ‘). This separates the store into its own column and leaves the day in another. The next step is another custom split on the newly created day field (Split 2), splitting off just the first field before a ‘.’. This removes the ‘.csv’ and leaves us with just the day; we can then rename and remove any unwanted fields.
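The two custom splits can be sketched with plain string operations (the file name here is a hypothetical example):

```python
# Hypothetical FilePath value created by the union.
file_path = "Clapham Monday.csv"

# Split 1: split on the first space to separate the store from the rest.
store, remainder = file_path.split(" ", 1)

# Split 2: keep only the part before the '.' to drop the '.csv' extension.
day = remainder.split(".")[0]

print(store, day)  # Clapham Monday
```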

Split 1 -


Split 2 -


The next step is to remove any duplicate values from the data sets. To do this, we have used an aggregation tool to group on Store & Day, then averaged the sales value and volume. Averaging means duplicate lines won’t inflate any of the values. For example, if we have the following lines:

A = 2
B = 4
B = 4
C = 3

If we take the average of each letter, we get the following:

A - 2/1=2
B - (4+4)/2=4
C - 3/1=3

The duplicated values are therefore removed without changing the underlying figures.
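The same deduplicating aggregation can be sketched in pandas (hypothetical rows, using the letters from the example above):

```python
import pandas as pd

# Hypothetical rows with a duplicated line for "B".
df = pd.DataFrame({
    "Store": ["A", "B", "B", "C"],
    "Sales Value": [2, 4, 4, 3],
})

# Averaging within each group leaves a duplicated row's value unchanged,
# whereas summing would double-count the duplicate.
deduped = df.groupby("Store", as_index=False)["Sales Value"].mean()
```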

Now that we have removed the duplicates, we can calculate the percentages for Scent and Weekday. We have broken this down into two branches for clarity; however, some steps could be combined for a more efficient workflow.


Scent %
To calculate the Scent sales volume and value percentages per Store, we first need to calculate the total sales value and volume for each scent in each store. Using an aggregation tool, we can group by Store & Scent, then sum the Sales Value and Sales Volume.
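In pandas, this aggregation step would look something like the following (store, scent, and value figures are hypothetical):

```python
import pandas as pd

# Hypothetical deduplicated sales rows.
df = pd.DataFrame({
    "Store":  ["Clapham", "Clapham", "Wimbledon"],
    "Scent":  ["Mint", "Mint", "Rose"],
    "Sales Value":  [10, 5, 8],
    "Sales Volume": [2, 1, 4],
})

# Group by Store & Scent and sum, mirroring the aggregation tool.
scent_totals = df.groupby(["Store", "Scent"], as_index=False)[
    ["Sales Value", "Sales Volume"]].sum()
```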


Next, we need to calculate the total sales for each store. We can again use the aggregation tool, but this time we only group on Store, whilst again summing the Sales Value and Volume.


We now have the various totals needed to calculate the Sales Value and Volume percentages. First we join the aggregated values back onto the original flow on the Store field. After the join we have renamed some of the fields so that we don't get confused and use the wrong ones!

The Sales Volume and Sales Value percentages can now be calculated by the following:

Sales Value %
ROUND(
[Sales Value Scent]/[Total Sales Value]
,2)

Sales Volume %
ROUND(
[Sales Volume Scent]/[Total Sales Volume]
,2)

Both of these calculations are rounded to 2 decimal places to match the final output.
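The join and the two rounded percentage calculations can be sketched in pandas; the input figures below are hypothetical:

```python
import pandas as pd

# Hypothetical outputs of the two aggregation steps.
scent_totals = pd.DataFrame({
    "Store": ["Clapham", "Clapham"],
    "Scent": ["Mint", "Rose"],
    "Sales Value Scent": [15, 5],
    "Sales Volume Scent": [3, 1],
})
store_totals = pd.DataFrame({
    "Store": ["Clapham"],
    "Total Sales Value": [20],
    "Total Sales Volume": [4],
})

# Join on Store, then compute the percentages rounded to 2 decimal places.
joined = scent_totals.merge(store_totals, on="Store")
joined["Sales Value %"] = (joined["Sales Value Scent"]
                           / joined["Total Sales Value"]).round(2)
joined["Sales Volume %"] = (joined["Sales Volume Scent"]
                            / joined["Total Sales Volume"]).round(2)
```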


Day %
Our next task follows a similar process to the scent percentages, but this time for days of the week. The first few steps are much the same, as we again use an aggregation to find the total for each store on each day. The aggregation tool is set up like this:


From here we again need to find the total sales for each store. This is the same step as before, so the two could be combined, but we have kept them separate for clarity.



Then, finally, we need to join this back onto the original workflow, just like in the previous scent percentages branch.


We are now ready to calculate the Volume and Value percentages for each day. These are the calculations needed:

Sales Value % 
ROUND(
[Sales Value Day]/[Total Sales Value]
,2)

Sales Volume %
ROUND(
[Sales Volume Day]/[Total Sales Volume]
,2)

Again we are rounding to two decimal places as this is what is required in the output.
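Since the two aggregations in this branch both feed the same calculation, a combined pandas sketch is possible (hypothetical figures for one store across two days):

```python
import pandas as pd

# Hypothetical deduplicated rows for one store across two days.
df = pd.DataFrame({
    "Store": ["Clapham", "Clapham"],
    "Day": ["Monday", "Tuesday"],
    "Sales Value": [6, 4],
})

# Total per Store & Day, then the store total broadcast back onto each row,
# collapsing the aggregate-and-join steps into one flow.
day_totals = df.groupby(["Store", "Day"], as_index=False)["Sales Value"].sum()
day_totals["Total Sales Value"] = day_totals.groupby("Store")[
    "Sales Value"].transform("sum")
day_totals["Sales Value %"] = (day_totals["Sales Value"]
                               / day_totals["Total Sales Value"]).round(2)
```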

The final task is to replace the days of the week with the actual date from the Dates table. The Dates table needs a bit of prep before we can join, and we need to parse out the day so that we have a common join condition between both tables.

To do this, we again use the custom Split functionality to split off the first word before a space (' '). This allows us to have a field for just the day, which we can then use to join to our Day % workflow.
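This split can be sketched the same way as the earlier ones; the Dates table entry below is a hypothetical example of its format:

```python
# Hypothetical entry from the Dates table.
date_entry = "Monday 11th November"

# Split off the first word before the space to get a Day field
# that can be joined onto the Day % workflow.
day = date_entry.split(" ")[0]

print(day)  # Monday
```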



After the join the final step is to remove any duplicated and unwanted fields so that the data is ready for the Output.

Output 1 - Scent %


Output 2 - Day %

