2020: Week 22 - Solution




You can find our full solution workflow below and download it from the Preppin' Data Community page!


This week we were looking at the process of creating realistic dummy data that was used for last week's challenge. 

Step 1 - Calculate Market Share

The first step is to input the Companies data set, this provides us a list of each of our companies. From here we want to add an ID to each of these companies. To do this we can use the Ranking functionality to rank our companies in a descending order: 

ID

We now have an ID for each of our companies that looks like this: 


From here we can then input our 'Random' data set that contains an ID field and a Random number. We then use the ID field to join this onto our Companies table:


After the join we are left with 3 fields; Company, ID and Random.



As we need all of our random figures to be greater than 0, we can use the ABS() function to do this: 

Random
ABS([Random])

We now have all our randomised numbers as positives.

The next stage is to find the total sum of our random numbers. To do this we can use a Fixed LOD to Sum all of the Random numbers: 

Sum Random


Now we have a random number for each company and the total for all companies we can use this to calculate our Market Market Share: 

March Market Share
[Random]/[Sum Random]

After calculating our market share for March, we can now calculate the market share for April. To do this we are going to use a Case statement like this: 

April Market Share
CASE [ID]
WHEN 1  THEN[March Market Share]+0.001
WHEN 2 THEN[March Market Share]-0.002
WHEN 3 THEN[March Market Share]-0.001
WHEN 4 THEN[March Market Share]+0.002
WHEN 5 THEN[March Market Share]
END

Within this calculation we are saying that depending on which ID the row has, we want to multiply the March Market Square by given number. This helps to stop the massive fluctuations that can occur with random numbers and gives us some realistic values. 

We now have our market share numbers for each ID/Company: 


Step 2 - Calculate Sales & Growth

The next stage is to input the Total Market data source which contains 1 row for March Sales and Growth. 

First we can calculate April's sales by using the March Sales and Growth fields: 

April Sales
[March Sales]*(1+[Growth])

We also need a Dummy field so that we can join this table onto our existing workflow. To do this we just create a calculation with '1'.

Dummy
1

This allows us to join both tables together by using the join condition ID >= Dummy: 


Now everything is in a singular table we can use the following calculations to calculate sales and growth for each company. 

April Sales
[April Sales]*[April Market Share]

March Sales
[March Sales]*[March Market Share]

Growth
([April Sales]-[March Sales])
/
[March Sales]

After these calculations we now have the following fields: 


Step 3 - Join Scent

The next step is to introduce the Scent data set into our workflow. Again, in order for us to join this onto our original workflow we need to create a Dummy field: 

Dummy 
1
Once we have the dummy field we can join to our workflow using the following conditions: 


Notice how the data has grown massively as there is a join between all of the different 'Dummy' fields that are the same. 

Next we want to create a specific ID for each Company and Soap Scent combination. First, we want to concatenate both the Company and the Soap Scent, we can use a similar string calculation to bring these together: 

Concat
[Company]+[Soap Scent]
This brings each of the companies and soap scents together and looks like this: 


Now we have each combination as it's own individual row with a unique identifier we can use the Rank functionality to create an ID just like we did previously with the Company field. 


Finally we want to add 5 to each of our ID's so that we make sure that we don't get the same random numbers as for the Company IDs and we can use a join to join on the Random numbers: 


After the join our data should now look something like this (plus a few extra columns!):

Step 4 - Scent Calculations

Now we have all the data in one table, we can use this to calculate the numbers that we need for the rest of the challenge. 

First we want to calculate the growth for each scent, so similar to before we want to find the total sum of our Random field across each company. The calculation looks like this: 

Sum Random


We can then use this to calculate the Scent Growth: 

Scent Growth
(([Random]/[Sum Random])*[Growth])
Our table should now look something like this: 


We have calculated the growth by scent, next we calculate the Difference in Sales. To do this we just multiply the March Sales by the Scent Growth: 

Difference in Sales
[March Sales]*[Scent Growth]
Finally we want to 35 (number of rows) to our ID field so that we don't get any duplicates when using the random number. 

ID
[ID]+35
Using our Random data set again, we can join this to our original workflow again so we have a new set of random numbers:

The final calculations that we need to make are similar to ones that we have previously created. They are as follows: 
  • Make random numbers all positive 
Random
ABS([Random])
  • Calculate total Random per Company
Sum Random 

  • Calculate March & April Scent Sales
March Scent Sales
([Random]/[Sum Random])*[March Sales]
April Scent Sales
[Difference in sales]+[March Scent Sales]

After all of these calculations we only need 4 fields for the next step. Therefore we are going to 'Keep Only' the following: 
  • Company
  • Soap Scent
  • April Scent Sales
  • March Scent Sales
Then our table should look something like this:


Currently we have a separate column for both our months sales. We don't require that setup for the output, therefore we want to use a pivot tool so that we have a row for each months sales. 

The pivot will be a Columns to Rows and looks like this:


Now we have a row for each month's sales, we can use the Spilt function to remove the 'Scent Sales' text from each row; leaving us with just the month (March/April). The split will look like this: 


We want to split off the first word before the ' ' (space), and then rename this column as Month. 

The final step is to remove any additional columns so that we are left with the following as our Output: 


Make sure to fill in the participation tracker, share using #PreppinData on Twitter and post you solutions onto our Tableau Forums community page so that we can compare our workflows! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text