2019: Week 15 Solution

You can view our solution workflow below and download it here.


Full solution workflow.

There are 3 main learning objectives for this week's Preppin' Data challenge:
  • Using "Wildcard Unions" & file paths.
  • Using aggregations to calculate "percent of total" fields.
  • Creating Filter calculations.

Unioning the data and getting the regions

We’ve got five CSVs, one for each region, and all of them have the same data structure – sounds like a call for a Wildcard Union input! Instead of inputting each file separately we can:


Applying a Wildcard Union

  1. Add one of regional stock files to the canvas.
  2. Switch to the Multiple Files tab.
  3. Select Wildcard Union.
    • If they share a folder with other files, define a “Matching Pattern” of “* Stock Purchases.csv”.  This means only files that end in “Stock Purchases.csv” will be imported into the union.
  4. Click Apply.

From here, Tableau Prep automatically unions all the data together for you. What’s more, it adds a field called [File Paths] that contains the name of the file that each row of data was inputted from.
The new [File Paths] field highlighted.

We can use this to create a [Region] field to determine which region each sale can from and aggregate on later. The easiest way to pull the region name out of the file paths in this case is:



  1. Click [File Paths] in the profile pane and choose the Custom Split option.
  2. Enter a space in the “User the separator” text box and make sure the “Split off” options are set to “First 1 fields".
  3. Click Split.
  4. Rename the newly generated split field to [Region] and delete the [File Paths] field.
Splitting on the first space.

We now have a nice [Region] field!

Using aggregations to calculate % of totals

In order to figure out the percentage that each sale in each row contributes towards the total and regional total sales for each stock we first need to figure out what the total and regional total sales actually are. This requires two different of levels of aggregation:


  1. Aggregate on [Stock] & [Region] and sum the [Sales] to create the [Total Regional Sales] for each stock.
  2. Then, aggregate on [Stock] and [Total Regional Sales] to get the [Total Sales] for each stock.
                  
                           Creating [Total Regional Sales]
Creating [Total Sales]

The reason we’ve aggregated the [Total Sales] using the [Total Regional Sales] instead of the original data is that it should be (very, very slightly) quicker to process as there are less rows to sum up when compared to the original data (5000 rows before aggregating VS 4460 after aggregating). In this case, the performance difference will be practically unnoticeable but for larger data sets with more aggregation this could be more significant, so it’s good to practice thinking about the best place to aggregate.

We can now join the [Total Sales] back to the original data based on [Stock] and the [Total Regional Sales] back based on [Stock] & [Region]. After this, we can simply calculate the “percent of total” fields by dividing [Sales] by each aggregated total field:


% of Regional Sales
% of Total Sales

( [Sales]/[Total Regional Sales] ) * 100


( [Sales]/[Total Sales] ) * 100


Here we multiply the results by 100 as we’re just producing a report for the regulators – typically if doing any further calculations or using the data inside Tableau you want to leave the percentages as decimals that normally fall between 0 & 1.

Creating a filter rule

The last step in our solution is to filter out any rows which contain the only purchase for that share in that region. (You could potentially handle this earlier on in other ways - remember, no wrong solutions!) This situation can be identified by two conditions:

a) The [% of Regional Sales] is 100
b) The [Sales] is equal to the [Total Regional Sales].

We can use either of these to create a "Filter Calculation" using the “Filter Values…” option. Both the below calculations will achieve the desired result:

a) [% of Regional Sales] < 100
b) [Sales] != [Total Regional Sales]


Creating a filter calculation.



Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

2024: Week 2 - Average Price Analysis