2019: Week 25 Solution

You can view our full solution workflow below and download it here!

Our full workflow solution.
This week saw our first collaboration with Workout Wednesday! For the Preppin’ Data solution, the order of steps is very interchangeable this week – as long as you join on the correct fields it should work regardless of whether you join at the very start, very end, or somewhere in the middle. There are two particular steps I’ll cover here:
  1. Obtaining all the [Fellow Artists].
  2. Removing duplicate records.
However to view a full walkthrough of the solution check out the video below.

Obtaining the [Fellow Artists]

This is probably the hardest part of the task. I’ll outline the steps below but you can find it demonstrated in the solution video.

If the [Concert] field contains a ‘/’ then it is a list of artists. If it doesn’t include a ‘/’ then it’s a concert name like “Divide World Tour”. We want to both filter out these concert names as well as Ben and Ed’s names so that our final [Fellow Artists] field only contains blanks or actual fellow artists.
To do this, we can first use the following calculation to produce a cleaned up [Concert] field:

[Fellow Artists]

//This field's purpose is for us to split it apart in order to get the individual fellow artists later.

//If concert contains a '/'...
//then return the concert name with Ed or Ben's name removed if it appears...
//else return a blank field.

IF CONTAINS([Concert],"/")
    THEN TRIM(
         REPLACE([Concert],[Artist],"")
    )
    ELSE ""
END


This should create a field which looks like this:
The initial [Fellow Artists] field.

From here we can use a Custom Split where we use ‘/’ as a separator and split off all fields.* This should give us 14 [Fellow Artists – Split X] fields, each of which either is blank or contains an actual fellow artist.

We can then later on use  a Columns-to-Rows pivot where we rotate all these fields into a singular [Fellow Artists] field.

*There are still issues with the custom split. The following split should result in 14 fields but we’re aware some people were getting 7-10 fields returned.

You may notice, however, that there is suddenly a huge number of rows with a large majority of them containing a blank [Fellow Artists] field. This is because many of the [Fellow Artist – Split X] fields did not contain data. This is where the ‘remove duplicates’ stage comes in!
Over 20k blank [Fellow Artists]?!

Removing duplicate rows

Tableau Prep actually has a built in ‘duplicate row filtering’ tool in the form of the Aggregation tool! The Aggregation tool’s primary purpose is to group rows together based on a set of common dimensions and aggregate some measures for them. However:

  1. You don’t actually need fields under both ‘Group’ & ‘Aggregate’.
         - If you have just Grouped fields it returns one row for every unique combination of Group         fields.
         - If you have just Aggregate fields then it returns the aggregated values for the entire data             set
  2. You can group on every field to remove duplicates!

In our case we don’t want to group on [ConcertID] as it acts as a row ID, but if we group on every other field then Tableau Prep reduces all duplicates rows down to a single row.
The aggregation settings.


Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text