2019: Week 3 Solution

Congratulations to all of you that have survived week 3 and welcome back! This week we aimed to give you all a taste of scaffolding data in Tableau Prep. We decided to not be truly evil and not only gave you the scaffold but even kindly named the second sheet as “Scaffold” to help point you in the right direction. What can I say, we’re just lovely people.

To get more detail about scaffolding and scaffolding use-cases there are a number of blog posts available [1] [2] [3] however in short: scaffolding data is a way to create rows of data when the data is missing or empty. In our case, this refers to the fact that we need the payment date for each contract for each month but we only have the contract length and contract start date.

If you were to build a scaffold for our data your first instinct may be to create a date scaffold to join our phone contract data to - however by now you'll have noticed we instead have a length scaffold of numerical values. This was a deliberate choice - by creating the scaffold based on contract length, the scaffold can be reused for any period of time. If we built one around dates then every time we wanted to scaffold a new range of dates we'd need then go ahead and update our scaffold with all these new dates!

To scaffold our challenge data we need to achieve two main things:

  1. Find a way to join the data to the scaffold.

  2. Find a way to filter out any rows where the payment date is past the end of the contract.

There are two main ways to accomplish this. Our example solutions can be found in the sections below.

Create a dummy field to join the data and scaffold together



This was our initial solution and the method talked about in Bethany’s Tableau conference vid. You can download our full workflow here.

  1. In both the original data and the scaffold data create a “dummy” field to join every row of the scaffold to every row of the data. This can be as simple as a calculated field called [Dummy] which just contains the string “X”.

  2. Create an inner join between the two data sets on the condition [Dummy] = [Dummy].  This will always be true as the only value in both [Dummy] fields is the string “X”.

From here, you can filter the data to remove any row where [Contract Length] is greater [Length], and then use the DATEADD() function to add [Length] to [Start Date] to create the [Payment Date], i.e. DATEADD(‘month’,[Length],[Start Date]).

Use a join calculation to join the data and scaffold calculation



This slightly quicker and more scalable way was first sent to us by Donna Coles who pointed out you can filter within the join step by modifying the join clause. You can download our full workflow for this here.

  1. Add an inner join step between the original data and the scaffold date.

  2. Modify the join clause to join on [Contact Length] <= [Length].

This skips not only the need to create a dummy field to join on but also the need to manually filter out the data after joining. Discovering this solution demonstrates one of the joys of this initiative. By putting these challenges to the community, it allows us to consistently find and share new and innovative ways to accomplish difficult or common problems faced in data preparation.

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

How to...Handle Free Text

2023: Week 1 The Data Source Bank