How to... Not Need Data Prep at all

Sometimes we all overthink problems and this happens a lot in Data Preparation. Once you have some strong skills, it can be easy to overthink the issue and take additional steps you simply don't need.

History of Data Preparation in Tableau

When trying Tableau Public for the first time in version 5.2 and becoming a heavy user of Desktop version 7 onwards, completing your Data Prep in Tableau was tough unless you were doing very simple use cases. Tableau Prep was just a glint in the eye of the Tableau Developers at that point. 

Common data prep tasks had to happen outside of Tableau Desktop for a number of reasons, like performance (slower processing speeds), but it was the lack of functionality that caused the most significant issues. There was often the need to use external tools to complete tasks. The Tableau Excel Add-In Extension was the principle way to deal with Pivoting data fields to change columns into rows of data. This was very useful for dealing with survey data and where dates were held in separate columns (ie not Tableau friendly).

Complex joins, removing data for performance and unions were not part of the data connection window in Desktop as it is now. This meant that to use data sources, the preparation would have to happen in the database the data originated in. 

As Tableau's development aim has always been to keep the user in the flow of their analysis, they have built more preparation functions into the core tool, Tableau Desktop. The challenge with this was that all the data preparation happened within the Data Connection window and space was becoming limited and the complexity of the challenge could not be kept within Tableau's usual simple User Interface until version 2018 changed that approach by spinning out Tableau Prep Builder as a separate tool.

But what should you still attempt in Tableau Desktop compared to using Tableau Prep?

Simple Joins

Joins have been a feature of data connections in Desktop for years. Yes Tableau Prep can of course handle these but by keeping the join in Desktop, you can flexibly change the join type and conditions whenever you want. With the addition of Join Calculations, most situations where you need additional columns from additional data sources can, and should, be handled in Desktop. 

When to move to Prep?
When making multiple, different joins, things can get confused in Desktop. With Prep's profile pane, it's much easier to see when join conditions have gone wrong or created a lot of nulls that you were not expecting. Also look to use Prep when you need to change the level of aggregation of one of your data sources before joining the data sets together. 

Unions

Like Joins, when Unions were added to the Tableau Desktop Connection Window, the number of times data prep had to be completed outside of Tableau Desktop reduced dramatically. Unions in Desktop are quite flexible. So much so the options for Unions within Prep are very similar. 
 A basic Union can be added in Desktop by dragging the additional dataset underneath the original connection.
Once the union is formed, there is the option to edit the Union or add additional data sources to that union. More complex unions can be completed through using Wildcard Unions. There's more explanation on Wildcard Unions in this How To...Union post.
Similar to Prep, the Union can be wild carded at he the Sheet or Workbook level when connecting to Excel sheets. Different data types have different options at this point so Planning what data you need and where it is is key. By being able to explore the effects of the unioned data straight away in Desktop is a reason to start in the tool unless you need to do other data preparation steps.

When to move to Prep?
When uniting data with different structures (column names), the resulting data set from the union will lead to a lot of 'null' values. Being able to deal with those nulls before progressing to your analysis can be complex so the calculations to clean this up is important. 

Single Pivots

When pivot was added to Desktop, the need to use external tools like the Excel Add-In for Tableau became considerably less. Survey data has long been a considerable challenge in Desktop but being able to pivot a column, or set of columns, meant that external tools were no longer the only option to prepare and shape the data set. 

The resulting data field names are far from useful but are easy to adjust.


When to move to Prep?
The limitations of pivoting within Desktop is that you can only pivot once. Many data sets are much more complicated than this especially that pesky survey data. 

Review / Handover

Tableau Prep is a piece of software that allows a lot of traceability. By this we mean it is largely self-documenting as you build the data preparation flow. The use of iconography, the Changes Pane and ability to step through the changes one-by-one allows anyone to retrace the journey from input to output and why the stages are likely to exist. With renamed steps and added descriptions, the work is a lot easier to understand than a labouriously written out Word document as project handover that we have all had to consume over the years.

When to move to Prep?
Do the data preparation in Prep and save yourself the time in writing the documentation for handover separately. 

All these steps in isolation have been possible within Desktop but the combination of these techniques and others result in the need for Tableau Prep. Also handing over complex workbooks with multiple stage data preparation in Desktop is far from ideal for the recipient so again, passing this work to Prep is useful let alone the more complex functionality that will feature in additional 'How to...'s. 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text