How to... connect to programming scripts

With Tableau Prep version 2019.3, the product's development team added the ability to call Programming Scripts to enable users to use functionality in Prep that is not natively available. The Script step allows the user of Prep to enter either Python or R scripts which may already exist, utilising others logic, or you can write your own.
This post will look at what situations you may want to connect a script into Prep, how to set up your computer to allow you to use the Script functionality and a short example.

When to use the Script step in Prep?

The simple answer is when Prep does not have the functionality you require to achieve your data preparation needs. Programming is a very flexible means to instruct computers to complete the tasks you require. Python and R are two languages that many are taught in universities due to the number of packages available. A package utilises a programming language to complete common tasks saving the user from having to write all the instructions from scratch. 

Tableau Prep is designed to be easy to use and the introduction of the Script tool sits at the other end of the easy-to-use spectrum for many so how might non-programmers take advantage of the Script step? Find a programmer who can help! This person may exist within your organisation, or it could be a helpful member of the Tableau community. Clearly you need to trust the source of the code before using the script to ensure you are not going to cause any harm with malicious code. The Script file can be created once and then reused multiple times and the Tableau Community loves to share so there will be an ever growing volume of useful scripts to take advantage of for users of Prep. 

A constraint of the Script step is that you will need to install a version of a TabPy or Rserve server. This may not be possible within your organisation as a number of organisations are very protective of what programmes can be run on their computers. If you don't have this constraint then you will be able to use Scripts that can potentially save a lot of time and complexity within flows that require advanced techniques. 

How to set-up your computer to use Scripts in Prep

If you have the permissions on your computer to set up TabPy or Rserve, then this is what this section will cover. Rather than offering step-by-step instructions for both language, this post will focus on Python as that's where I have more experience. 

To use Python scripts in Prep, you need to firstly download Python on to your computer. Anaconda is a free and easy set-up to install from anaconda.com
Once Anaconda is installed, you will need to download TabPy. TabPy is hosted on Github and can be found through googling or going to https://github.com/tableau/TabPy. 

To install TabPy you will need to go in Terminal (Mac) or Command Prompt (Windows). 

If successful, TabPy will be available on localhost if you have installed it on your computer through Port 9004 and the message 'Web service listening on port 9004' will show in Terminal / Command Prompt.

When TabPy is running, you will be able to connect to the instance in Tableau Prep using the Script step. To get to the Script step, you will need to feed a data source into the step. 

An error will show when you initially add the Script step as you need to point the Script step to the programming server you want to use and configure the tool correctly. To configure the step, firstly select the Connection Type you want to use. This example will use a Tableau Python connection. Once the selection is made, you will need to connect to the server running that language. 

The example used here is just the defaults left from a default install. You do not need to enter a username and password if your instance doesn't require them. A successful connection will show a light grey connection to the server name under the 'Server' section in this case 'localhost:9004'.

The next step is link the script you want to run in Prep. To do this, click 'Browse' and navigate to the .py file you want to run. You can write .py files in many text editors. The final step is picking the function you want Prep to return. This is set in the Function Name section.

Walkthrough of how to use a Scripting step

As this post is focused on those starting to use Prep, here is a simple way to create a Row ID that cane be used as an ID for analysis. In Prep 2020.1, Prep added the ability to use the row_number() function but to that point, this was the easiest technique to add a row ID into a data set where it didn't exist. This technique will show how to make adding a data field to an input easier with the Python tool than adding more lines of code.

The dataset for this example is Superstore, the demo data set that come with every install of Tableau Desktop or Prep. This data set has a hidden Row ID that shows in Prep but doesn't show in Desktop. This example will show how to add a new data field to Superstore called 'New Index' that could be used as a reference for the Superstore with an ID per product, per order.

After connecting to Superstore, add a clean step to create a calculated field that will host the values affected by the Script. Create a calculated field with just the value 1 in it and call it 'New Index'.

The index script used in this example requires the field effected to be an integer so ensure the data type is correct. The script can then be built to run in the Scripting tool but will need to be held as a .py script somewhere that Prep can access. This particular script creates a function called 'Set_Index' that alters 'New Index' by creating an Index starting at 0, adding 1000 and then returning the values as a data frame. The Set_Index function has to be the function called in the 'Function Name' setting at the bottom of the configuration pane of the Script step. Here is the full script.

Data frames are the values returned from the script into the Prep flow. If you are creating new data fields, you will need to define the data frame being returned. This takes additional coding and is avoided in this example by creating the dummy 'New Index' field to be overwritten by the scripting step. If you do require additional data fields you will need to add the schema function 'getOutputSchema' to your script. In this example, the output of the Scripting step is an updated 'New Index' field that starts from 1000.

Summary

Prep opened up it's capabilities dramatically with the introduction of the Script step. Despite the greater complexity of programming, the flexibility the functionality offers in the data preparation process is significant. Using programming is more complex and will not be suitable for every user of Prep. Like any capability, scripting can be a skill that is learned and developed over time to super charge your data preparation abilities. 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text