2019: Week 11 Solution

The full solution flow can be seen below and downloaded here.

The Full Workflow

This week’s challenge showcased Tableau Prep’s versatility and proved that it can parse JSON data fairly easily. Most of the steps required to complete the challenge were covered in the challenge’s requirements, so this week’s solution will only touch on those parts briefly.

Parsing and Cleaning the JSON

Flattened JSON field names separate each level of the hierarchy with full stops, so a Custom Split on every “.” easily breaks [JSON_Name] up for us. There’s a reason we suggest splitting it up before filtering any rows. You could filter straight away on NOT CONTAINS([JSON_Name], "meta"), however if you split the JSON up first you can then easily click on and exclude unnecessary rows of data. From this view you can also easily remove any unnecessary information contained in [JSON_Name] by identifying generated Split fields that only contain a single value.

Removing fields that do not add any detail.
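If you’d rather filter with a calculation than click to exclude, the formula quoted above is all you need:

NOT CONTAINS([JSON_Name], "meta")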

Pivoting the Data

After cleaning up the JSON we should be left with 3 fields:


  1. [Row]: this was originally Split 8, with empty values replaced by the values from Split 5 (see the calculation sketch after this list).
  2. [Data Type]: this was originally Split 7, with empty values replaced by the values from Split 4.
  3. [Value]: this is the [JSON_ValueString] field, renamed.
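One way to express that “replace empty values” step as a calculated field, assuming the empty cells come through as nulls (if they arrive as empty strings, compare against "" instead):

IF ISNULL([Split 8]) THEN [Split 5] ELSE [Split 8] END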


We want our [Data Type] values to become our new headers, so we need a rows-to-columns pivot. We can drag our [Data Type] field onto the Pivoted Fields shelf and drag our [Value] field onto the Field to aggregate for new columns shelf. The aggregation can be set to SUM, MIN, MAX, AVERAGE, or MEDIAN: since each [Row]/[Data Type] combination holds exactly one value, all of these return the same result.
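As a minimal sketch of what the pivot does (the values, and the “value” field, are made up for illustration):

Before the pivot:
[Row]  [Data Type]  [Value]
1      timestamp    1546300800
1      value        42

After the pivot:
[Row]  [timestamp]  [value]
1      1546300800   42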

Converting [Timestamp] to [Date]

A common date representation used in computing and date value storage is Epoch time (also known as Unix time). This is a measure of how many seconds have passed since 00:00:00 UTC on January 1st, 1970 (not counting leap seconds). This means we can figure out the actual date and time for each row by adding [Timestamp] seconds to this date; for example, a [Timestamp] of 86400 (one day’s worth of seconds) corresponds to 1970-01-02 00:00:00. This can be accomplished with the DATEADD() function.

By specifying a date_part of 'second', an interval to add of INT([Timestamp]), and a date of #1970-01-01 00:00:00#, we can get our actual date:

DATEADD('second', INT([Timestamp]), #1970-01-01 00:00:00#)
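As a sanity check, 1,546,300,800 seconds after the epoch is midnight on January 1st, 2019:

DATEADD('second', 1546300800, #1970-01-01 00:00:00#)   // returns #2019-01-01 00:00:00#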

Two things to note with this function:

1. The function signature doesn’t specify the data types needed for each parameter. Confusingly, this is left to you to figure out from the function description, trial and error, and the error messages provided. Whilst the [Timestamp] field is a number, in my case its data type was set to “Number (decimal)”. The error message produced when using the field as-is is shown below.

“Function 'DATEADD' can’t accept parameters: (string, number, datetime).”

I spent a long time thinking the issue was how I was supplying the date. In fact, all I needed to do to resolve the error was wrap [Timestamp] in an INT() function or change the field’s data type to “Number (whole)”.
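To make the contrast concrete, here are the failing and working versions side by side (assuming [Timestamp] is still set to “Number (decimal)”):

DATEADD('second', [Timestamp], #1970-01-01 00:00:00#)        // fails with the error above
DATEADD('second', INT([Timestamp]), #1970-01-01 00:00:00#)   // works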

2. As you can see, you can type a date surrounded by “#” symbols and it will be accepted as a date or datetime. If you enter a date in an incorrect format, the error message will tell you the valid formats:

“The date format isn’t supported. Use the date formats "yyyy-MM-dd"; "yyyy-MM-dd HH:mm:ss"; "yyyy-MM-dd HH:mm:ss.SSS"; "HH:mm:ss" or "HH:mm:ss.SSS".”


You can see here that neither the error message nor the function description specifies that the # symbols are necessary; however, they do indeed seem to be.
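For reference, literals matching the date and datetime formats from that message would look like this (the values themselves are just illustrative):

#2019-01-01#
#2019-01-01 12:30:00#
#2019-01-01 12:30:00.000#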
