2019: Week 5 Solution
Welcome back for the solution of week 5’s challenge! I’ll preface this by saying this week’s challenge was probably the biggest head-scratcher for me so far, mostly due to trying to thinking of the best way of 1) handing the ‘th’ in “19th July 19” and then 2) adding the weekday to the date to get the [True Date].
This week is also one of our biggest examples so far for proving the statement “there’s no one single correct answer” as there’s a ton of different ways to handle various parts of the challenge. For a couple of stages of this challenge I’ve decided to present two alternate methods; one using RegEx and one without using RegEx, to help accommodate all tastes and perhaps offer a reason to start dipping your toes into the RegEx world.
You can see my solution flow below and download it to give it a prod from here. Where it splits and re-joins is simply show-casing the alternative ways mentioned above to handle the same stage – branches and joins are not required for a solution!
So, let’s get into ways of solving the main tasks involved in the challenge.
For this task it is super valid to drag each sheet from the starter file into your view and then using a union step. However, a more dynamic and long term solution is using a Wildcard Union. This will let you automatically pull the data from all similarly named sheets and union them together in a single Import. It also adds a field to your data named [Table Names] that contains the name of the table or sheet that each row came from.
To use a Wildcard Union, simply drag one of your sheets into the view and check the “Wildcard Union” radio button. It should automatically fill the Files “Matching Pattern” to the name of the starter file so that you don’t accidentally read include other files. Defining a “Matching Pattern” simply looks for and includes any files or sheets with a name that fit that pattern.
If we had sheets in our file that you didn’t want to union, you can use the Sheets options to define a “Matching Pattern” to only include the sheets we wanted. In our case we just want all sheets, so we can leave the Matching Pattern blank as this just means “include all sheets”.
Who would have thought that a simple “th” could cause such a headache? I came up with two different solutions to handle this. Even though our data only came from one month & year and only included days that are double-digits & ending in “th”, I’ve written both solutions with the fact that single digit days & days ending in “rd”, “st”, and “nd” could be included in the future.
Both my solutions begin with simply ditching the first part of each [Table Names] field using REPLACE([Table Names], “Week commencing “, “”) to create a new field called [Week Start String] which contains values like “17th June 19”. This makes the assumption that all similar sheets in the future will have the same naming convention.
In my case I used the following function to get “17 June 19” from “17th June 19”:
In short, this pulls out the first group of digits in the field (e.g. “17”), pulls out everything after and including the first space in the field (e.g “ June 19”), and then concatenates them together to produce the date (e.g. “17 June 19”).
The good thing about this method is it will work for any day of the month regardless of the suffix (e.g. “1st”, “2nd”, “3rd”, “4th”, etc.). From here, Tableau can actually understand this as a valid date, so if you simply switch the data type on the field from 'String' (“Abc”) to 'Date' it’ll produce a nicely formatted date as we want.
After the hassle of getting the [Week Start Date] from the table name, I decided to go with a simpler hard-coded approach for using the weekday in [Date] to get the [True Date] by utilizing some nested functions as seen below.
I used a CASE statement to convert the weekday to a numerical value, and then wrapped this in the DATEADD() function to add the value to the day part of the [Week Start Date]. I then wrapped this all inside a DATE() function, since DATEADD() produces a DateTime whereas all we want is a Date.
Here is another point at which I’d like to showcase two different options, one using RegEx, and explain why one may be better than the other. However, the issue with both methods is that they rely on the ‘#’ appearing only in relation to the policy number. If the note-writer started sprinkling their notes with “#boredOfMyJob” and “#answeringPhonesSux” then problems might start arising.
In short, this is extracting the first string of digits that has a ‘#’ symbol directly in front of it from the [Notes] field. My colleague from The Data School, Robert, has written a great post on the Data School blog going into this function in a little more depth and breaking down exactly how to read the RegEx expression!
The great thing about this option over the following option is that this option can handle a policy number of any length, as ‘\d+’ looks for a string of digits of any length, whereas the non-RegEx method assumes the policy number will be exactly 4 digits long which could be a little short-sighted.
This week is also one of our biggest examples so far for proving the statement “there’s no one single correct answer” as there’s a ton of different ways to handle various parts of the challenge. For a couple of stages of this challenge I’ve decided to present two alternate methods; one using RegEx and one without using RegEx, to help accommodate all tastes and perhaps offer a reason to start dipping your toes into the RegEx world.
You can see my solution flow below and download it to give it a prod from here. Where it splits and re-joins is simply show-casing the alternative ways mentioned above to handle the same stage – branches and joins are not required for a solution!
Our full workflow. |
So, let’s get into ways of solving the main tasks involved in the challenge.
Importing and unioning the tables
For this task it is super valid to drag each sheet from the starter file into your view and then using a union step. However, a more dynamic and long term solution is using a Wildcard Union. This will let you automatically pull the data from all similarly named sheets and union them together in a single Import. It also adds a field to your data named [Table Names] that contains the name of the table or sheet that each row came from.
To use a Wildcard Union, simply drag one of your sheets into the view and check the “Wildcard Union” radio button. It should automatically fill the Files “Matching Pattern” to the name of the starter file so that you don’t accidentally read include other files. Defining a “Matching Pattern” simply looks for and includes any files or sheets with a name that fit that pattern.
If we had sheets in our file that you didn’t want to union, you can use the Sheets options to define a “Matching Pattern” to only include the sheets we wanted. In our case we just want all sheets, so we can leave the Matching Pattern blank as this just means “include all sheets”.
Converting the table name to an actual date
Who would have thought that a simple “th” could cause such a headache? I came up with two different solutions to handle this. Even though our data only came from one month & year and only included days that are double-digits & ending in “th”, I’ve written both solutions with the fact that single digit days & days ending in “rd”, “st”, and “nd” could be included in the future.
Both my solutions begin with simply ditching the first part of each [Table Names] field using REPLACE([Table Names], “Week commencing “, “”) to create a new field called [Week Start String] which contains values like “17th June 19”. This makes the assumption that all similar sheets in the future will have the same naming convention.
Option 1: The RegEx Method
Using RegEx reduces the number of steps required and tends to be a little more versatile. The caveat to this is you have to both spend some time learning how RegEx expressions work and then how the Tableau RegEx functions work, which can be a little tricky.In my case I used the following function to get “17 June 19” from “17th June 19”:
REGEXP_EXTRACT([Week Start String],'(\d+)')
+
+
REGEXP_EXTRACT([Week Start String],'(\s.*)’)
In short, this pulls out the first group of digits in the field (e.g. “17”), pulls out everything after and including the first space in the field (e.g “ June 19”), and then concatenates them together to produce the date (e.g. “17 June 19”).
The good thing about this method is it will work for any day of the month regardless of the suffix (e.g. “1st”, “2nd”, “3rd”, “4th”, etc.). From here, Tableau can actually understand this as a valid date, so if you simply switch the data type on the field from 'String' (“Abc”) to 'Date' it’ll produce a nicely formatted date as we want.
Option 2: The Split, Trim, and Date-Parse Method
The other option I used was as follows:- Split [Week Start String] on every space to get every part of the date into its own field.
- For the field with the day part in it, create a calculated field called [Day of Date] by removing the day suffix using the following function: LEFT([Week Start String - Split 1],LEN([Week Start String - Split 1])-2)
The reason I haven’t simply taken the left two characters is because if the date was a single digit (e.g. “1st”) then it’d return “1s”. By measuring the length of the entire string and then subtracting 2 we can account for any date. A three character long date would only take the 1st character (e.g. “1st” -> “1”) and a four character long date would take the 1st and 2nd characters (e.g. “17th” -> “17”). - Now that we have fields containing each part of the date in a format that Tableau can parse, we can simply use DATEPARSE(‘dddMMMMyy’,[Day of Date]+[Month of Date]+[Year of Date]) to create the [Week Start Date] field.
Creating the True Date for each record
After the hassle of getting the [Week Start Date] from the table name, I decided to go with a simpler hard-coded approach for using the weekday in [Date] to get the [True Date] by utilizing some nested functions as seen below.
I used a CASE statement to convert the weekday to a numerical value, and then wrapped this in the DATEADD() function to add the value to the day part of the [Week Start Date]. I then wrapped this all inside a DATE() function, since DATEADD() produces a DateTime whereas all we want is a Date.
Getting the policy number
Here is another point at which I’d like to showcase two different options, one using RegEx, and explain why one may be better than the other. However, the issue with both methods is that they rely on the ‘#’ appearing only in relation to the policy number. If the note-writer started sprinkling their notes with “#boredOfMyJob” and “#answeringPhonesSux” then problems might start arising.Option 1: The RegEx Method
You can get the policy number using the Regex function below.
REGEXP_EXTRACT([Notes],'.#(\d+)')
In short, this is extracting the first string of digits that has a ‘#’ symbol directly in front of it from the [Notes] field. My colleague from The Data School, Robert, has written a great post on the Data School blog going into this function in a little more depth and breaking down exactly how to read the RegEx expression!
The great thing about this option over the following option is that this option can handle a policy number of any length, as ‘\d+’ looks for a string of digits of any length, whereas the non-RegEx method assumes the policy number will be exactly 4 digits long which could be a little short-sighted.
Option 2: The Split & Trim Method
On the other hand, you could achieve nearly the same thing using SPLIT([Notes],’#’,2), which splits the [Notes] field on every ‘#’ symbol and takes the second portion, and then taking the left 4 characters. As mentioned before, the major downside to this is that if a policy number isn’t exactly 4 digits long it may start returning some funny results!Finding Contact Method and Contact Reason
I won’t go into too much detail for these but each of [Contact Method], [Statement?], [Balance?], and [Complaint?] fields I simply used variations of the CONTAINS() function. The main thing to note for these is that if you make sure to wrap your [Notes] field in a LOWER() or UPPER() function first, it will make matching strings easier. The CONTAINS() function, like many of Tableau’s functions is case-sensitive, so by making your string either all uppercase or all lowercase first means you no longer have to be worried about any results slipping through the net due to some bored note-writer choosing to write all their nOtEs LiKe ThIs.Contact Method details. |
Contact Reason details. |