2021: Week 19 - Solution

 


Solution by Tom Prowse and you can download the full workflow here


We continued our fundamental challenges this week with a focus on string calculations. We are trying to find out some more detail on what is going on with the project over runs in Prep Air by parsing out the key information from the commentary log. 

Step 1 - Create Week

The first step that we have done this week is to create a 'Week X' field. We have already been provided with the week number so we need to add the string 'Week' before each one. We use the following calculation for this: 

Week
'Week '+ str([Week])

Our week rows now look like this: 



Step 2 - Split Commentary

The next step is to break apart each of the different projects from the commentary field. To do this we want to split each one into a separate row by using the split functionality. Each project/task starts with a pair of square brackets (eg, [NLS/Op-Sc]) so we can use the '[' to split each of the different projects. 

Our custom split function looks like this: 


After the split there should be each project as a separate field for each week:


Ideally we want all of the different projects to be in a single row, therefore we can use a columns to rows pivot to do this. Using a wildcard pivot will allow us to pick up any fields with the word 'Split' in them meaning that this will work for any future projects as well: 

After the pivot we also want to remove any blank fields that have occurred from the split/pivot steps.

Step 3 - Parse Project & Sub-Project

Now we have all of the projects in a single column we can start to parse out the key information using the following steps: 

1, Split Detail

Using a custom split on ']' we can separate the detail text from the commentary:

As a result our data looks like this: 


2, Split Project, Sub Project & Task

Again we can use the split functionality to split the project and sub-project. This time we want to use '/' as the separator: 


Our data will now looks like this: 

Then the final clean is to remove the punctuation from the Project Code field. Tableau Prep has in-built functionality to do this so it's super simple


We now have a clean Project Code field, so the next task is to again use the split functionality to split apart the sub-project and task field. Instead of using a custom split, an automatic split will work here so we should now have two columns that look like this: 


As a result of the split we are left with three separate fields that look like this: 


The only other step that we need to take here is to combine the 'Op' and 'Ops' fields. There are different ways of doing this and Tableau Prep allows you to group by different things to automate the process. 


Our table should now look like this: 


Step 4 - Join Lookups

We are now ready to join the lookup tables so that we can transform the codes into something a bit more useful. There's a lookup table for Project, Sub-Project, and Task so this will be three separate joins: 

Project


Sub-Project 




Task


After removing any duplicate or code fields our table should look like this: 


Step 5 - Owner Code

The owner code is always at the end of the detail string and is always three letters. Therefore we can use the following calculation to extract this: 

Owner Code
RIGHT([Detail],4)

Then remove any punctuation using the clean functionality so we are just left with the three letters of each name. 

Next we can join in the owner code lookup table by joining on Owner Code = Abbreviation


We now have the full name for each of the project owners: 



Step 6 - Days Noted

The final part of this week's challenge is to see how many days each task took to complete which can be found in the detail comments. 

First we need to parse out any numerical values that come before 'days'. To do this we can use a combination of mid and find calculations: 

Days Noted
mid([Detail],find([Detail],' days')-2,2)

Then we can remove any letters and punctuation from the string to leave just the numerical values for each. 

An alternative would be to use a Regex function with the following calculation: 

Days Noted
REGEXP_EXTRACT([Detail],'.*?(\d+)\sdays.*')

 After extracting the number of days, we are then ready to output the data.


Here's our full output for comparison. 

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text