2021: Week 39 - Solution

Solution by Tom Prowse and you can download the workflow here. 

The challenge this week was all about parsing information from a single column of data, with a focus on a painting process from our Bike store. 

Step 1 - Make Date Time Field

The first step this week is to combine the Date and the Time field to create a datetime that each of the changes occurred in the process. This can be easily created by using a MAKEDATETIME() function: 

Datetime 
MAKEDATETIME([Date],[Time])

Then we can remove the individual Date and Time fields. 

Step 2 - Bike Type & Batch Status

We can now start to identify each of the different areas that the engineers are interested in. This includes the type of bike and whether the batch failed or not, and these are identified by the Data Type = 'Result Data'. 

After filtering the data set so we only have the Result Data fields, we can then pivot these so they each have a separate column: 


At this stage our data set should look like this: 


Step 3 - Actual & Target Values

Going back to the step where we created the Datetime field, we need to create a new branch and filter the 'Data Type = Process Data' and then exclude 'Name of Process Stage'. This will leave us with the main data parameters that the engineers will monitor throughout the process. 

To parse the Actual and Target values, we first need to split the process parameters into separate fields to identify whether it's an actual or target value. Using a custom split and a space as the separator, we can split off the actual/target value from each parameter: 


Then using the newly split field, we can replace the Actual or Target value with a blank in the original Data Parameter field, so that we are left with just the process parameters: 

Data Parameter 
REPLACE([Data Parameter],[Actual or Target]+' ','')

After removing some fields and changing the data types, our data should now look like this: 


The final part of this section is to pivot the data so that the Actual and Target values have their own columns: 


After the pivot our data should now look like this: 

Step 4 - Process Stage

The last bit of parsing that we need to do is identify what stage each of the processes take place in. To do this we want to create another branch from our Datetime step, but before we filter for only 'Name of Process Stage' in the Data Parameter field, we first need to find the last update time for each of the batches. 

Final Process End Time 


As a result our data should now look like this: 


The next step is to identify a start and end time for each of the Process Stages. To do this we first need to create an order field to what order each of the process stages occurs:

Process Order 
{ PARTITION [Batch No.]: { ORDERBY [Datetime] ASC : ROW_NUMBER()}}

This calculation is saying... for each Batch Number, return a row number where the order date is sorted from latest to most recent. 

As a result, we can use this to help find what the next process stage is by using a self join. First, we need to create a new step, then subtract 1 from the Process Order, and then we can join on the Batch No and the Process Order fields. We subtract one as this will help us to align the next stage of process when we join, meaning that it will bring the details onto the same row, therefore giving us a start and end date for the process. 

Before the join you may want to add a prefix to the fields so that you can see where each of the fields have come from:


After the join we can remove any duplicated fields, and then merge the Datetime from the self join with final datetime that we calculated earlier. This will remove any null values and we will now have a start and end time for each of the process stages. Our data for this branch should now look like this: 


Step 5 - Combine Branches

The final step is to bring all of the different branches together into a single table. First, we can join the result data with the process parameters, by using an inner join on Batch No.: 


Then we can combine the process stages by using a slight more complex join. As we need to duplicate the stage over multiple different parameter updates, we need to join this using the Start & End dates as well as the Batch No.:


As a result our data should now look like our final output: 


You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text