2020: Week 35 - Solution

Solution by Tom Prowse and you can download our solution here.


This week we had the option to take a look at the scripting tool for the first time! In the challenge we looked at creating a Row ID in the script tool by using either TabPy or RServe. Within this solution we will cover how to solve using RServe, but we would recommend looking at this blog post for how to create a Row ID using TabPy.

Our task was to format the Sales and Budget table being used by Chin & Beard Suds Co. We want to do this in the most dynamic way possible so that we can automate the process going forwards. 

We will cover both the solution for the input including the Row-ID and also using RServe to create the Row-ID.

Step 1 - Data Interpreter

When we first bring the data into Tableau Prep, it's a bit of a mess as it is bringing it through in the same format as in the spreadsheet. Therefore, our first task is to use the Data Interpreter to help clean this up, so that the data now looks like this: 


Notice that the headers are now in the correct position and we can continue to clean the data. 

The data interpreter hasn't quite cleaned all of the fields, and there are still some Nulls. Therefore, the next step is to remove any Null values from the Store field:


Step 2 - Row ID

This is the stage where the two solutions differ slightly. If you are using the input which includes the Row-ID field then you can skip this part, but if you are using RServe then this is the stage where you input the Row-ID using the Script tool. 

First of all you need to bring in the Script tool and configure this so it's working on your machine. Jenny's blog goes through how to do this, so we would recommend following the steps there. 

Before configuring the Script tool, you will need to ensure all of your headers have no spaces between them. We changed this manually, by editing the headers, for example - 'Jan Sales Jan Target' becomes 'JanSalesJanTarget'.

Once you're setup with the script tool and changed the headers, you will need to create the following within R. This is our RowNumber Script and can be downloaded here

rowNumber <- function(n) {
n$RowID <- rep(0,nrow(n))
for (i in 1:nrow(n)) {
n$RowID[i]<-i
}
return(data.frame(Store=n$Store, JanSalesJanTarget = n$JanSalesJanTarget, FebSalesFebTarget = n$FebSalesFebTarget,MarSalesMarTarget = n$MarSalesMarTarget, 
AprSalesAprTarget = n$AprSalesAprTarget, MaySalesMayTarget = n$MaySalesMayTarget, JunSalesJunTarget = n$JunSalesJunTarget, JulSalesJulTarget = n$JulSalesJulTarget,RowID=n$RowID))
}

getOutputSchema <- function() {      
  return (data.frame (
    Store = prep_string (),
    JanSalesJanTarget = prep_int (),
    FebSalesFebTarget = prep_int (),
    MarSalesMarTarget = prep_int (),
    AprSalesAprTarget = prep_int (),
    MaySalesMayTarget = prep_int (),
    JunSalesJunTarget = prep_int (),
    JulSalesJulTarget = prep_int (),
    RowID = prep_int ()
))

}

Once you have saved this as an R File, you can add this into the script tool via the browse section and also give the function a name: 


This should create a Row-ID for your data table which we can now use in future steps to help us clean the table!

Step 3 - Pivot & Create Month

Now we have a Row-ID within both of our inputs/solutions the rest of the challenge is the same. 

First we need to pivot our data so that each of the month columns is a separate row. Using the pivot tool we can use a Wildcard Pivot with the following setup: 


Our data has now been transformed and looks like this: 


Now we have a row for each of the months, we can clean the Pivot1 Names field to extract the Month name. Luckily, all of the months are at the start of the string so we can use this calculation to extract everything before the first space: 


We can then use the split Month field to remove the Month name from the Pivot Names field: 

Pivot1 Names
REPLACE([Pivot1 Names],[Month],"")

Then finally we can trim any additional spaces from the Pivot1 Names field by using the Clean functions (Trim Spaces and Remove Extra Spaces).

Our table should now look like this: 


Step 4 - Assign a Row ID Order

The next task is to assign a Row ID order to help us understand whether the row is a Sales, Target or Difference field. We are told in the challenge post that the Sales Value will have the lowest Row ID value for each store. 

Therefore, our first step is to identify the minimum Row-ID for each store. We can use a Fixed LOD to do this: 

Min

We then want to do the same but find the Max value: 

Max

Now we have the Min and Max Row-ID for each of our Stores, we can use this calculation to calculate the order: 

Order
IF [Row ID]=[Min]
THEN 1
ELSEIF [Row ID]=[Max]
THEN 3
ELSE 2
END

This will allow us to have a 1, 2 or 3 for each store, which we can then use at a later stage to identify whether the value is Sales, Target or Difference. Our table now looks like this: 


Step 5 - Assign Headers

Now we have a nice order for each of our stores, we can assign the headers for each section. We want to do this in a dynamic way, therefore we are going to use the following calculation: 

Headers
REPLACE(
MID([Pivot1 Names], FINDNTH([Pivot1 Names]," ",[Order]-1))
, MID([Pivot1 Names], FINDNTH([Pivot1 Names]," ",[Order]))
, "")

If we breakdown this calculation, the MID part is returning the characters from the middle of our Pivot1 Names string. This is determined by the value within the FINDNTH part of the calculation. The FindNth, uses the Order field to determine the position of the occurrence. 

Basically, the calculation is determining which word is associated with each Order ID, depending on where it is placed within the string. So 1 = Sales, 2 = Target, 3 = Difference. If another Order ID and word is added then 4 = New Word, meaning that this would automatically and dynamically update in the future!

The final part of this step is to do some tidying of the string field. We want to trim any additional spaces from the Headers field, and then we can remove Pivot1 Names and Order. Our table should now look like this: 


Step 6 - Pivot & Tidy

The final step this week is to re-pivot our data so that we have a column for each of the Sales, Target, & Difference. To do this we want to use a Rows to Columns Pivot with the following setup: 


After the pivot, the final step needed is to update of Month field so that it's a Date. We can use this calculation to do that: 

Month 
DATE(DATEPARSE('MMM/yy',[Month]+"/20"))

As a result our final output table looks like this: 


After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14, @JonathanAllenby & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text