2020: Week 30 - Solution
Solution post by Tom Prowse
Download our full solution here.
For this week's challenge we looked at a slightly different input method, by using Google Sheets as a data source so that you can fully refresh and make it relevant by inputting your own city! However, for the solution post, I'll use an example from an Excel document so that you can download and see the solution. If you wanted to connect to a Google Sheet then the process is explained within the challenge post.
Step 1 - Replace Characters
For the initial steps, this will be repeated for all three tables (Next 5 Days, Next 24 Hours, and Next 5 Hours). However, as the tables have different structures, we will still split them out into three different work streams.
The first step this week is to split the string so that each category is within it's own column. The categories are broken up by the '\n' which normally represents a 'new line', however the Split functionality within Tableau Prep doesn't recognise this as a separator. As a result, we will need to replace the '\n' with something we can split on, therefore we have used this calculation:
Data
REGEXP_REPLACE([Data],'\n','|')
This allows us to replace the '\n' with a '|', giving us the ability to then split the by using this separator at a later stage.
We also need to replace the '°' and '%' symbols. We can again use Regex to replace these by using this calculation:
Data
REGEXP_REPLACE([Data],'°|%','')
This replaces '°' or '%' with a blank string, therefore removing them from the string. Usually, we would use the Remove Punctuation functionality, however this doesn't quite pick up all of the punctuation symbols.
The data within our three tables should look like this:
Next 5 Days
Next 24 Hours
Step 2 - Split Strings
We are now ready to split the strings into separate columns by using the '|'. For each of the tables, we can use a custom split to split all values on '|':
We now how the categories split out into different columns, so now we'll need to rename the column headers to make the output.
For Next 5 Hours and Next 24 Hours we rename the following:
- Data - Split 1 to Date or Time
- Data - Split 2 to Temperature
- Data - Split 3 to Precipitation Chance
- Remove all other fields
For Next 5 Days we rename the following:
- Data - Split 1 to Date or Time
- Data - Split 2 to Max Temp
- Data - Split 3 to Min Temp
- Data - Split 4 to Precipitation Chance
- Remove all other fields
We should now have the following for each of the tables:
Next 5 Hours
Next 24 Hours
Next 5 Days
The final step for this week's challenge is to bring all of the tables together into a single table. As we have similar fields, and we want to stack each table on top of each other, we can use the union to do this.
First, drag the Next 5 Hours ('Clean String' Step) onto the Next 24 Hours ('Clean String' Step) then onto the section that says Union. This will bring the two tables together with the field names matching up.
Next, we then drag the Next 5 Days ('Clean String' Step) onto the Union tool, but this time instead of Union, we are going to drop it onto 'Add'. This will add the flow to the original union tool which now should have three inputs and look like this:
The final step for this week's challenge is to tidy some fields so that it is ready for the output. We followed these steps:
- Replace '--' with Null. There is a row within Precipitation Chance that contains '--', we want to Right-Click and Replace with Null.
- Remove 'Updating Weather...'. We want to keep just the table name (Next 5 Hours etc) from the Tables Name field. Therefore, we can use the custom split, to keep only the last value after the '/' separator:
- Remove & Rename Table Names. Finally, we can remove the Tables Names field and rename the newly split field to Forecast Type.
We are now ready to output the data, which should look like this:
Make sure to fill in the participation tracker, share using #PreppinData on Twitter and post you solutions onto our Tableau Forums community page so that we can compare our workflows!