2022: Week 48 - Solution

Solution by Tom Prowse and you can download the workflow here.


We have another collaboration with #GamesNightViz and this time we are looking at the game of tiddlywinks!

Step 1 - Extract & Parse

First we need to input our data, then we can extract the Event ID from the Event field.

To do this, we want to duplicate the Event field, then use the Clean functionality to Remove Letters and Trim Spaces. From here we can turn the data type into a Whole Number and rename to Event_ID.

After the Event ID we can then parse the Competitor and Association. Both of these pieces of information are within the Competitors field and we need to utilise the Split functionality to extract the information.

To split off the competitor name, we can use a custom split on all values with '(' as a separator. 


This allows us to have 2 new fields - Split 1 contains the Competitor name, Split 2 contains the association. 

We can tidy these up a bit by renaming Split 1 to Competitor Name and then trim spaces. Then rename Split 2 to Association and remove any punctuation. 

At this stage our table should look like this:


Step 2 - First Game Output

We can now split our workflow into two branches - 1. Games, 2. Results.

First up we are going to look at Games, so on a new clean step we want to remove the following fields: 
  • Pts
  • Association
  • Competitors
  • W
  • L
  • T
  • Event
  • Descriptions
We can then pivot the data so that all the fields with 'G' are pivoted from Columns to Rows. Within the pivot we can use a wildcard pivot on 'G' to bring through all of our game fields: 


After the pivot we can then identify if there has been a Potout by using this calculated field: 

Potout
IF RIGHT([Score],1)='*'
THEN TRUE
ELSE FALSE
END

Then finally we can clean the score by removing any punctuation and replace the fractions with decimals:

Score 
REPLACE(REPLACE(REPLACE([Score],'½','.5'),'⅓','.33'),'⅔','.67')

Then changing the score to a decimal number. 

We have then completed out first output for the Games that looks like this: 


Step 2 - Results Output

On a separate branch, we can now focus on the 2nd output relating to Results. 

First we need to remove any fields relating to Games, therefore we can remove all the fields with the 'G' along with Competitors and Note.

Next we want to clean the Pts fields so that the fractions are decimals

Pts 
REPLACE(REPLACE(REPLACE([Pts],'½','.5'),'⅓','.33'),'⅔','.67')

Then make sure this is a decimal data type.

We then need to clean the Event Start Date, by splitting the description on the '·' value. We just need the first value before the · here: 


We then need to split the field that we have just extracted by the word 'and' to extract the first date. 

The next step is to rename the split fields so description split 1 split 1 = Event Start Date, then we can remove any other split fields and make sure it's a date.

The last step is to rename the following fields: 

  • Pts = Points
  • W = Wins
  • L = Losses
  • T = Ties
  • description = Event Description
  • event = Event
  • event_id = Event ID
We're then ready to output out 2nd data table: 



You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and #GamesNightViz and tagging @Datajedininja@JennyMartinDS14@TomProwse1 & @WJSutton12

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text