2022: Week 22 - Solution


Solution by Tom Prowse and you can download the workflow here.


This week we joined forces with a couple of other community initiatives (#GamesNightViz and #DataFamCon) to bring you a challenge all about Dungeons & Dragons. We want to look at how long each character speaks during each episode of the podcast 'Critical Role' so that we can create a visualisation from this.

Step 1 - Union Data

First we want to union our data sets together so that we can work out the difference from one timestamp to the next, and also identify when an episode ends based on the last timestamp.

We can input both of our data sources. We don't need to do anything with the Dialogue table, but in the Episode Details table we need to rename 'runtime_in_secs' to 'time_in_secs' and then keep only the Episode and Time_in_secs fields.

Now we can union both tables together so that the table looks like this:

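If you want to prototype the same logic outside of Tableau Prep, a rough pandas sketch of this union step could look like the following (the file and DataFrame names are hypothetical stand-ins for the two inputs):

import pandas as pd

# Hypothetical inputs standing in for the two data sources
dialogue = pd.read_csv("dialogue.csv")                 # Episode, time_in_secs, ...
episode_details = pd.read_csv("episode_details.csv")   # Episode, runtime_in_secs, ...

# Rename runtime_in_secs so it lines up with the dialogue timestamps,
# keep only Episode and time_in_secs, then union (stack) the two tables
runtimes = (
    episode_details
    .rename(columns={"runtime_in_secs": "time_in_secs"})
    [["Episode", "time_in_secs"]]
)
unioned = pd.concat([dialogue, runtimes], ignore_index=True)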

Step 2 - Timestamp Rank

Next we need to rank the timestamps in each episode so that we can order them from earliest to latest. We can use the Rank analytical calculation here, and we want to use Dense Rank so that rows with the same time_in_secs receive the same rank: 

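Continuing the pandas sketch from Step 1 (an illustration rather than the Prep flow itself), the dense rank within each episode could be written as:

# Dense rank of the timestamps within each episode, earliest first;
# equal time_in_secs values share the same rank
unioned["Order"] = (
    unioned.groupby("Episode")["time_in_secs"]
    .rank(method="dense", ascending=True)
    .astype(int)
)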

Step 3 - Compare Previous Timestamp

Using the newly created Order field, we can now compare the current and previous timestamps. To do this we first need to create a new clean step where we calculate the next line based on the Order field: 

Next Line
[Order] - 1 

At this stage we want to remove all of the additional fields as well. We should have 3 fields remaining - Episode, Time_in_secs, and Next Line. 

Now that we have identified the next line, we can join the two steps together using an inner join on Episode = Episode and Order = Next Line.


Now that the current and next timestamps are on the same row, we can calculate the difference between the two:

Duration  
ZN([time_in_secs-1])-[time_in_secs]

At this stage our data should look like this:

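In the pandas sketch, the Next Line calculation, the self-join, and the Duration field could be approximated like this (column names other than Episode and time_in_secs are assumptions carried over from the earlier snippets):

# Branch 1: the ranked rows, keeping Order for the join
current = unioned.copy()

# Branch 2: the clean step with the Next Line calculation ([Order] - 1)
next_line = unioned[["Episode", "time_in_secs", "Order"]].copy()
next_line["Next Line"] = next_line["Order"] - 1
next_line = next_line.drop(columns="Order")

# Inner join on Episode = Episode and Order = Next Line, which places the
# current and next timestamps on the same row (the joined timestamp comes
# through with a -1 suffix, mirroring Prep's time_in_secs-1 field)
joined = current.merge(
    next_line,
    left_on=["Episode", "Order"],
    right_on=["Episode", "Next Line"],
    suffixes=("", "-1"),
)

# Duration: next timestamp minus current timestamp (fillna(0) mirrors ZN)
joined["Duration"] = joined["time_in_secs-1"].fillna(0) - joined["time_in_secs"]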

Step 4 - Parse Names

Some of the character names have been separated with a comma, so we want to split these out and then reshape the data so that we have a row per character.

First we split the names at each comma using the custom split: 


This creates 4 new fields that contain different character names. 

To bring these into a single column and a row for each character, we can use a Columns to Rows pivot. In the setup of the pivot, we want to use the Wildcard Pivot to bring in all of the fields that contain the word 'Split'.


From here we can remove the Pivot Names field and exclude any blank ('') values from the Name Split field. At this stage our table should look like this: 

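As a pandas approximation of the split and pivot (assuming the speaker names live in a column called name, which is a guess at the field name in the dialogue table):

# Split the comma-separated names into separate "Name Split" columns
splits = joined["name"].str.split(",", expand=True)
splits.columns = [f"Name Split {i + 1}" for i in range(splits.shape[1])]

# Columns-to-rows pivot (melt) so each character gets its own row,
# then drop the pivot-name column and any blank names
id_cols = [c for c in joined.columns if c != "name"]
pivoted = (
    pd.concat([joined.drop(columns="name"), splits], axis=1)
    .melt(id_vars=id_cols, var_name="Pivot Names", value_name="Name Split")
    .drop(columns="Pivot Names")
)
pivoted["Name Split"] = pivoted["Name Split"].str.strip()
pivoted = pivoted[pivoted["Name Split"].notna() & (pivoted["Name Split"] != "")]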

Step 5 - Filter & Deduplicate

The final step of this challenge is to keep only the rows that are classified as 'Gameplay' in the section field, and then ensure that no rows are duplicated. To deduplicate the table we can use an Aggregate step and bring all of the fields into the Grouped Fields section, with nothing in the Aggregated Fields section.

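The pandas equivalent of this filter and grouped-only aggregation (continuing the sketch above) is a simple filter followed by drop_duplicates, since grouping on every field with no aggregations is the same as removing duplicate rows:

# Keep only the Gameplay rows, then deduplicate: grouping on every field with
# no aggregations is equivalent to dropping duplicate rows
output = (
    pivoted[pivoted["section"] == "Gameplay"]
    .drop_duplicates()
    .reset_index(drop=True)
)
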
After the aggregation we are ready to output our data and create the visualisation.


You can download the full output here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData, #GamesNightViz and #DataFamCon and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1, @WJSutton12, @StyleSTEAMed & @NicoleKlassen12

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

