2020: Week 41 - Solution

 


Solution by Tom Prowse and you can download the workflow here


This week was Tableau Conference-ish so we had a special challenge related to Carl's Tableau Prep session! We looked at the comments, from each of the sessions in the different locations, and wanted to know who was commenting, from where they were commenting, and also how many questions & answers there were.

Step 1 - Input Files & Create Local Date

The first step is to input all of the files from each session. As all of the sessions have the same structure, we can use the Wildcard Union to input the three files within the same input tool.


In this case we don't need to use anything within the Matching Pattern, but it would be recommend to use a matching string in case any additional files are added in the future. 

Once we have all the files within the workflow, we can create a date time field to show the local date for each of the sessions. The session took place at 2pm on 7th October so we can use the following calculation to input this date:

Local Date 
MAKEDATETIME(
MAKEDATE(2020,10,7)
,DATEADD('hour',14,[Time])
)

This calculation can be split into three parts, starting from the bottom:
1, DATEADD - we are adding 14 hours to the original time field as the sessions started at 2pm.
2, MAKEDATE - the sessions took place on the 7th October 2020, so we use the Year, Month, Day structure.
3, MAKEDATETIME - to create the date time field correctly, we use the two calculations above to create a Date, Time field. 

Finally we want to identify which session participant attended, therefore we can use the Table Names field. All we need to do is remove the word Session from the field, this can be done in multiple ways, but we have use a custom split for this: 


This splits off the first field before the ' Session' string, therefore leaving us with just the session name (EMEA, APAC, AM). 

At this stage the table looks like this:


Step 2 - Clean Comments

In the next step we need to clean the comments so we can identify where everyone is based (1st comment) and if they are asking or answering a question (not 1st comment). To do this we want to use the ranking functionality to identify the order of each of the comments. 

Comment Order - Group by Location, rank Local Date ascending.


This identifies the order of each of the comments for each session. 

We can then use a similar technique to split this further and find the order for each individual:

Comment No. per Person - Group by Location & Who, rank Local Date ascending. 


Now we have the comment order for each person, we can start to filter the table so that we can identify where everyone is located. We want to make the following filters: 
  1. Comment Number - Exclude 1. This excludes Carl's first comment from each of the sessions. 
  2. Comment No. per Person - Keep Only 1. This only returns everyone's first comment, which is related to their location and if it is their first conference or not. 
Now we have isolated each person's first comment, we can now extract their location and if it is their first time. 

First we want to split the Comment field using a custom split on the full stop:


This will allow us to split the field into City/Country and whether it is their first time: 


At this stage we can create the First Time Flag from the Comment - Split 2 field. There are various ways of doing this, including a Contains calculation. However we are just going to double click on the value within the Profile Pane, and rename the field from there: 



Making these changes manually is easy, but it would be better to create a calculation to pick up any other comment types in the future.

Next, we want to focus on the Comment - Split 1 field which will become our City & Country. First we can use the Automatic Split function to into two fields (City and Country). 

Our table should now look like this: 



The last step to clean the comments, is to use the correct data role to identify the locations. For this we need to split off the AM session as this includes State information as well. 

Within a new clean step we are going to Exclude AM from the sessions, then use the Data Roles to select Country and City for each of the fields. Tableau Prep recognises that these are locations therefore gives you a recommendation to update these roles: 


When these roles have been updated, you will notice that there are some fields that are recognised and have a '!' next to them. We want to remove these values: 


For the AM section, we want to use another new clean step from before we split the EMEA/APAC regions and Keep Only the AM sessions. 

For AM, we first need to classify the country field. To do this, we need to identify where participants have entered a two letter State, then change this to United States. We can do that using this calculation:  

Country 
IF LEN([Country]) = 2 THEN 'United States' 
ELSE [Country]
END

We can then update the data roles for both City and Country then remove any which aren't successfully identified. 

After completing the cleaning for both EMEA/APAC and AM, we can then bring both of these tables back together via a Union.

The table now looks like this: 


Step 3 - Convert Time

Finally we just need to transform the Local Date in GMT. To do this we use the following calculation with the time differences given within the challenge post:

Date (GMT) 
IF [Location] = 'AM' 
THEN DATEADD('hour',5,[Local Date])

ELSEIF
[Location] = 'APAC' 
THEN DATEADD('hour',-11,[Local Date])

ELSEIF [Location] = 'EMEA' 
THEN DATEADD('hour',-1,[Local Date])
END

After this calculation we are ready to output our first output:


Step 4 - Questions & Answers

We can now turn our attention to the 2nd output which is based around who asked what questions & answers. First we need to take another branch from the Comment step and then exclude '1' from the Comment No. per Person field: 


We can now classify whether the comment was a question or an answer. We have assumed that all questions will have a '?', therefore we can use the following calculation: 

Question or Answer 
IF CONTAINS([Comment],'?') 
THEN 'Question'
ELSE 'Answer'
END

The final step is to total the questions and answers by using an aggregation tool: 


After the aggregation we are now ready to output our data which looks like this: 


Both of the outputs can be downloaded in full here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14@JonathanAllenby & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text