2024: Week 4 - Solution



Solution by Tom Prowse and you can download the workflow here.



This week we are going to focus on the join functionality and how we can use the conditions so that they act as a filter within our analysis. Again, this is adding to our fundamental knowledge bank of examples that are key to getting started with data preparation.

Step 1 - Input Flow Card Data

First we want to input all of our data related to whether someone has a flow card or not. To do this we can input one of the sheets that includes the flow card information. From here we can create a wildcard union within the input step so that we include the 3 sheets containing information about the flow cards. 

To do this we need to set a worksheet filter so that we only include the relevant sheets (exclude the Seat Plan). Within the worksheet filter we can set it to include sheets where the pattern matches "*Card*". By including the * this acts as a wildcard and will include any other characters before or after the word Card.

This is added when we select 'Union multiple tables' within the input step: 



As a result all three tables should be 'stacked' on top of each other into a single table.

Step 2 - Flow Card field

Next we want to identify where someone has a flow card or not. We can tell this depending on what sheet their name came from within the input. As a result of the wildcard union, a Table Names field is created automatically by Tableau Prep. We can use this field to identify which table each name came from and therefore if they have a flow card or not.

To clean up the Table Names field there are multiple ways around it, but in this case I have used an IF statement:

Flow Card?
IF CONTAINS([Table Names],'Non') THEN 'Non-Flow Card'
ELSE 'Flow Card'
END

Alternatively you could use the clean functionality within Tableau Prep to remove the numbers and punctuation from the values.

Once we have the Flow Card? field we can aggregate the data so that we know the totals for each seat, row, class, and flow card combination. Within the aggregate step we want to group by Flow Card?, Seat, Row, and Class then Sum the Number of Rows field: 



At this stage the data should look like this: 



Step 3 - Filtering with a Join

We now want to keep only the seats that haven't been booked on the plane. These are the seats that do not match our current customer list (the flow card data table). In order to only return the values that do not match, we need to utilise a join where we only return the values from the Seat Plan which is known as an outer join. 

To create the join we first need to ensure that the conditions match. In this case we want to join on the fields Seat, Class, and Row from both tables. Then we can set up the outer join by clicking on the venn diagram until only the outside segment of the Seat Plan data is selected. Depending on which table you joined first will determine whether it is an outer right or outer left join, in our case it's an outer right join: 



If selected correctly then you should see the number of rows that has been excluded from the view and we should have 7 rows remaining.

Finally we can remove any additional values and then rename the fields so they match the output. Our final output should look like this: 


You can download the outputs from here. If you want to check your results. 

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text