2020: Week 38 - Solution



Solution by Tom Prowse and you can download our solution here


The challenge this week was to see if each of the given word pairs were anagrams of each other or not. There were lots of different solutions on Twitter, some long some short and it was great to see everyone learning different techniques from each other!

Step 1 - Split Words

The first task that we want to achieve is to split the words so that each of the letters are on a single row. This isn't something that is native within Tableau Prep so there are a few steps which we will need to cover first. 

To start we want to give each of our word pairs a row ID. The order doesn't matter here therefore we can use the Rank feature to give each row a unique ID: 

Row ID


After the rank we now have a row ID for each row: 


At this stage we are going to split the workflow into two separate branches, one for each of the words. These are identical so we will cover the Word 1 branch and this can be repeated but for Word 2. 

The first task, for this branch, is to make all the letters lowercase. We can use the Clean functionality within Tableau Prep to do this: 


We then want to find out how many characters are in each of the words. To do this we use the LEN function: 

Length
LEN([Word 1])

Our table now looks like this: 


Now we have the length of the strings we can now join the Scaffold data set. We are going to join where the Length field is greater than or equal to the Scaffold, so we will get an extra row depending on how many letters each word has: 


With the additional rows, we now want to use the MID function to identify each of the different letters depending on their position within the word. 

Word 1

MID([Word 1],[Scaffold],1)

This will effectively split each of the letters onto an individual row, so our table now looks like this: 


Step 2 - Total Letters per Word

The next step is to identify how many of each character each of the words have. Before we do this, we want to duplicate the Word 1 field as we will need to use this twice within the next aggregation step, In the aggregation we want to Group By Row ID & Word 1, then Count Word 1: 


After the aggregation step, we now know how many letters are in each of the different words of each row: 


This is the end of the Word 1 branch, therefore you will need to repeat these steps but this time for Word 2. 

Once you have completed this for both words, it's time to join these two tables together. For this join we want to use a Full Join, this means that we are going to include everything from both tables, even if there are no matches. The join setup looks like this: 


Then as a result of the join we have the following table: 


Step 3 - Is it an Anagram?

Now we have split each of the characters into a row for each of the words and joined them back together, it's time to identify whether each of the words is an anagram or not. 

First, we want to identify whether we have any null values in our Row ID-1 field. If this field is null then we want to give it a 1, else 0:

Row ID-1 Null

IF ISNULL([Row ID-1]) THEN 1 

ELSE 0 

END

We can then use this field to calculate how many rows are null by using an aggregation tool with the following setup:


By identifying how many nulls we have, this is helping us to identify whether or not the two words were anagrams. If there were no null values then this means that all of the letters match up, and they are anagrams. If there are nulls, then this is how many letters don't match between word 1 & 2. 

Anagram?
IF [RowID-1 Null]>0 THEN 'No'
ELSE 'Yes'
END

We have now identified whether each of the words is an anagram or not so our table looks like this:


The final step this week is to bring back all of the original words from the Anagrams data set, and join these on our table using the Row ID: 


Lastly we need to remove any unwanted fields so that our table is in the format of our desired output: 


After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14, @JonathanAllenby & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text