2021: Week 25 - Solution

Solution by Tom Prowse and you can download the workflow here


This week's challenge was a bit of a big one! Jenny took the inspiration from a YouTube video about how to categorise the least favourite Pokémon... and thought it would make a great Preppin' Data challenge. Let's look how to solve it. 

Step 1 - Table Clean Up

The first daunting part of this challenge is all of the different inputs. We have 9 in total so that's lots of different tables that we'll need to clean and combine to make our final output. 

Gen 1
First we'll start by bringing in the Gen 1 table and then removing any null values and keeping only the '#' and 'Name' fields so the table looks like this: 


Evolution Group
Next we want to focus on the Evolution Group table, where we want to convert the '#' field to a number. We can't just change the data type as there are some leading spaces, therefore we can use the 'Trim' function to remove these, and then change to a Whole Number.  

Once we have the '#' as a number, we need to filter the data to exclude any Starter or Legendary Pokémon. We can do this by using an Exclude filter and then removing the Starter? & Legendary? fields so our table now looks like this: 


Evolutions
Finally the last table that we need to clean is the Evolutions one. All we are going to do with this one is remove all fields until we are left with Evolving From & Evolving To. 


We are now ready to starting joining the tables to understand the evolutions.

Step 2 - Join Evolutions

Next we need to join the three tables so that we can understand the evolution path. 

First we can simply join the Gen 1 table to the Evolution Group using '#' = '#' in the condition: 



Next we need to join the Gen 1 to the Evolutions table. This isn't as simple as joining the tables together with a single join, we need to separate out the Evolving To and Evolving From in separate right joins. This is how the joins have been set up: 

Evolving From 

Evolving To 

After the joins we have a few null values so we need to exclude these with a filter: 

ISNULL([Evolving From #])
OR
ISNULL([Evolving To #])

As a result our table now looks like this: 


And our workflow looks like this, with two branches for each of the different tables that we have joined so far


Finally we can do some similar joins to combine this into a single table. Again we need to split out the joins into Evolving To & From but this time we want these on separate branches because we want to keep all eventualities. 

Our join conditions this time look like this: 

Evolving From 


Evolving To 


Then finally we want to stack these results on top of one another by using the union tool, then removing all fields apart from the Evolution Group & Name: 


Our workflow now looks like this:


The final part of this step is to bring back all of the original Pokémon that have been left out of the joins. To do this we can use another join, but this time an outer left which will only bring back the names of Pokémon who aren't involved in an evolution group. We are joining the union to the Gen 1/Evolution Group join with the following conditions: 


Our table should look like this:


Step 3 - Combine Remaining Tables

The next step is to combine all of the remaining tables that have been provided for inputs. This includes the following tables:
  • Mega Evolutions
  • Alolan
  • Galarian
  • Gigantamax
All of these tables have the same structure (name field only), therefore we can input all of them and then union these together so they are in a single table.

We don't mind what table each of the Pokémon come from as we are going to exclude them from our results, however we do need to parse the Name from the Name field as this current includes the table name as well. 

We can do this by using an automatic split and then removing all of the other fields that don't contain the Pokémon name. We should now have a clean list of names of Pokémon that we want to exclude:


To remove these from our original workflow, we can use a couple of joins that act as a filter. First, we want to use an inner join to identify which Pokémon are in both tables: 


Then we can use a second join to remove these from our original workflow. The second is an outer join which will only keep the records that aren't matched in the join:


As a result we are left with a filter list of Pokémon:


The final two data sets that we need to include are the Unattainable Pokémon in Sword & Shield and the Anime appearances for Pokémon. 

First we'll start with the Unattainable Pokémon in Sword & Shield table. Again we are going to combine these tables with a couple of joins to identify the correct Pokémon to keep from the most recent games. 

The first join is on Name


Then the second is to include the Evolution Group: 


As a result we have now got down to 21 rows which should look something like this: 


Finally we want to include the last data input, Anime Appearances. Again, we are going to use another join (it's the last one, I promise!!) so that we can compare what Pokémon are in both tables. The join condition is Name = Pokémon but this 'explodes' our data so we have over 10x as many rows: 


The rows have increased massively here, as the Pokémon appear in multiple episodes meaning that there will be a row for each episode and Pokémon combination. 


Step 4 - Rank Pokémon

The final step this week is to rank the Pokémon based on how many episodes they have appeared in. 

First we need to use an aggregation tool to find the number of episodes each of the Pokémon has appeared in:


Then finally we can rank Evolution Groups by using a Rank calculation: 


Then we are done! The output should look like this: 


After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2024: Week 1 - Prep Air's Flow Card

2023: Week 1 The Data Source Bank

How to...Handle Free Text