2021: Week 25 - Solution
Solution by Tom Prowse and you can download the workflow here.
This week's challenge was a bit of a big one! Jenny took the inspiration from a YouTube video about how to categorise the least favourite Pokémon... and thought it would make a great Preppin' Data challenge. Let's look how to solve it.
Step 1 - Table Clean Up
The first daunting part of this challenge is all of the different inputs. We have 9 in total so that's lots of different tables that we'll need to clean and combine to make our final output.
Gen 1
First we'll start by bringing in the Gen 1 table and then removing any null values and keeping only the '#' and 'Name' fields so the table looks like this:
Evolution Group
Next we want to focus on the Evolution Group table, where we want to convert the '#' field to a number. We can't just change the data type as there are some leading spaces, therefore we can use the 'Trim' function to remove these, and then change to a Whole Number.
Once we have the '#' as a number, we need to filter the data to exclude any Starter or Legendary Pokémon. We can do this by using an Exclude filter and then removing the Starter? & Legendary? fields so our table now looks like this:
Evolutions
Finally the last table that we need to clean is the Evolutions one. All we are going to do with this one is remove all fields until we are left with Evolving From & Evolving To.
We are now ready to starting joining the tables to understand the evolutions.
Step 2 - Join Evolutions
Our join conditions this time look like this:
Evolving From
Evolving To
Then finally we want to stack these results on top of one another by using the union tool, then removing all fields apart from the Evolution Group & Name:
Our workflow now looks like this:
The final part of this step is to bring back all of the original Pokémon that have been left out of the joins. To do this we can use another join, but this time an outer left which will only bring back the names of Pokémon who aren't involved in an evolution group. We are joining the union to the Gen 1/Evolution Group join with the following conditions:
Our table should look like this:
Step 3 - Combine Remaining Tables
- Mega Evolutions
- Alolan
- Galarian
- Gigantamax
We don't mind what table each of the Pokémon come from as we are going to exclude them from our results, however we do need to parse the Name from the Name field as this current includes the table name as well.
We can do this by using an automatic split and then removing all of the other fields that don't contain the Pokémon name. We should now have a clean list of names of Pokémon that we want to exclude:
To remove these from our original workflow, we can use a couple of joins that act as a filter. First, we want to use an inner join to identify which Pokémon are in both tables:
Then we can use a second join to remove these from our original workflow. The second is an outer join which will only keep the records that aren't matched in the join:
As a result we are left with a filter list of Pokémon:
The final two data sets that we need to include are the Unattainable Pokémon in Sword & Shield and the Anime appearances for Pokémon.
First we'll start with the Unattainable Pokémon in Sword & Shield table. Again we are going to combine these tables with a couple of joins to identify the correct Pokémon to keep from the most recent games.
The first join is on Name
Then the second is to include the Evolution Group:
As a result we have now got down to 21 rows which should look something like this:
Finally we want to include the last data input, Anime Appearances. Again, we are going to use another join (it's the last one, I promise!!) so that we can compare what Pokémon are in both tables. The join condition is Name = Pokémon but this 'explodes' our data so we have over 10x as many rows:
The rows have increased massively here, as the Pokémon appear in multiple episodes meaning that there will be a row for each episode and Pokémon combination.
Step 4 - Rank Pokémon
The final step this week is to rank the Pokémon based on how many episodes they have appeared in.
First we need to use an aggregation tool to find the number of episodes each of the Pokémon has appeared in:
Then finally we can rank Evolution Groups by using a Rank calculation:
Then we are done! The output should look like this: