2021: Week 10 - Solution

Solution by Tom Prowse and you can download our workflow here


This week we looked at the Pokémon evolution hierarchies and group these into the correct evolution groups. Although we focused on Pokémon this week, it could be transferrable to pretty much any hierarchical data so hopefully it will help you out for those challenges at work as well.

Step 1 - Filter Relevant Pokémon

From our Pokémon data set, we are first going to give it a bit of a clean and then filter out the Pokémon that we aren't going to look at in the challenge. Therefore the first step is to duplicate the '#' field and then trim any spaces from the duplicated field ('#-1').

We can then change the '#-1' field to a decimal number and use the following Wildcard Match filter to exclude any rows that contain a '.' from the '#' field. This is going to exclude any of the 'Mega' evolutions that we aren't interested in. 


We then need a second filter on the '#-1' field that we only keep numbers from 0-386 which is the number of Pokémon from the first 3 generations. 


The final step of this part is to make sure we only have one row for each Pokémon so we can use an aggregate tool with the following setup to achieve this: 


We are just grouping by all fields apart from the '#-1' and 'Type' fields, we also have nothing in the aggregated fields part as we are only removing duplicate rows.

Our data now looks like this: 


Step 2 - Combine Evolution Data

We can now turn our focus onto the evolution data set, and we again need to remove any unneeded Pokémon from that list. 

In order to combine the two lists and make sure we have matching Pokémon we are going to use a series of joins which will bring both data sets together.

Join 1 - Evolving to Info 


The first join is to join both data sets on Name and Evolving From. As we want to ensure that we keep all of our Pokémon from the original list we need to make sure we are doing a Left (or Right) join:
 

Join 2 - Filter Out Irrelevant 


We can now use another join to act as our filter and remove any Pokémon that we don't need from the Evolving list. This is a simple join on Evolving To = Name: 


Notice how this has removed some rows from our 'De-Dup Pokémon' which is our original list so now we have to go and ensure that we aren't leaving any out!

Join 3 - Evolving From 


We can now start to bring it all back together, first we are going to focus on the Evolving From field. We are going to take our filtered list of the Evolution table and join that back to our original Pokémon list (the aggregate tool). This will allow us to see where each of our Pokémon evolves from and we need to make sure we are including all of the Pokémon from our list as some are already the bottom of the hierarchy. 

We can join on Name = Name as we have already used the Evolves From field in the previous joins.



Join 4 - Evolving To 


Next we want to focus on where the Pokémon is going to evolve to, so this is the same as the previous step but we want to use the Evolving To field instead of Evolving From. 


Join 5 - First Evolution 


The last join that we need to do is to find out what the first evolution was for any Pokémon with 3 evolutions. Therefore we need to join back to the evolution list on the Evolving From and Evolving To (Evolution list) fields. 


And that's all of the joins that we need! Phew!

The final step to clean up the joins is to remove any duplicated rows, therefore we can use an aggregate tool again with all of the fields in the Group By to remove any duplicates: 


Our data now looks like this: 


Step 3 - Evolution Groups

The final part of the challenge is to create the evolution groups for each Pokémon. To do this we want to use the following calculation: 

Evolution Group 
IF ISNULL([Evolving from])
THEN [Name]
ELSEIF NOT ISNULL([First Evolution])
THEN [First Evolution]
ELSE [Evolving from]
END

This allows us to rename the group after the first Pokémon at the bottom of the hierarchy. 

Our data should now look like this and the challenge is complete! 


You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 


Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text