2024: Week 18 - San Diego Zoo TC24 Special

Challenge by: Jenny Martin

Welcome to the Tableau Conference Special for Preppin' Data. If it's your first time here, then a very warm welcome to you! This week we have something pretty special planned for you, with a 2 stage challenge!

Seeing as Tableau Conference is in San Diego this year, we thought we'd take a trip to the world famous San Diego Zoo! They've got lots of data on the animals and plants that live at the zoo, but it's spread across multiple data sources. That's where they need our data prep skills to help them out!

Stage One

For the first stage, we need to bring together information about the animals and plants at the zoo, with their endangered status and wildlife class. This will help prioritize care for the wildlife in Stage Two of the challenge.

Inputs

All the data required for this challenge is contained within a single Excel file with multiple sheets. For Stage One, you will need 3 tables:
  1. Wildlife - a table listing all the animals and plants at the Zoo 

  2. Animal Details - a table containing further details about the animals at the Zoo 

  3. Plant Details - a table containing further details about the plants at the Zoo 

Requirements

  • Input the data
  • Bring the Animal & Plant Details Tables together to form one table (hint)
    • There's been an error in the Plant Details table where the Status field has been incorrectly spelt as Stattus. Be sure to make sure these 2 fields align and have the correct spelling for their field name
    • Make sure the Animal and Plant fields from their respective tables are also aligned. Call this new field Name
  • We only wish to keep the Name, Table Names, Status and Class fields for our analysis
  • Use the Table Names field to extract the information of whether each row of data is an Animal or a Plant. Call this new field Wildlife (hint)
  • Many of the Class names have the Latin name followed by the English name in brackets. Update the Class field so that it contains the English name, where possible (hint)
    • e.g. Aves (Birds) should become Birds
  • Combine this data with the Wildlife Table (hint)
    • All animals and plants should exist in both tables so we're expecting to still have 326 rows
    • There's an issue that needs to be fixed to achieve this. Some Names in the Wildlife Table have an error where the apostrophe's (') have been encoded as '
      • e.g. Bird's-nest Fern should be Bird's-nest Fern
  • Remove any unnecessary fields
  • Output the data

Stage One Output


  • 6 fields
    • Wildlife
    • Name
    • Status
    • Habitat
    • Class
    • Region
  • 326 rows (327 including headers)
You can view the output here.

Stage Two

That's fantastic to have all that information in one table now. It's much easier to work with and find answers to questions you may have about the wildlife at the zoo! Now we're looking to take things a step further. 

As you can tell from Stage One, the wildlife comes to the zoo from all over the world and some of the wildlife are endangered species. Therefore staff want to make sure they're looking after the wildlife as best they can. 

To do this they need to consider the natural habitats of the wildlife and how that differs from San Diego's climate. They particularly want to make sure the most endangered wildlife is in the best environment possible, so we're going to create them a priority order for reviewing the wildlife exhibits.

Inputs

You'll need 3 additional tables for Stage Two (plus the output from Stage One of course) :
  1. Stage One Output (as above)
  2. San Diego Climate - monthly average temperatures in San Diego 

  3. Habitats (estimates) - the climates of the wildlife habitats 

  4. Care Priority - a lookup table to classify the priority order of the wildlife status 

Requirements

  • Reduce the San Diego Climate Table to a single row which describes the temperature range in San Diego. You may choose to work in either Fahrenheit or Celsius! (hint)
  • Append this data onto the Stage One Output (hint)
  • Some Wildlife can exist in multiple Habitats. In our dataset these different habitats are separated by commas. Transform the data so that we have a row per Habitat for each Wildlife Species (hint)
  • From the Habitats (estimates) Table, extract the information about the temperature ranges for each Habitat
  • Combine the tables.
  • Reduce the dataset down to a single row per Wildlife Species, with the minimum and maximum temperatures they live in in the wild
  • Create 2 calculated fields:
    1. Habitat Notes - classifies whether the animal is within its natural temperature range or not
      • Ideal = within temperature range of habitat
      • Above = San Diego is sometimes hotter than its habitat
      • Below = San Diego is sometimes colder than its habitat
    2. Degrees outside Ideal - if the wildlife is not in their ideal temperature range, the number of degrees San Diego can be above or below the ideal temperature range
  • Filter the data to only those who are not in their ideal temperature range
  • Staff need to check all these exhibits to make sure the wildlife are being properly cared for. They need a plan for what order they should check the exhibits in. It's decided that the most endangered wildlife should be checked first, in order of how far they are from their ideal temperature range. To create this ranking:
    • Bring in the Care Priority lookup table
      • Be careful not to lose any wildlife
      • If a wildlife species doesn't have a Status, make their Priority number 6
    • Take the absolute value of the Degrees outside Ideal field
    • Create the ranking, as described above, called Priority Order
      • If a wildlife species doesn't have a Status, make sure the Priority Order is also null
  • Remove unnecessary fields
  • Output the data

Stage Two Output


  • 9 fields
    • Wildlife
    • Name
    • Priority Order
    • Habitat Notes
    • Habitat
    • Degrees outside Ideal
    • Status
    • Class
    • Region
  • 80 rows (81 including headers)
You can view the output here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text