2020: Week 14 - Solution

April 07, 2020

Solution

Our solution for this week can be downloaded from the Preppin' Data Tableau Forum.

This week was all about getting data into the right structure so that we can produce certain types of charts. One of these charts is a Radar or Spider chart, so what better data to use than measurements relating to Spider characteristics!

Step 1 - Clean Fields

Our first task is to clean any fields that have don't have a measurement, or if they contain unwanted characters such as '*' or '_'.

To do this we first need a unique row number for each row. We can use the Row_Number function within a calculation. This looks like the following:

Row Number

{ ORDERBY [Species] ASC : ROW_NUMBER()}

As each of our measurements need to be cleaned, instead of doing this multiple times, we can pivot our data so that it is all in one column. Our pivot is set up like this:

After the pivot we have renamed the 'Pivot Names' to 'Trait' and 'Pivot Values' to 'Measurement'

Now we have all our measurements in one column, we can now start to filter and clean the results. We can do this with the following steps:

1. Convert 'Measurement' to a number (decimal). This converts any fields with non-numerical characters to Nulls.

2. Remove Nulls. This leaves just the numbers, so our measurement field is now clean.

3. Replace '_' with a space. You do this by using the following calculations:

Trait

Replace([Trait],'_',' ')

Species

Replace([Species],'_',' ')

Our final cleaning step is to determine whether each species has a value in each measurement. To do this we can use the Row Number field that we created earlier along with a Fixed LOD. The calculation will be:

Number of Traits Recorded

{ FIXED [Row Number] : COUNTD([Trait]) }

Here we are saying, for each Row Number, count how many species there are. After the pivot, each of the row numbers will have been duplicated, depending on how many measurements had a value. Therefore, we can use a Selection Filter to Keep Only the rows with 6 traits.

Step 2 - Aggregate & Normalise

To make our analysis easier, we only want to consider species where there were measurements for at least 10 different specimens. We are again going to use a Fixed LOD to solve this part of the challenge. Similar to before, we are going to use row number and species but this time for each Species we want to Count Distinct the Row Number. The calculation will look like this:

Num of Species

{ FIXED [Species] : COUNTD([Row Number])}

Note that this is an alternative way of writing our fixed LOD analytical calculation.

We can then use the Range of Values Filter to return values that are greater than or equal to 10.

Now we have filtered to show only the species with at least 10 specimens recorded, we want to find the average measurement for each species. We use an aggregation tool to calculate this with the following setup: