2020: Week 14 - Solution



Solution 




Our solution for this week can be downloaded from the Preppin' Data Tableau Forum.

This week was all about getting data into the right structure so that we can produce certain types of charts. One of these charts is a Radar or Spider chart, so what better data to use than measurements relating to Spider characteristics!

Step 1 - Clean Fields

Our first task is to clean any fields that have don't have a measurement, or if they contain unwanted characters such as '*' or '_'.

To do this we first need a unique row number for each row. We can use the Row_Number function within a calculation. This looks like the following: 

Row Number
{ ORDERBY  [Species] ASC : ROW_NUMBER()}

As each of our measurements need to be cleaned, instead of doing this multiple times, we can pivot our data so that it is all in one column. Our pivot is set up like this: 


After the pivot we have renamed the 'Pivot Names' to 'Trait' and 'Pivot Values' to 'Measurement'

Now we have all our measurements in one column, we can now start to filter and clean the results. We can do this with the following steps: 

1. Convert 'Measurement' to a number (decimal). This converts any fields with non-numerical characters to Nulls. 
2. Remove Nulls. This leaves just the numbers, so our measurement field is now clean.
3. Replace '_' with a space. You do this by using the following calculations: 

Trait
Replace([Trait],'_',' ')

Species
Replace([Species],'_',' ')


Our final cleaning step is to determine whether each species has a value in each measurement. To do this we can use the Row Number field that we created earlier along with a Fixed LOD. The calculation will be:

Number of Traits Recorded
{ FIXED [Row Number] : COUNTD([Trait]) }

Here we are saying, for each Row Number, count how many species there are. After the pivot, each of the row numbers will have been duplicated, depending on how many measurements had a value. Therefore, we can use a Selection Filter to Keep Only the rows with 6 traits. 

Step 2 - Aggregate & Normalise

To make our analysis easier, we only want to consider species where there were measurements for at least 10 different specimens. We are again going to use a Fixed LOD to solve this part of the challenge. Similar to before, we are going to use row number and species but this time for each Species we want to Count Distinct the Row Number. The calculation will look like this: 

Num of Species
{ FIXED [Species] : COUNTD([Row Number])}

Note that this is an alternative way of writing our fixed LOD analytical calculation.

We can then use the Range of Values Filter to return values that are greater than or equal to 10. 

Now we have filtered to show only the species with at least 10 specimens recorded, we want to find the average measurement for each species. We use an aggregation tool to calculate this with the following setup:


The final task for this week's challenge is to normalise our measurements. We need to calculate the following various parts:

1. Max Value. We calculate this using the following Fixed LOD: 



2. Min Value. We calculate this in the same way as the maximum, but instead find the minimum: 


Now we have all the parts of the calculation we can normalise the values by using the following calculation: 

Normalised Value
([Measurement]-[Min Value per Trait])
/
([Max Value per Trait]-[Min Value per Trait])


We have now ready for the output, and have completed Week 14! 

Our output:


Make sure to fill in the participation tracker, and post you solutions onto our Tableau Forums community page so that we can compare our workflows!

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text