2021: Week 20 Controlling Complaints
Challenge by: Carl Allchin
This week's challenge continues the focus on calculations, this time the focus is numbers. When using measures in data it is very easy to make mistakes if you don't check the realistic nature of the values, especially when entering data or forming calculations. By creating your calculations in your data preparation tool, you might be saving the users of your data set a lot of work and reducing the skills required to use the data.
Challenge
Control charts are a really useful way to visualise data but in Tableau Desktop they often require a few Table Calculations putting people off creating them. This week you will be building the calculations you need to build a control chart without using table calculations in Desktop.
Different people like to use differing numbers of standard deviations to assess whether the data point falls outside of the normal levels of distribution or not. Some people class normal distribution as:
- 1 standard deviation either side of the mean - in a normal distribution this would cover 34.1% of the data either side of the mean. So 1 standard deviation either side of the mean would cover 68.2% of your data in a normal distribution so start to show interesting outliers.
- 2 standard deviations either side of the mean - this would add another 13.6% of you data either side of your mean so two standard deviations either side of the mean would cover 95.4% in a normal distribution.
- 3 standard deviations either side of the mean - this is the version that is used in the six sigma process improvement theory founded in manufacturing but has progressed into other industries too.
This challenge will ask you to create 3 main calculations after some initiation data preparation that will help you form the control chart:
- Mean - the value that shows the average of all the data points being assessed. This can be broken down into different partitions in the data, often by dates or dimensions.
- Upper Control Limit - this value is 1, 2 or 3 standard deviations above the mean.
- Lower Control Limit - this value is 1, 2 or 3 standard deviations below the mean.
Measuring the difference between the Upper and Lower Control Limits (in this challenge called the Variation) is a demonstration of how much control there is in your process you are measuring. The smaller the variation, the more a process is controlled.
With all delays in Prep Airs projects recently, we need to focus on the complaints that might be generated by the delays.
Input
Requirements
- Input the data file
- Create the mean and standard deviation for each Week
- Create the following calculations for each of 1, 2 and 3 standard deviations:
- The Upper Control Limit (mean+(n*standard deviation))
- The Lower Control Limit (mean-(n*standard deviation))
- Variation (Upper Control Limit - Lower Control Limit)
- Join the original data set back on to these results
- Assess whether each of the complaint values for each Department, Week and Date is within or outside of the control limits
- Output only Outliers
- Produce a separate output worksheet (or csv) for 1, 2 or 3 standard deviations and remove the irrelevant fields for that output.
Output
1 file containing 3 worksheets (if you are using an older version of Prep of another tool, please feel free to output CSVs instead).
Each worksheet should contain:
10 fields
- Variation
- Outlier
- Lower Control Limit
- Upper Control Limit
- Standard Deviation
- Mean
- Date
- Week
- Complaints
- Department
For each of the outputs here are the number of rows they should have:
- 1 Standard Deviation - 24 rows (25 including headers)
- 2 Standard Deviations - 5 rows (6 including headers)
- 3 Standard Deviations - 2 rows (3 including headers)
If you want to test the values you have produced, I've built the Control Chart using Table Calculations in Desktop and shared here:
Here's our full output for comparison. Remember we don't care about the order of the columns or rows as Tableau Desktop will import them in whatever order they come in!
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!