How to...deal with Numbers

For many people, when you say “data”, they actually think about numbers. Numerical data sits at the heart of most of our analysis and therefore, being confident and comfortable with numbers is key to successfully preparing data for analysis and sharing. In this post we will cover what considerations you will need to make when using numerical data. For example, the format of your numbers will significantly change your analysis. 

What do we mean by numbers?

If only we had to think about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. These characters obviously form the basis to using numbers in data analysis but there is much more complexity than that. Numbers will often be the focus of your analysis, for example:
  • What is my total sales?
  • How many staff work in our organisation?
  • How many customers have we sold to?
  • What percent of suppliers have billed us?
We commonly call these numbers the ‘Measures’ of our data. Our analysis will typically be comparing these measures or breaking them down by categories to offer the measures more context. 

What type of numbers are there?

There are two types of numbers:
  • Integer - whole numbers (so no decimal places) ie 4 or 16874
  • Decimal / Float - numbers with decimal places ie 3.1415 or 0.31

Category or Measure?

But numerical data can often be used as a categorical data item themselves and not just the measure. Customer ID (identifiers) are a classic example of numerical data that you would not want to simply aggregate and pretend that they are a measure. The average of a Customer ID field wouldn’t make much sense after all. 


Numerical fields can be both Categorical and Measurable numbers though. Age for example fits into both of these boxes:
  • Age as a Measure - Using average age to understand more demographical information is useful within analysis especially when designing the services to fit around products or marketing channels. 
  • Age as a Category - Looking at the number of purchases or average sales by the different ages of your customers can also be useful when understanding customers’ lifecycles. 

Aggregation

After determining the numerical data field is a measure, the next question an analyst will often ask themselves is, so how should we aggregate the values? Even in a simple one hundred row data set, just knowing each individual sales figure doesn’t really help us analyse the numbers. Therefore, finding the total, average, maximum or standard deviation is key in understanding the dataset and what story hides within it. 


The Business Intelligence (BI) tools are not the only place where you need to start considering this. During the data preparation stage of your work, you should begin to assess this. The “Big Data” challenge that adorned most data software’s keynote talks and marketing was often a reference to the volume of rows of data that needed to be analysed in the BI tools. Determining if you can aggregate that deeper level of granularity to a higher level can help make analysis faster. The data preparation stage is the perfect place to do this aggregation. 

Formatting Numbers

During the stages of data preparation, you don’t need to be too precise about the final formatting of the values in your dataset. There are a couple of variances too this that will save you writing excess calculations in the future though:
  • Currencies - Rounding values to two decimal places will give values that are easy and sensible to use in your analysis. But do avoid adding currency symbols to your numbers. If necessary specify the symbols in your column headers (often better to use the currency abbreviations like GBP or USD) as if you add the symbol to your column data you will convert this number into a string preventing aggregations like average or sum. 
  • Percentages - Depending on your analysis, you may or may not want to multiply your percentages by 100. When using software like Tableau Desktop, if you set the formatting as a % then you will want your 31% to actually be held as 0.31 in the data set. If you are not going to change the formatting in the tool then you may want to multiply the 0.31 by 100 to leave your number as a more logical 31. As per currencies, naming your data field the correct term will help clarify to the data source users how they should interpret and use the column.

What functions will help you master numerical data?

  • round() - round your values to a set number of decimal places
  • ceiling() or floor() - round your numbers up or down to the nearest integer.
  • abs() - absolute function returns only positive values ie abs(-7) becomes 7
  • zn() - zero if null will return a value of zero if the data field has a null response
  • sign() - returns -1 if the value is below zero, 0 if the value is zero or 1 is the value is positive

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text