Posts

Showing posts from February, 2020

How to... deal with Nulls

Image
Nulls, or the absence of data, are a fickle challenge within data preparation. Experienced data preppers will almost instinctively know how to deal with them, or at least manage the challenges that come with a dataset that contains nulls. Newer data preppers, often do not have the same set of use cases to draw on to know how to handle the null fields. Therefore, this post is looking to share the considerations you should make when working with a dataset with a null in. What is a null? A null is the absence of a value in a data field within a dataset. The absence of data is very different to a zero, a new row or a space. These are all values that although look similar to the absence of the data, they are actually a value of some kind. Nulls appear in datasets for many reasons including: The result of mismatched fields in a Union Mismatched fields in a Left, Right or Full Outer Join No original data entry for that record but other data points for that record existing (ie other f

2020: Week 9

Image
This week, things get a little more political. No Chin & Beard Suds Co don't have a political opinion in the US Elections but we do want to monitor what is going on. So let's look at how the 2020 Democratic Presidential Nomination is developing. Using data from RealClearPolitics.com , we are also making it easier for our Tableau Desktop users to follow the results by forming the dataset into a Bump Chart. Normally this would involve Table Calculations in Tableau Desktop but we want to prepare the Dataset so it makes it a lot easier for anyone to build this chart type and complete the ranking for them (this can be done in Prep 2020.1 or a longer work around in previous Prep versions - I built it pre-2020). Link to the viz Requirements Input data Remove the Average Record for the polls Clean up your Dates Remove any Null Poll Results Form a Rank (modified competition) of the candidates per Poll based on their results Determine the spread for each poll fro

2020: Week 8 Solution

Image
This week we focused on the qualities and functionalities that Tableau Prep has to offer with a 'clicks only' challenge, where no typed calculations were allowed. Throughout this challenge we are going to be using unions, joins, pivots and aggregations, so it will be a great introduction to some of Tableau Prep features. For the challenge we are going to focus on pulling together multiple weeks worth of data, from different spreadsheets and turning this into a useful datasource. It was split into two outputs: Week's records where profit expectations have been exceeded.  Week's records where budgets have not been reached Output 1 - Profits Input Data The first task is to input the correct data so that we can start to solve the challenge. This week we are given multiple worksheets within a single Excel document, however for the first part of this challenge we only want to focus on the Weekly Sales data. Each of these sheets have the same naming conven

How to... breakdown Complex Data Preparation Challenges

Image
In previous posts, techniques for determining the alterations required for preparing a dataset for analysis have been discussed albeit at relatively simple level. What about those situations where the challenge isn't straight forward, how to approach the challenge then? This post will cover this exact scenario by taking on one of the most complicated challenges Preppin' Data has covered to date - 2020 Week 3 . The challenge involves taking NBA results and forming the full rankings of Wins vs Loses, Records vs other teams in the same conference and even winning streaks. Where to begin? The difference between writing a Tweet and an Essay is a plan (unless you are very serious about your tweets!). The same likeness can be made between a simple piece of data preparation and a challenge that has multiple facets and it may even include some challenges that you are not even sure how you will solve them. Creating a plan in these situations is key as without one, you are likely

How to... Create Additional Data

Image
The challenge with data preparation is often about cleaning and removing columns or rows of data, but that isn't always the necessary action to take. When preparing datasets for analysis, there is often the need to create additional data through certain techniques. What situations require data creation? The aim of the data preparation is to reduce the level of manipulation required when analysing the data. Therefore, any effort that can be made to simplify the 'data work' during analysis is one worth taking in the data preparation stage. Any column or row (record) that is not in the dataset, but should be for the analysis, needs to be created. There are multiple techniques for this purpose but firstly, let's explore where you shouldn't be creating extra columns of rows: Dynamic Calculations in Desktop Tableau Desktop has been designed to allow the user to conduct analysis at the speed-of-thought. Part of this design are the two calculations that are desig

How to... decide where to Store your Data?

Image
One of the key considerations in data preparation is where to hold the Output. After all, what is the point of doing all that hard work if you then put the data somewhere that is: Inaccessible to someone who needs to use the data Is somewhere slow / unresponsive  May run the risk of eradicating the source data if overwriting other data incorrectly Let's consider each of these scenarios in turn to determine what should be considered when writing your Outputs to a location. Inaccessibility Data Openness vs Data Security is a consideration to make and take seriously. With more data legislation coming in across the world as the general public learns the value of their own data, and what effect their data can have on their lives, the more restrictions should not be taken lightly. At the same time, whilst not breaking any rules / legislation, giving people freedom to work with data will lead to future innovation and better, more efficient decisions. So how can this balance

How to...complete Advanced Joins

Image
Joining is one of the more powerful techniques for completing your data preparation for analysis. Adding in an additional dataset to what has been available before can either help give more context as to why something is happening or, answer more granular questions. In the previous post on How to... Join Datasets together , we covered Join Types and Join Conditions to get to the data you want in a straight forward way. However, there are many use cases of joins where sadly, the logic is more complex and that is what this post will cover. Multiple Join Conditions Frequently the way to link two datasets will be more complicated than a simple single join condition. Therefore, being comfortable adding multiple Join Conditions and understanding the impact of this is key. The key piece of information to understand about Multiple Join Conditions is that they behave as an AND rather than an OR statement. This means that if you have Multiple Join Conditions, all of them need to be met for t