Posts

2020: Week 5

Image
Rank in Tableau Prep still isn't as easy as it could be. The Prep Development team have been hard at work and will come up with a solution soon so it's the final chance to try to solve how to do a Standard Competition rank in Prep.
What is Standard Competition ranking?  This form of Rank is actually the default rank within Tableau Desktop so to be able to mimic it is important. Jonathan showed you how to complete the Modified Competition rank in the Solution post to week 3. Let's look at the differences where the values are 10, 9, 9, 8. 
Standard Competition: 1, 2, 2, 4 Modified Competition: 1, 3, 3, 4
For me, Standard Competition makes a lot more sense as the user (maybe I've been using Tableau for too long) so I really wanted to achieve this in Prep. The challenge This week's challenge leads up to one of my favourite sporting tournaments of the year - the Six Nations. For those who don't know, this tournament is where England, France, Wales, Scotland, Ireland…

How to...Connect to a Database

Image
Using data files will be a significant part of most users data preparation but connecting to databases should not be ignored. Most organisations have built up significant data assets and the majority of this data will be held within databases.
What is a database? Databases, data warehouses and data lakes will all be common terms to most people working close to those using data in their jobs everyday but not everyone will be familiar about the differences.  Database - a piece of software that resides on a computer (often a server) that specialises in ingesting, storing and providing data to other tools. The database is likely to be split up in to different objects namely tables and views, therefore, the data needs to be well structured.Data Warehouse - a collection of databases, or particularly large databases, are commonly called a data warehouse. By storing multiple databases together, it becomes beneficial as resources like memory can be shared between the servers. Data Lake - a newe…

How to... Choose an Output

Image
Tableau Prep is built primarily for preparing data for visual analysis in Tableau Desktop. This inevitably means Tableau has designed the tool to be very easy to output the data when it is ready to Desktop. The level of simplicity might mean that you miss the optimal output type for the purpose you are using Prep for though. 
What output types are there?
Within Prep there are four main output types to consider. Each has their own main reason why you would use them so let's explore each in turn:
File Types - Hyper Tableau's new form of extract made lots of data work faster; in some cases, a lot faster! Hyper files came in to Tableau Desktop and Server in version 10.5. Opening any data extract in Tableau automatically updated the extract to a Hyper format. 
If you use the Tableau tools in an version newer than 10.5 (the last version before the naming changed to 'Year.Version' (ie 2018.1) then outputting to a Hyper file is a safe bet that the data is in the best file forma…

How to...Handle Free Text

Image
Free isn't always a good thing. In data, Free text is the example to state when proving that statements correct. However, lots of benefit can be gained from understanding data that has been entered in Free Text fields.
What do we mean by Free Text? Free Text is the string based data that comes from allowing people to type answers in to systems and forms. The resulting data is normally stored within one column, with one answer per cell. As Free Text means the answer could be anything, this is what you get - absolutely anything. From expletives to slang, the words you will find in the data may be a challenge to interpret but the text is the closest way to collect the voice of your customer / employee.
The Free Text field is likely to contain long, rambling sentences that can simply be analysed. If you count these fields, you are likely to have one of each entry each. Therefore, simply counting the entries will not provide anything meaningful to your analysis. The value is in the ind…

How to...Filter

Image
One of the most important factors when cleaning data is deciding whether the data:
Can be cleaned up?Should be ignored?Has to be removed? As soon as you decide on the latter option, then you need to filter out of your dataset. This sounds like a very easy decision to make but shouldn't be that easy, especially if you are preparing data for others to use. Being certain that you, and the data sets user, doesn't need this data going forward is a difficult challenge. Unless you are certain it isn't needed, don't remove the data unless it is the last step in the process before publishing for the following reasons: Losing context - does that data help the user provide context to other data points?Messy but manageable - Just because the data might be hard to tidy up, could it still be of valueBusiness logic - by your user having different business experience, does the data suddenly have meaning? So let's get back to basics first to explore what is a filter through to where …

2020: Week 4

Image
Duplicates, duplicates, duplicates. In large datasets they are very difficult to identify and remove. They may occur due to data errors, data load issues or simply poor process design. Well the latter has struck at Chin & Beard Suds Co.

When surveying our customers' opinions, we didn't take in to account that some customers would share their opinions multiple times. To get an accurate representation of our customers' opinions and form different versions of the firm's Net Promoter Score (NPS) to allow us to analyse:

A customer's first impression NPS scoreThe customer's most up-to-date NPS score To do this, we need you to clean and de-duplicate the data. The aim of the output is to be flexible to answer questions on NPS scores at not just the First or Latest but at the Country or Store Level too. Don't worry, we can leave those calculations for our analysts - they just need clean data to work with.
Requirements

Input the dataChange the Question Number for t…

2020: Week 3 Solution

Image
You can see our full workflow solution below and download it here!


There are 3 main portions to this solution:

1. Preparing the data.
2. Calculating each of the new fields.
3. Joining all the new fields together.

In order to get to the point where we begin to figure out each of the 9 new fields we need to properly filter, shape, join, and prepare the data. You can see how our flow is broken into these three groups.


Preparing the Data 1. Import the game data.Use a wildcard import which includes sheets with a matching pattern of "*Results".In the import, untick everything that we do't actually need. We only need to keep:[Visitor/Neutral] (renamed to [Visiting Team])[Home/Neutral] (renamed to [Home Team])[Date][PTS] (renamed to [Visitor PTS])[PTS 1] (renamed to [Home PTS]). 2. Remove games that haven't been played yet. As January isn't yet over, there are a lot of fixtures without any results. We can remove these just by right-clicking on the "Null" bar in…