
Showing posts from April, 2019

2019: Week 11 Solution

The full solution flow can be seen below and downloaded here. This week’s challenge served to showcase Tableau Prep’s versatility and prove that it can be used to fairly easily parse JSON data. Most of the steps required to complete the challenge were covered in the challenge’s requirements, so this week’s solution will gloss over those parts.

Parsing and Cleaning the JSON

JSON headers are always broken up using full stops, so a Custom Split on every “.” easily breaks this up for us. There’s a reason we suggest breaking it up before filtering any rows. You could straight away filter on NOT CONTAINS([JSON_Name], "meta"), but if you split the JSON up first you can then easily click on and exclude unnecessary rows of data. From this view you can also easily remove any unnecessary information contained in the JSON_Name by identifying generated Split fields that only contain a single value, removing fields that do not add any detail. Pivoting the…
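For anyone who prefers to prototype this logic outside of Prep, here is a minimal pandas sketch of the same idea: split JSON_Name on every full stop, exclude the 'meta' rows, then drop single-valued split columns. The sample values and the JSON_ValueString column name are invented for illustration.

    import pandas as pd

    # Toy stand-in for the challenge's JSON_Name column (values are invented).
    df = pd.DataFrame({
        "JSON_Name": [
            "chart.result.0.meta.currency",
            "chart.result.0.timestamp.0",
            "chart.result.0.indicators.quote.0.close.0",
        ],
        "JSON_ValueString": ["GBP", "1554105600", "41.5"],
    })

    # Equivalent of the Custom Split on every "." - one new column per level.
    splits = df["JSON_Name"].str.split(".", expand=True)
    splits.columns = [f"Split {i + 1}" for i in range(splits.shape[1])]
    df = pd.concat([df, splits], axis=1)

    # Equivalent of filtering on NOT CONTAINS([JSON_Name], "meta").
    df = df[~df["JSON_Name"].str.contains("meta")]

    # Drop split columns that only contain a single value - no added detail.
    single_valued = [c for c in splits.columns if df[c].nunique(dropna=True) <= 1]
    df = df.drop(columns=single_valued)

    print(df)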

2019: Week 11

This week is all about stocks, but you have Ian Baldwin to thank for this challenge. He posed us the challenge of taking a JSON output from a shares website and turning it into a file for use within Tableau. Tableau Prep does not have a connector to allow us to download the data from the site (yet??), or parse JSON (yet??), but we can take a very raw file and manipulate it to build out a table that we would commonly use in Tableau Desktop.

Requirements:
- Input data from the .csv
- Break up the JSON_Name field
- Exclude 'meta' and '' records in the same column to just leave 'indicators' and 'timestamp'
- For the column containing our metrics, if this is blank, take the value from the 'indicators' / 'timestamp' column. Rename this field as 'Data Type' (see the sketch after this list)
- There is a column that will contain just numbers (up to 502). If this column is blank then take the value from the other column that contains similar values up to 50…
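If you want to see the coalesce step from the requirements in isolation, here is a hedged pandas sketch: where the metrics column is blank, fall back to the 'indicators' / 'timestamp' column. The column names are invented, since the teaser cuts off before naming them.

    import pandas as pd

    # Invented stand-ins for the two columns the requirement describes.
    df = pd.DataFrame({
        "Metric": ["close", None, "volume", None],
        "IndicatorOrTimestamp": ["indicators", "timestamp", "indicators", "timestamp"],
    })

    # If the metrics column is blank, take the value from the other column,
    # then keep the result as 'Data Type' as the requirement asks.
    df["Data Type"] = df["Metric"].fillna(df["IndicatorOrTimestamp"])
    df = df.drop(columns=["Metric", "IndicatorOrTimestamp"])

    print(df)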

2019: Week 10 Solution

Our full solution can be viewed below and can be downloaded here.

Figuring out the Join Logic

Probably the biggest challenge of this week’s Preppin’ Data is figuring out how to correctly join the Mailing List 2018 data with the Unsubscribed List data.

Prepping the Unsubscribed List

On visual inspection we can see that each [Email] in the Mailing List begins with each person’s first-name initial and their full surname. This can easily be recreated in the Unsubscribed List using the following calculated field: LOWER(LEFT([first_name], 1) + [last_name]). Removing spaces and punctuation from this newly created [Join Field] (as some surnames have hyphens or spaces) means our Unsubscribed List is prepped to join. However, things aren’t so simple back on the Mailing List side.

Prepping the Mailing List 2018

We can start by using a Custom Split on [email] to “Split off First 1 fields” using ‘@’ as a separator. T…
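As a quick sanity check outside Prep, the same join key can be sketched in pandas. The first_name and last_name columns come from the post; the sample surnames and the regex used to strip spaces and punctuation are my own illustration.

    import pandas as pd

    unsubscribed = pd.DataFrame({
        "first_name": ["Anna", "Bob"],
        "last_name": ["Smith-Jones", "O Connor"],  # invented examples
    })

    # LOWER(LEFT([first_name], 1) + [last_name]), then strip spaces/punctuation.
    unsubscribed["Join Field"] = (
        unsubscribed["first_name"].str[:1] + unsubscribed["last_name"]
    ).str.lower().str.replace(r"[^a-z0-9]", "", regex=True)

    print(unsubscribed)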

2019: Week 10

Following our complaints analysis last week, Chin & Beard Suds Co. (still our mythical organisation) is looking to manage its marketing mailing lists. Sadly, our customers are not just sporadically complaining, they are also choosing not to receive marketing. We are continually releasing new scents in our products and we want to let our customers know. Sadly for us, our website has an unsubscribe button that only lets people enter their First and Last Name. It does capture the date they want to unsubscribe, so they can resubscribe at a later date. Our mailing list is a list of emails that are consistent enough that we can join these two data sets together, but not easily. The business needs to understand not just who it can market to but also how much revenue we are losing through our customers not showing interest in us. Luckily, we have the raw data to help us understand this but: We want to have a nice list of emails that we CAN still market to (and include if they…
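A rough pandas sketch of the end goal, hedged because the teaser cuts off: keep only mailing-list emails whose join key does not appear in the unsubscribed list (a left anti-join). All names and values here are placeholders.

    import pandas as pd

    mailing = pd.DataFrame({
        "email": ["asmith@example.com", "bjones@example.com"],
        "join_key": ["asmith", "bjones"],
    })
    unsubscribed = pd.DataFrame({"join_key": ["bjones"]})

    # Left anti-join: mailing-list rows with no match in the unsubscribe list.
    merged = mailing.merge(unsubscribed, on="join_key", how="left", indicator=True)
    can_market = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

    print(can_market)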

2019: Week 9 Solution

First, let me preface this write-up by stating that these complaints make up the minority of our feedback – most of our customers are incredibly happy with our products and we have won all lawsuits against us. Not that any lawsuits occurred. But if they did, we won them. This week I’ll once again be covering some of the main challenges faced and the techniques used to overcome them. You can view our whole dynamic solution below and download it here (you can also find a less dynamic but equally viable solution there!).

The Bug

One of the main issues in this week’s challenge was finding a way around a known Tableau Prep bug. As of 2019.1.3, the “All” option in Automatic Split and Custom Split allows you to split a text field on every instance of your chosen delimiter. No more capping at splitting off the first or last 10, hooray! However, this only applies when you’re using the split on an untouched field from your data source. If you create a calculated field that c…
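For comparison, the “split on every instance of a delimiter” behaviour that the bug interferes with is a one-liner outside Prep; here it is in pandas, with the delimiter and sample text invented.

    import pandas as pd

    tweets = pd.Series(["the soap smells odd", "bar soap crumbles fast and leaks"])

    # Split on every space, one column per word - no cap at the first or
    # last 10 splits, regardless of whether the field is calculated.
    words = tweets.str.split(" ", expand=True)

    print(words)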

2019: Week 8 Solution

This week on Preppin’ Data we once again had you all serve as labour for Chin & Beard Suds Co. It was quite a tough one! To overcome this challenge, our method uses a number of techniques from previous challenges. We mentioned at the bottom of the requirements that we used the Rows-to-Columns pivot, and another key hint was that although you had five questions to answer, they can all be answered using the output data set provided. What is this hinting at? Using the Profile Pane to answer questions! Before we dig into a few of the main challenges in the workflow itself, let’s take a look at the answers to the 5 questions themselves.

1) Whilst we can see that bar and liquid soap have been stolen with similar frequency, the liquid soap has been stolen in much greater quantities.
2) & 3) We can see that 19 items of stock haven’t been updated yet: 5 in 'Wimbledon 2' and 14 in 'Oxford Street'.
4) The only store to update its stock in 1 day o…
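The Rows-to-Columns pivot mentioned above can be mimicked in pandas with pivot_table; a minimal sketch with invented store, product, and quantity values (not the challenge data):

    import pandas as pd

    thefts = pd.DataFrame({
        "Store": ["Wimbledon 2", "Wimbledon 2", "Oxford Street"],
        "Product Type": ["Bar", "Liquid", "Liquid"],
        "Quantity": [3, 12, 7],  # invented numbers
    })

    # Rows-to-Columns pivot: one column per Product Type, summing quantities.
    wide = thefts.pivot_table(index="Store", columns="Product Type",
                              values="Quantity", aggfunc="sum", fill_value=0)

    print(wide)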

2019: Week 9

The rollercoaster ride of Chin & Beard Suds Co continues unabated. This time we have been asked by our Social Media manager to run our analytical minds over some questions about the complaints we receive via Twitter. Our Social Media manager has pulled out the complaints for us to analyse: tweets where someone has tagged our Twitter handle @C&BSudsCo and then followed up with some issues. We need to know the common themes of these tweets and be ready to re-run this analysis at any time. To do that, we want you to make the data available to build a view on what common words are used in the complaint tweets.

Requirements:
- Input data
- Remove the Chin & Beard Suds Co Twitter handle
- Split the tweets up into individual words
- Pivot the words so we get just one column of each word used in the tweets
- Remove the 250 most common words in the English language (sourced from here for you: http://www.anglik.net/english250.htm)
- Output a list of words used alongside the…
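The requirements above translate quite directly to pandas; here is a hedged sketch of the whole pipeline. The sample tweets are invented and the stop-word set is an abbreviated stand-in for the 250-word list linked above.

    import pandas as pd

    tweets = pd.Series([
        "@C&BSudsCo your soap ruined my beard",
        "@C&BSudsCo the liquid soap leaks everywhere",
    ])

    # Abbreviated stand-in for the 250 most common English words.
    common_words = {"the", "my", "your"}

    words = (
        tweets.str.replace("@C&BSudsCo", "", regex=False)  # remove the handle
              .str.lower()
              .str.split()   # split tweets into individual words
              .explode()     # pivot: one row per word
              .dropna()
    )
    words = words[~words.isin(common_words)]

    # Count how often each remaining word appears across the complaints.
    print(words.value_counts())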

2019: Week 8

As we saw last week, profits are rolling in nicely from Chin & Beard Suds Co., but all is not perfect at the company. Recently our Suds shops have been the victim of a number of thefts. Thankfully our systems allow our stores to record each theft, when it occurred, and when we adjusted the inventory to reflect the reduced amounts. The data has all the parts we need to answer a few questions:
- What product is stolen the most?
- How many items of stock haven't been updated in the inventory levels yet?
- What stores need to update their inventory levels?
- Which store is the fastest at updating inventory levels after a theft?
- Which stores have updated their stock levels incorrectly?

To be able to answer these questions, we need you to create the following data set.

Requirements:
- Input data from both sheets
- Update Store IDs to use the Store Names
- Clean up the Product Type to just return two product types: Bar and Liquid
- Measure the difference in days bet…
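The “difference in days” requirement at the end of the list is the easiest to prototype; a pandas sketch with invented dates and column names:

    import pandas as pd

    thefts = pd.DataFrame({
        "Theft Date": pd.to_datetime(["2019-02-01", "2019-02-10"]),
        "Inventory Updated": pd.to_datetime(["2019-02-02", pd.NaT]),
    })

    # Days between the theft and the inventory adjustment; a missing update
    # date marks stock levels that haven't been updated yet.
    thefts["Days to Update"] = (
        thefts["Inventory Updated"] - thefts["Theft Date"]
    ).dt.days

    print(thefts)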

2019: Week 7 Solution

Hello everyone! So far it has been me (Carl) setting the challenges with Jonathan writing the solution posts. This week, roles have been reversed as I sneaked away to Thailand for a couple of weeks, so thanks to Jonathan for keeping the (cargo) ship afloat whilst I have been away. Writing the solution post requires Jonathan or me to have no idea about the challenge or how to approach it, so I was really pleased to see a challenge that has a lot of different ways to resolve it. I will try to share a few thoughts about approaches at a couple of key points throughout the solution.

The join calculation

In what has already become a classic Preppin' Data move, we needed to create a calculation that broke down the 'Departure ID'. I'm not sure who coded that as an ID field, as they were let go from Chin & Beard Suds Co shortly after that idea. What is needed for the join is to have both the 'Departure Date' and 'Ship ID' as, if not, days with multiple ship…
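A hedged pandas sketch of the breakdown step: the teaser only says the 'Departure ID' hides both a 'Departure Date' and a 'Ship ID', so the ID format and separator below are pure assumptions for illustration.

    import pandas as pd

    # Assumed format: date and ship ID joined by an underscore, e.g. "2019-02-04_S1".
    departures = pd.DataFrame({"Departure ID": ["2019-02-04_S1", "2019-02-04_S2"]})

    # Break the ID back into the two fields needed for the join.
    parts = departures["Departure ID"].str.split("_", expand=True)
    departures["Departure Date"] = pd.to_datetime(parts[0])
    departures["Ship ID"] = parts[1]

    print(departures)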