This week we visited the IT Department to help them store their data optimally using fact and dimension tables. The IT department gave us their desired schema, so our job was to split our table apart so that it meets their requirements.
Step 1 - Store Name
The first step is to input our table, and then we can focus on splitting the Store name out of the OrderID field. The OrderID contains the Store, then a '-', and finally a number (the OrderID). We want to separate each side of the '-' into its own field, so we can use the automatic split function to do this. Once we rename the newly split fields we should have separate columns for Store & OrderID:
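If you prefer to write the split yourself rather than using the automatic split, calculations along these lines should give the same result (the 'Order Number' field name is just a placeholder so the calculation doesn't overwrite the original OrderID field):

Store
SPLIT([OrderID], "-", 1)

Order Number
SPLIT([OrderID], "-", 2)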
Step 2 - Returned Field
Next we want to determine whether or not the item was returned, and then turn this into a binary field. Currently, if the item has been returned the Return State field contains the string 'Return Processed', and if it hasn't been returned the field is null.
Therefore, we can use the following calculation to turn this into a 1 (returned) or 0 (not returned):
Returned
IF ISNULL([Return State])
THEN 0
ELSE 1
END
As a result our table now looks like this:
Step 3 - Sales Field
The next step is to create a sales field. This is going to be calculated by multiplying the Unit Price and Quantity fields together.
First we need to remove the '£' symbol from the Unit Price field, and then turn this into a decimal number:
Unit Price
FLOAT(
REPLACE([Unit Price],"£","")
)
And then we can calculate the Sales now that the Unit Price is a decimal:
Sales
[Unit Price] * [Quantity]
As a result of these two calculations our table should now look like this:
Step 4 - Dimension Tables
We can now create each of the dimension tables that were in the IT department's desired schema. This includes a separate table for Store, Customer, and Product.
For each of the tables we need to identify the minimum order date so that the IDs do not change when later orders are placed. Also, for the customer dimension table we want to include additional fields detailing the total number of orders and the % of products that have been returned.
Store
First we need to aggregate our table so that we have one row per Store along with its first order date. To do this we can use the aggregation tool to group by Store and then find the Min Order Date:
We then want to create a StoreID for each of the stores by ranking them on their first order date.
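A minimal sketch of that ranking, assuming Tableau Prep's analytic calculation syntax, where the store with the earliest order gets ID 1:

StoreID
{ORDERBY [Order Date] ASC : RANK_DENSE()}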
Finally we can rename the Order Date to First Order and our Store table should look like this:
Customer
Next up is the Customer table, so we can create a new branch and again we want to aggregate our table. This time we need to group by Customer, and then Sum Returned, Sum Number of Rows, Count OrderID, and Min Order Date.
We can now tidy up the table by renaming the following fields:
Number of Rows --> Number of Products
Order Date --> First Order
OrderID --> Number of Orders
Now we want to calculate the % of products that have been returned:
Return %
ROUND(
[Returned]
/
[Number of Products]
,2)
Then we can create the Customer ID based on a ranking from their first order:
CustomerID
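// A sketch assuming Tableau Prep's analytic calculation syntax: rank customers by their first order date, earliest first
{ORDERBY [First Order] ASC : RANK_DENSE()}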
Our Customer dimension table should look like this:
Product
The final dimension table that we need to create is the Product table. Just like the previous two tables, we are going to create a new branch with an aggregation tool to group by Category, Sub-Category, Product Name, and Unit Price, and return the Min Order Date as well.
Once we have the aggregated table, we can rename Order Date to First Order, and then create a Product ID by using the ranking just like in the other dimension tables:
ProductID
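// A sketch assuming Tableau Prep's analytic calculation syntax: rank products by their first order date, earliest first
{ORDERBY [First Order] ASC : RANK_DENSE()}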
Our final Product table should look like this:
Step 5 - Replace Dimensions with IDs
The final step of this week's challenge is to replace the dimension names with their IDs in our fact table. To do this we need to join each of the dimension tables to the original fact table and then remove the dimension names.
First we want to join the Store table:
This is going to be an inner join using the Store field from both tables. After removing the Store, Store-1, and First Order fields the table should look like this:
Next up we want to join on the Customer table.
Again we are going to use an inner join but this time we want to use the Customer field from both tables. We can then remove Number of Orders, Return %, Customer, Customer-1, and First Order from the table so it looks like this:
The last table that we need to join is the Product table, and we are going to use a similar technique as the other tables.
Again we are using an inner join, but this time using the Product Name field from each table. We can then remove the Unit Price, First Order, Category, Category-1, Sub-Category, Sub-Category-1, Product Name, and Product Name-1 fields so that the table looks like this:
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!