This week we visited the IT department to help them store their data optimally using fact and dimension tables. The IT department gave us their desired schema, so our job was to split apart our table so that it meets their requirements.
Step 1 - Store Name
The first step is to input our table and then split the Store name out of the OrderID field. The OrderID contains the Store, then a '-', and finally a number (the OrderID). We want each side of the '-' in its own field, so we can use the automatic split function to do this. Once we rename the newly split fields, we should have separate columns for Store and OrderID:
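For anyone completing the challenge outside Tableau Prep, here is a minimal Python sketch of the same split. The OrderID values are made up for illustration, and splitting on the last '-' (rather than on every '-') is a choice made here so that a hyphen inside a store name would not break the split.

```python
# Hypothetical OrderID values; the real input file will differ.
order_ids = ["Wimbledon-1", "Chelsea-2", "Wimbledon-3"]

def split_order_id(value: str) -> tuple[str, int]:
    """Split 'Store-Number' into (store, order_id) at the last '-'."""
    store, number = value.rsplit("-", 1)
    return store, int(number)

rows = [split_order_id(v) for v in order_ids]
```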
Step 2 - Returned Field
Next we want to determine whether or not the item was returned, and then turn this into a binary field. Currently, if the item has been returned then it has the string 'Return Processed' and if it hasn't been returned then there is a null.
Therefore, we can use the following calculation to turn this into a 1 (returned) or 0 (not returned):
Returned
IF ISNULL([Return State])
THEN 0
ELSE 1
END
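The same 1/0 logic can be sketched in Python, with `None` standing in for a null. This assumes nulls really arrive as missing values; if the source stored empty strings instead, they would need handling too.

```python
from typing import Optional

def returned_flag(return_state: Optional[str]) -> int:
    """Mirror IF ISNULL([Return State]) THEN 0 ELSE 1 END."""
    return 0 if return_state is None else 1

# Hypothetical column values: None represents a null Return State.
states = ["Return Processed", None, None]
flags = [returned_flag(s) for s in states]
```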
As a result, our table now looks like this:
Step 3 - Sales Field
The next step is to create a sales field. This is going to be calculated by multiplying the Unit Price and Quantity fields together.
First we need to remove the '£' symbol from the Unit Price field, and then turn this into a decimal number:
Unit Price
FLOAT(
REPLACE([Unit Price],"£","")
)
And then we can calculate the Sales now that the Unit Price is a decimal:
Sales
[Unit Price] * [Quantity]
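Both calculations can be sketched in Python as below. One deliberate swap: the Prep calculation uses FLOAT, while this sketch uses `Decimal` so that currency arithmetic is free of binary rounding noise. The price string is a hypothetical example.

```python
from decimal import Decimal

def parse_price(text: str) -> Decimal:
    """Strip the '£' symbol and convert the rest to a number."""
    return Decimal(text.replace("£", ""))

def sales(unit_price: str, quantity: int) -> Decimal:
    """Sales = Unit Price * Quantity."""
    return parse_price(unit_price) * quantity

print(sales("£12.99", 3))  # 38.97
```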
As a result of these two calculations, our table should now look like this:
Step 4 - Dimension Tables
We can now create each of the dimension tables in the IT department's desired schema: a separate table for Store, Customer, and Product.
For each of the tables we need to identify the minimum order date so that the IDs do not change when later orders are placed. Also, for the Customer dimension table we want to include additional fields detailing the total number of orders and the % of products that have been returned.
Store
First we need to aggregate our table so that there is one row per Store along with its first order date. To do this we can use the aggregation tool to group by Store and find the Min Order Date:
We then want to create a StoreID for each of the stores using the Ranking function. To create the StoreID we can use the following calculation:
Finally, we can rename Order Date to First Order, and our Store table should look like this:
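The Store steps (aggregate to the minimum order date, then rank) can be sketched in Python. The ranking calculation itself is not reproduced in the text, so this sketch assumes a row-number style rank (1, 2, 3, ...) ordered by First Order, breaking ties alphabetically by store name; Prep's ranking function may treat ties differently. The rows are hypothetical.

```python
from datetime import date

# Hypothetical fact rows: (Store, Order Date)
orders = [
    ("Wimbledon", date(2023, 3, 1)),
    ("Chelsea", date(2023, 1, 15)),
    ("Wimbledon", date(2023, 2, 10)),
    ("Bristol", date(2023, 4, 5)),
]

# Aggregate: one row per Store with its minimum (first) order date.
first_order: dict[str, date] = {}
for store, order_date in orders:
    if store not in first_order or order_date < first_order[store]:
        first_order[store] = order_date

# Rank by First Order (earliest = 1); ties broken by store name here.
ranked = sorted(first_order.items(), key=lambda kv: (kv[1], kv[0]))
store_dim = [
    {"StoreID": i, "Store": store, "First Order": first}
    for i, (store, first) in enumerate(ranked, start=1)
]
```

Ranking on the first order date rather than on the name is what keeps existing IDs stable: a new store can only ever be appended at the end.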
Customer
Next up is the Customer table. We create a new branch and again aggregate the table, this time grouping by Customer and taking the Sum of Returned, Sum of Number of Rows, Count of OrderID, and Min of Order Date.
We can now tidy up the table by renaming the following fields:
Number of Rows --> Number of Products
Order Date --> First Order
OrderID --> Number of Orders
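A Python sketch of the Customer aggregation, using hypothetical rows. One assumption differs from a plain count: this sketch counts distinct OrderIDs for Number of Orders, because counting a non-null field counts every row, so an order containing several products would otherwise be counted once per product.

```python
from datetime import date

# Hypothetical fact rows: (Customer, OrderID, Returned, Order Date)
rows = [
    ("Ana", 1, 0, date(2023, 1, 3)),
    ("Ana", 1, 1, date(2023, 1, 3)),
    ("Ana", 4, 0, date(2023, 2, 9)),
    ("Ben", 2, 0, date(2023, 1, 5)),
]

customers: dict[str, dict] = {}
for customer, order_id, returned, order_date in rows:
    c = customers.setdefault(
        customer,
        {"Returned": 0, "Number of Products": 0, "orders": set(), "First Order": order_date},
    )
    c["Returned"] += returned                             # Sum of Returned
    c["Number of Products"] += 1                          # Sum of Number of Rows
    c["orders"].add(order_id)                             # distinct OrderIDs
    c["First Order"] = min(c["First Order"], order_date)  # Min Order Date

# Replace the set of order IDs with its size.
for c in customers.values():
    c["Number of Orders"] = len(c.pop("orders"))
```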
Now we want to calculate the % of products that have been returned:
Return %
ROUND(
[Returned]
/
[Number of Products]
,2)
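The Return % calculation as a small Python function. Every aggregated customer has at least one row, so the denominator is never zero. Note that Python's `round` works on the binary float value and rounds exact ties to even, so results on exact halves may differ from Tableau's ROUND.

```python
def return_pct(returned: int, number_of_products: int) -> float:
    """Share of a customer's products that were returned, to 2 decimal places."""
    return round(returned / number_of_products, 2)

print(return_pct(1, 3))  # 0.33
```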
Then we can create the Customer ID based on a ranking from their first order:
CustomerID
Our Customer dimension table should look like this:
Product
The final dimension table that we need to create is the Product table. As with the previous two tables, we create a new branch with an aggregation tool that groups by Category, Sub-Category, Product Name, and Unit Price and returns the Min Order Date.
Once we have the aggregated table, we can rename Order Date to First Order, and then create a Product ID by using the ranking just like in the other dimension tables:
ProductID
Our final Product table should look like this:
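The Product table differs from the Store table mainly in its composite grouping key. A Python sketch with hypothetical rows, again assuming a row-number style rank ordered by First Order with ties broken by the key:

```python
from datetime import date

# Hypothetical rows: (Category, Sub-Category, Product Name, Unit Price, Order Date)
rows = [
    ("Bar", "Beer", "Pale Ale", 4.5, date(2023, 2, 1)),
    ("Bar", "Beer", "Pale Ale", 4.5, date(2023, 1, 20)),
    ("Kitchen", "Mains", "Burger", 12.0, date(2023, 1, 25)),
]

# Aggregate: group by the four descriptive fields, keep the Min Order Date.
first_order: dict[tuple, date] = {}
for category, sub_category, name, price, order_date in rows:
    key = (category, sub_category, name, price)
    if key not in first_order or order_date < first_order[key]:
        first_order[key] = order_date

# Rank by First Order to assign ProductID (ties broken by the key itself).
ranked = sorted(first_order.items(), key=lambda kv: (kv[1], kv[0]))
product_dim = [
    {"ProductID": i, "Category": k[0], "Sub-Category": k[1],
     "Product Name": k[2], "Unit Price": k[3], "First Order": first}
    for i, (k, first) in enumerate(ranked, start=1)
]
```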
Step 5 - Replace Dimensions with IDs
The final step of this week's challenge is to replace the dimension name with the ID in our fact table. To do this we need to join each of the dimension tables to the original fact table and then remove the dimension names.
First we want to join the Store table:
This is an inner join on the Store field from both tables. After removing the Store, Store-1, and First Order fields, the table should look like this:
Next up we want to join on the Customer table.
Again we use an inner join, but this time on the Customer field from both tables. We can then remove Number of Orders, Return %, Customer, Customer-1, and First Order from the table so it looks like this:
The last table that we need to join is the Product table, and we are going to use a similar technique as the other tables.
Again we use an inner join, this time on the Product Name field from each table. We then remove Unit Price, First Order, Category, Category-1, Sub-Category, Sub-Category-1, Product Name, and Product Name-1 so that the table looks like this:
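Outside Prep, the three joins amount to swapping each name for its ID. This Python sketch uses dictionary lookups in place of Join steps; the lookup contents and fact rows are hypothetical, and, like the write-up, it matches products on Product Name alone, which assumes each name maps to a single ProductID. The "Bristol" row shows the inner-join behaviour of dropping unmatched rows.

```python
# Hypothetical dimension lookups: name -> ID (built from the dimension tables).
store_ids = {"Chelsea": 1, "Wimbledon": 2}
customer_ids = {"Ana": 1, "Ben": 2}
product_ids = {"Pale Ale": 1, "Burger": 2}

fact = [
    {"OrderID": 1, "Store": "Wimbledon", "Customer": "Ana", "Product Name": "Pale Ale", "Quantity": 2},
    {"OrderID": 2, "Store": "Chelsea", "Customer": "Ben", "Product Name": "Burger", "Quantity": 1},
    {"OrderID": 3, "Store": "Bristol", "Customer": "Ben", "Product Name": "Burger", "Quantity": 1},
]

def replace_with_ids(rows, lookups):
    """Inner-join each row to every lookup, swapping the name column for its ID."""
    out = []
    for row in rows:
        new_row = dict(row)
        for name_col, (id_col, ids) in lookups.items():
            if new_row[name_col] not in ids:
                break  # inner join: drop rows with no match
            new_row[id_col] = ids[new_row.pop(name_col)]
        else:
            out.append(new_row)  # every lookup matched
    return out

fact_ids = replace_with_ids(fact, {
    "Store": ("StoreID", store_ids),
    "Customer": ("CustomerID", customer_ids),
    "Product Name": ("ProductID", product_ids),
})
```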
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!