2021: Week 51 - Solution

 


Solution by Tom Prowse. You can download the workflow here.


This week we visited the IT Department to help them store their data optimally by using fact and dimension tables. We received a desired schema from the IT department, so it was our job to split our table apart to meet their requirements.

Step 1 - Store Name

The first step is to input our table, and then we can focus on splitting out the Store name from the OrderID field. The OrderID contains the Store, then a '-', and finally a number (the OrderID). We want to separate each side of the '-' into its own field, so we can use the automatic split function to do this. Once we rename the newly split fields, we should have separate columns for Store & OrderID:
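For anyone who wants to check the split logic outside of Tableau Prep, here is a minimal Python sketch of the same idea. The function name and the sample values are made up for illustration and are not taken from the challenge data.

```python
def split_order_id(raw_order_id):
    """Split 'Store-Number' into (store, order_number) at the first '-'."""
    store, _, order_number = raw_order_id.partition("-")
    return store.strip(), order_number.strip()


# Hypothetical sample values, only for illustration.
raw_ids = ["Leeds-1001", "York-1002"]
split_rows = [split_order_id(raw_id) for raw_id in raw_ids]
```

Using `partition` splits on the first '-' only, which is an assumption about how the automatic split behaves for this field.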


Step 2 - Returned Field

Next we want to determine whether or not the item was returned, and then turn this into a binary field. Currently, if the item has been returned then it has the string 'Return Processed' and if it hasn't been returned then there is a null. 

Therefore, we can use the following calculation to turn this into a 1 (returned) or 0 (not returned): 

Returned
IF ISNULL([Return State])
THEN 0
ELSE 1
END
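The same null check can be sketched in Python, assuming a missing Return State arrives as `None`. The function name is hypothetical.

```python
def returned_flag(return_state):
    """Return 1 when a return state is present, 0 when it is missing."""
    return 0 if return_state is None else 1


# 'Return Processed' marks a returned item; None stands in for a null.
flags = [returned_flag(state) for state in ["Return Processed", None]]
```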

As a result our table now looks like this:



Step 3 - Sales Field

The next step is to create a sales field. This is going to be calculated by multiplying the Unit Price and Quantity fields together.

First we need to remove the '£' symbol from the Unit Price field, and then turn this into a decimal number: 

Unit Price 
FLOAT(
REPLACE([Unit Price],"£","")
)

And then we can calculate the Sales now that the Unit Price is a decimal: 

Sales
[Unit Price] * [Quantity]
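As a rough equivalent outside of Tableau Prep, the two calculations could look like this in Python. The function names and the sample price are assumptions for illustration.

```python
def parse_unit_price(price_text):
    """Strip the pound sign and convert the remaining text to a float."""
    return float(price_text.replace("£", "").strip())


def sales(price_text, quantity):
    """Multiply the cleaned unit price by the quantity."""
    return parse_unit_price(price_text) * quantity


# Hypothetical sample row: a unit price of £2.50 and a quantity of 4.
example_sales = sales("£2.50", 4)
```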

As a result of these two calculations our table should now look like this: 



Step 4 - Dimension Tables

We can now create each of the dimension tables that were in the IT department's desired schema. This includes a separate table for Store, Customer, and Product.

For each of the tables we need to identify the minimum order date so that the IDs do not change when later orders are placed. Also, for the customer dimensions table we want to include additional fields detailing the total number of orders and % of products that have been returned. 
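The pattern shared by the dimension tables (aggregate to the earliest order date, then number the rows in date order) can be sketched in Python as follows. This is only an illustration: the function name and sample rows are made up, and breaking ties by name is an assumption rather than something the Tableau Prep ranking is known to do.

```python
from datetime import date


def build_dimension(rows, key_field, id_field):
    """Return one row per key with its earliest order date and an ID."""
    first_order = {}
    for row in rows:
        key, order_date = row[key_field], row["Order Date"]
        if key not in first_order or order_date < first_order[key]:
            first_order[key] = order_date
    # Number keys by earliest order date; ties are broken by name (assumption).
    ordered = sorted(first_order.items(), key=lambda item: (item[1], item[0]))
    return [
        {id_field: position, key_field: key, "First Order": first}
        for position, (key, first) in enumerate(ordered, start=1)
    ]


# Made-up sample rows, not taken from the challenge data.
orders = [
    {"Store": "York", "Order Date": date(2021, 3, 1)},
    {"Store": "Leeds", "Order Date": date(2021, 1, 15)},
    {"Store": "York", "Order Date": date(2021, 2, 1)},
]
store_dimension = build_dimension(orders, "Store", "StoreID")
```

Because the IDs follow the first order date, a store that places its first order later is numbered after the existing ones, so earlier IDs stay the same.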

Store

First we need to aggregate our table so that we have a row for each Store and when the first order date was. To do this we can use the aggregation tool to group by Store and then find the Min Order Date: 



We then want to create a StoreID for each of the stores using the Ranking function. To create the StoreID we can use the following calculation: 


Finally we can rename the Order Date to First Order and our Store table should look like this: 


Customer 
Next up is the Customer table, so we can create a new branch and again we want to aggregate our table. This time we need to group by Customer, and then Sum Returned, Sum Number of Rows, Count OrderID, and Min Order Date.


We can now tidy up the table by renaming the following fields: 
  • Number of Rows --> Number of Products
  • Order Date --> First Order
  • OrderID --> Number of Orders
Now we want to calculate the % of products that have been returned: 

Return %
ROUND(
[Returned]
/
[Number of Products]
,2)
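To see how the customer aggregation and the Return % fit together, here is a small Python sketch. The function name and sample rows are hypothetical, and counting distinct OrderID values for Number of Orders is an assumption about the Count step.

```python
from datetime import date


def build_customer_dimension(rows):
    """Aggregate fact rows to one row per customer."""
    totals = {}
    for row in rows:
        entry = totals.setdefault(row["Customer"], {
            "Returned": 0,
            "Number of Products": 0,
            "order_ids": set(),
            "First Order": row["Order Date"],
        })
        entry["Returned"] += row["Returned"]
        entry["Number of Products"] += 1  # one fact row per product line
        entry["order_ids"].add(row["OrderID"])  # distinct orders (assumption)
        entry["First Order"] = min(entry["First Order"], row["Order Date"])
    return [
        {
            "Customer": customer,
            "Number of Orders": len(entry["order_ids"]),
            "Return %": round(entry["Returned"] / entry["Number of Products"], 2),
            "First Order": entry["First Order"],
        }
        for customer, entry in totals.items()
    ]


# Made-up sample rows, not taken from the challenge data.
fact_rows = [
    {"Customer": "Ana", "OrderID": "1", "Returned": 1, "Order Date": date(2021, 2, 1)},
    {"Customer": "Ana", "OrderID": "1", "Returned": 0, "Order Date": date(2021, 2, 1)},
    {"Customer": "Ana", "OrderID": "2", "Returned": 0, "Order Date": date(2021, 1, 5)},
    {"Customer": "Ben", "OrderID": "3", "Returned": 0, "Order Date": date(2021, 3, 9)},
]
customer_dimension = build_customer_dimension(fact_rows)
```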

Then we can create the Customer ID based on a ranking from their first order:

CustomerID 


Our Customer dimension table should look like this: 


Product

The final dimension table that we need to create is the Product table. Just like the previous two tables, we are going to create a new branch with an aggregation tool to group by Category, Sub-Category, Product Name, and Unit Price, then we want to return the Min Order Date as well.


Once we have the aggregated table, we can rename Order Date to First Order, and then create a Product ID by using the ranking just like in the other dimension tables: 

ProductID  


Our final Product table should look like this: 


Step 5 - Replace Dimensions with IDs

The final step of this week's challenge is to replace the dimension name with the ID in our fact table. To do this we need to join each of the dimension tables to the original fact table and then remove the dimension names. 
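As a sketch of what each of these joins does, the Python below swaps a dimension name for its ID using a lookup built from the dimension table. The function name and sample rows are hypothetical. Rows without a match are dropped, which mirrors an inner join.

```python
def replace_with_id(fact_rows, dimension_rows, name_field, id_field):
    """Inner-join on name_field, then keep the ID instead of the name."""
    lookup = {row[name_field]: row[id_field] for row in dimension_rows}
    joined = []
    for row in fact_rows:
        if row[name_field] not in lookup:
            continue  # no match in the dimension table, so the row is dropped
        new_row = {k: v for k, v in row.items() if k != name_field}
        new_row[id_field] = lookup[row[name_field]]
        joined.append(new_row)
    return joined


# Made-up sample rows, not taken from the challenge data.
facts = [
    {"Store": "Leeds", "OrderID": "1001"},
    {"Store": "Hull", "OrderID": "1002"},
]
stores = [{"StoreID": 1, "Store": "Leeds"}]
facts_with_ids = replace_with_id(facts, stores, "Store", "StoreID")
```

The same function could be applied once per dimension (Store, Customer, Product), each time with the matching name and ID fields.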

First we want to join the Store table:


This is going to be an inner join by using the Store field from both tables. After removing the Store, Store-1, and First Order fields the table should look like this: 


Next up we want to join on the Customer table. 

Again we are going to use an inner join but this time we want to use the Customer field from both tables. We can then remove Number of Orders, Return %, Customer, Customer-1, and First Order from the table so it looks like this: 


The last table that we need to join is the Product table, and we are going to use a similar technique to the other tables.


Again we are using an inner join, but this time on the Product Name field from each table. We can then remove the Unit Price, First Sold, Category, Category-1, Sub-Category, Sub-Category-1, Product Name, and Product Name-1 fields so that the table looks like this: 


You can download the full outputs here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1.

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

