This week we visited the IT department to help them store their data optimally using fact and dimension tables. The IT department gave us their desired schema, so our job was to split our table apart so that it meets their requirements.
Step 1 - Store Name
The first step is to input our table and then split the Store name out of the OrderID field. The OrderID contains the Store, then a '-', and finally a number (the OrderID). We want each side of the '-' in its own field, so we can use the automatic split function. Once we rename the newly split fields, we should have separate columns for Store and OrderID:
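To show the same idea outside Tableau Prep, here is a minimal Python sketch. The function name `split_order_id` and the sample values are hypothetical, and splitting only on the last '-' is an assumption of this sketch (it keeps a hyphenated store name whole), not something the challenge specifies.

```python
def split_order_id(order_id: str) -> tuple[str, int]:
    """Split a value like 'Wimbledon-1' into ('Wimbledon', 1)."""
    # rsplit with maxsplit=1 splits only on the last '-',
    # so a store name that itself contains '-' stays intact.
    store, number = order_id.rsplit("-", 1)
    return store, int(number)


# Hypothetical sample rows, not the challenge data.
rows = [{"OrderID": "Wimbledon-1"}, {"OrderID": "Stratford-12"}]
for row in rows:
    # Rename as in the post: the left side becomes Store, the right side OrderID.
    row["Store"], row["OrderID"] = split_order_id(row["OrderID"])

print(rows)
```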
Step 2 - Returned Field
Next we want to flag whether or not each item was returned as a binary field. Currently, the Return State field contains the string 'Return Processed' if the item has been returned and is null if it hasn't.
Therefore, we can use the following calculation to turn this into a 1 (returned) or 0 (not returned):
Returned
IF ISNULL([Return State])
THEN 0
ELSE 1
END
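The same null check can be sketched in Python, assuming `None` stands in for a null Return State. The function name and sample values are hypothetical.

```python
from typing import Optional


def returned_flag(return_state: Optional[str]) -> int:
    """Mirror IF ISNULL([Return State]) THEN 0 ELSE 1 END."""
    # In this sketch None plays the role of a null value.
    return 0 if return_state is None else 1


states = ["Return Processed", None, None]  # hypothetical sample values
print([returned_flag(s) for s in states])  # [1, 0, 0]
```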
As a result, our table now looks like this:
Step 3 - Sales Field
The next step is to create a Sales field, calculated by multiplying the Unit Price and Quantity fields together.
First we need to remove the '£' symbol from the Unit Price field, and then turn this into a decimal number:
Unit Price
FLOAT(
REPLACE([Unit Price],"£","")
)
Now that the Unit Price is a decimal, we can calculate Sales:
Sales
[Unit Price] * [Quantity]
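Here is a minimal Python sketch of both calculations, assuming prices arrive as strings like "£2.50" with no thousands separators (a value such as "£1,200.00" would need the comma removed too). The function names and values are hypothetical.

```python
def parse_unit_price(raw: str) -> float:
    """Mirror FLOAT(REPLACE([Unit Price], "£", ""))."""
    # Strip the currency symbol, then convert the remaining text to a float.
    return float(raw.replace("£", ""))


def sales(unit_price: float, quantity: int) -> float:
    """Sales = Unit Price * Quantity."""
    return unit_price * quantity


price = parse_unit_price("£2.50")  # hypothetical value
print(price, sales(price, 4))  # 2.5 10.0
```

A float mirrors the FLOAT() conversion in the calculation; for exact money arithmetic, Python's `decimal.Decimal` would be a safer choice.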
As a result of these two calculations our table should now look like this:
Step 4 - Dimension Tables
We can now create each of the dimension tables in the IT department's desired schema: a separate table each for Store, Customer, and Product.
For each table we need to identify the minimum (first) order date, so that the IDs do not change when later orders are placed. For the Customer dimension table we also want additional fields for the total number of orders and the % of products that have been returned.
Store
First we aggregate our table so that there is one row per Store along with its first order date. To do this we use the Aggregate tool to group by Store and return the Min Order Date:
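The group-by-and-minimum logic of that aggregation can be sketched in Python as follows. The function name and sample rows are hypothetical; this only illustrates what the Aggregate step computes.

```python
from datetime import date


def first_order_by_store(rows):
    """Group by Store and keep the minimum Order Date for each one."""
    first = {}
    for row in rows:
        store, order_date = row["Store"], row["Order Date"]
        # Keep the earliest date seen so far for this store.
        if store not in first or order_date < first[store]:
            first[store] = order_date
    return first


rows = [  # hypothetical sample rows
    {"Store": "Wimbledon", "Order Date": date(2022, 3, 5)},
    {"Store": "Wimbledon", "Order Date": date(2022, 1, 9)},
    {"Store": "Bristol", "Order Date": date(2022, 2, 1)},
]
print(first_order_by_store(rows))
```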
We then want to create a StoreID for each of the stores using the Ranking function. To create the StoreID we can use the following calculation:
Finally, we can rename Order Date to First Order, and our Store table should look like this:
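The ranking idea can be sketched in Python. This is not the Tableau Prep calculation used in the post; it is a hypothetical illustration that numbers entities 1..n by first order date and, as an assumption of this sketch, breaks ties on the date by name so every entity gets a unique ID. The final check shows why ranking on the first order date keeps IDs stable: a store whose first order comes later simply receives the next ID.

```python
from datetime import date


def assign_ids(first_orders):
    """Number each entity 1..n by its first order date.

    Ties on the date are broken by name (an assumption of this sketch),
    so every entity gets a unique ID.
    """
    ordered = sorted(first_orders.items(), key=lambda item: (item[1], item[0]))
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}


first_orders = {  # hypothetical Store -> First Order values
    "Wimbledon": date(2022, 1, 9),
    "Bristol": date(2022, 2, 1),
    "York": date(2022, 1, 9),
}
ids = assign_ids(first_orders)
print(ids)  # {'Wimbledon': 1, 'York': 2, 'Bristol': 3}

# A store that first orders later gets the next ID; existing IDs do not move.
first_orders["Leeds"] = date(2023, 6, 1)
assert {k: v for k, v in assign_ids(first_orders).items() if k != "Leeds"} == ids
```

The same helper would work for CustomerID and ProductID, since all three dimension tables are ranked on their first order.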
Customer
Next up is the Customer table. We create a new branch and again aggregate the table, this time grouping by Customer and returning Sum of Returned, Sum of Number of Rows, Count of OrderID, and Min Order Date.
We can now tidy up the table by renaming the following fields:
Number of Rows --> Number of Products
Order Date --> First Order
OrderID --> Number of Orders
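A rough Python sketch of the Customer aggregation, using the renamed field names. The function name and rows are hypothetical. One assumption differs from a plain count: this sketch counts distinct OrderIDs, on the assumption that a single order can span several product rows.

```python
from datetime import date


def customer_summary(rows):
    """Per customer: Returned (sum), Number of Products (row count),
    Number of Orders (distinct OrderIDs) and First Order (min Order Date)."""
    summary = {}
    for row in rows:
        s = summary.setdefault(
            row["Customer"],
            {"Returned": 0, "Number of Products": 0, "orders": set(),
             "First Order": row["Order Date"]},
        )
        s["Returned"] += row["Returned"]        # Sum of Returned
        s["Number of Products"] += 1            # Number of Rows
        s["orders"].add(row["OrderID"])         # distinct orders (assumption)
        s["First Order"] = min(s["First Order"], row["Order Date"])
    for s in summary.values():
        s["Number of Orders"] = len(s.pop("orders"))
    return summary


rows = [  # hypothetical sample rows
    {"Customer": "Ann", "OrderID": 1, "Returned": 1, "Order Date": date(2022, 1, 9)},
    {"Customer": "Ann", "OrderID": 1, "Returned": 0, "Order Date": date(2022, 1, 9)},
    {"Customer": "Ann", "OrderID": 7, "Returned": 0, "Order Date": date(2022, 3, 2)},
    {"Customer": "Ben", "OrderID": 4, "Returned": 0, "Order Date": date(2022, 2, 14)},
]
print(customer_summary(rows)["Ann"])
```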
Now we want to calculate the % of products that have been returned:
Return %
ROUND(
[Returned]
/
[Number of Products]
,2)
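The Return % calculation in Python, as a hypothetical sketch. Note that the result is a proportion between 0 and 1 (0.33 rather than 33). Python's `round` rounds half to even on the underlying binary float, so values sitting exactly on a halfway point may round differently from Tableau's ROUND.

```python
def return_pct(returned: int, number_of_products: int) -> float:
    """Mirror ROUND([Returned] / [Number of Products], 2)."""
    return round(returned / number_of_products, 2)


print(return_pct(1, 3))  # 0.33
```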
Then we can create the CustomerID based on a ranking of each customer's first order:
CustomerID
Our Customer dimension table should look like this:
Product
The final dimension table we need is the Product table. As with the previous two tables, we create a new branch with an Aggregate tool, this time grouping by Category, Sub-Category, Product Name, and Unit Price and returning the Min Order Date.
Once we have the aggregated table, we can rename Order Date to First Order and then create a ProductID using a ranking, just like in the other dimension tables:
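Grouping on several fields at once can be sketched in Python with a tuple key. The function name and rows are hypothetical. Because Unit Price is part of the key, a product sold at two different prices ends up as two rows, and would therefore get two IDs.

```python
from datetime import date


def first_sold_by_product(rows):
    """Group by (Category, Sub-Category, Product Name, Unit Price) and keep Min Order Date."""
    first = {}
    for row in rows:
        key = (row["Category"], row["Sub-Category"], row["Product Name"], row["Unit Price"])
        if key not in first or row["Order Date"] < first[key]:
            first[key] = row["Order Date"]
    return first


rows = [  # hypothetical sample rows: one product sold at two prices
    {"Category": "Bar", "Sub-Category": "Tools", "Product Name": "Shaker",
     "Unit Price": 12.0, "Order Date": date(2022, 4, 1)},
    {"Category": "Bar", "Sub-Category": "Tools", "Product Name": "Shaker",
     "Unit Price": 12.0, "Order Date": date(2022, 2, 1)},
    {"Category": "Bar", "Sub-Category": "Tools", "Product Name": "Shaker",
     "Unit Price": 14.0, "Order Date": date(2022, 6, 1)},
]
print(first_sold_by_product(rows))
```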
ProductID
Our final Product table should look like this:
Step 5 - Replace Dimensions with IDs
The final step of this week's challenge is to replace the dimension names in our fact table with their IDs. To do this we join each dimension table to the original fact table and then remove the dimension name fields.
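The join-then-drop pattern can be sketched in Python. The helper `replace_with_id` and the sample data are hypothetical, and the sketch assumes each name maps to exactly one ID; with duplicate names in a dimension, a real inner join would return one row per match.

```python
def replace_with_id(fact_rows, id_lookup, name_field, id_field):
    """Inner-join fact rows to a dimension on name_field, keeping only the ID.

    id_lookup maps a dimension name to its ID (e.g. Store -> StoreID).
    Rows whose name has no match are dropped, as in an inner join.
    """
    result = []
    for row in fact_rows:
        if row[name_field] not in id_lookup:
            continue  # no match: an inner join drops this row
        new_row = {k: v for k, v in row.items() if k != name_field}
        new_row[id_field] = id_lookup[row[name_field]]
        result.append(new_row)
    return result


fact = [  # hypothetical fact rows
    {"Store": "Wimbledon", "OrderID": 1, "Sales": 10.0},
    {"Store": "Bristol", "OrderID": 2, "Sales": 4.5},
]
store_ids = {"Wimbledon": 1, "Bristol": 3}
print(replace_with_id(fact, store_ids, "Store", "StoreID"))
```

Calling the helper once each for Store, Customer, and Product Name mirrors the three joins in this step.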
First we want to join the Store table:
This is an inner join on the Store field from both tables. After removing the Store, Store-1, and First Order fields, the table should look like this:
Next, we join the Customer table.
Again we use an inner join, but this time on the Customer field from both tables. We can then remove Number of Orders, Return %, Customer, Customer-1, and First Order from the table so it looks like this:
The last table we need to join is the Product table, using the same technique as the other tables.
Again we use an inner join, this time on the Product Name field from each table. We then remove Unit Price, First Order, Category, Category-1, Sub-Category, Sub-Category-1, Product Name, and Product Name-1 so that the table looks like this:
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!