2021: Week 51 - Departmental December - IT
Challenge by: Jenny Martin
It's the IT department's turn to receive our help this week. Rather than optimising the data for feeding into a data visualisation tool, we're going to think about how to optimally store data. In particular, we're going to be thinking about breaking a dataset down into fact and dimension tables.
In many challenges, we focus on bringing datasets together that have a common ID field. Databases will often follow a star schema, where each dimension is a separate table that can join onto the main fact table. We have received the following desired schema from IT:
Input
We have one large table to break down into the desired fact and dimension tables:
Requirements
- Input the Data
- Split out the store name from the OrderID
- Turn the Return State field into a binary Returned field
- Create a Sales field
- Create 3 dimension tables for Store, Customer and Product
- When assigning IDs, these should be created using the dimension and minimum order date fields so that the IDs do not change when later orders are placed
- For the Customer dimension table, we want to include additional fields detailing their total number of orders and the % of products they have returned
- Replace the dimensions with their IDs in the original dataset to create the fact table
- Output the fact and dimension tables
Outputs
Fact Table
- 8 fields
- Store ID
- CustomerID
- OrderID
- Order Date
- ProductID
- Returned
- Quantity
- Sales
- 2,231 rows
Store Dimension Table
- 5 fields
- CustomerID
- Customer
- Return %
- Number of Orders
- First Order
- 380 rows
Product Dimension Table
You can download the full outputs here.
After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja, @JennyMartinDS14 & @TomProwse1
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!