2021: Week 51 - Departmental December - IT

 Challenge by: Jenny Martin

It's the IT department's turn to receive our help this week. Rather than optimising the data for feeding into a data visualisation tool, we're going to think about how to optimally store data. In particular, we're going to be thinking about breaking a dataset down into fact and dimension tables. 

In many challenges, we focus on bringing datasets together that have a common ID field. Databases will often follow a star schema, where each dimension is a separate table that can join onto the main fact table. We have received the following desired schema from IT:


Input

We have one large table to break down into the desired fact and dimension tables:


Requirements

  • Input the Data
  • Split out the store name from the OrderID
  • Turn the Return State field into a binary Returned field
  • Create a Sales field
  • Create 3 dimension tables for Store, Customer and Product
    • When assigning IDs, these should be created using the dimension and minimum order date fields so that the IDs do not change when later orders are placed
    • For the Customer dimension table, we want to include additional fields detailing their total number of orders and the % of products they have returned
  • Replace the dimensions with their IDs in the original dataset to create the fact table
  • Output the fact and dimension tables

Outputs

Fact Table

  • 8 fields
    • Store ID
    • CustomerID
    • OrderID
    • Order Date
    • ProductID
    • Returned
    • Quantity
    • Sales
  • 2,231 rows

Store Dimension Table

  • 3 fields
    • StoreID
    • Store
    • First Order
  • 7 rows

Customer Dimension Table

  • 5 fields
    • CustomerID
    • Customer
    • Return %
    • Number of Orders
    • First Order
  • 380 rows

Product Dimension Table

  • 6 fields
    • ProductID
    • Category
    • Sub-Category
    • Product Name
    • Unit Price
    • First Sold
  • 60 rows

You can download the full outputs here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text