2021: Week 46 Book Shop Data Modelling

 Challenge by: Carl Allchin    

For any Tableau Desktop users, the relationship model being introduced in 2020.2 created a very different way of working with data. The aim of the new functionality was to remove the complexity of having to think about tying multiple data sets together. If you've worked with multiple data sets before, I'm sure you've stumbled across a time where the number of rows in your resulting data set formed by your joins has exploded to a large number from what you initially had. 

Relationships makes use of:

  • Smart Aggregations - avoid duplication of values that come from a join condition that links one row to many rows in the other data source
  • Conditional Joins - uses different types of joins depending on what the model assess that you are trying to do within your visualisation. 
This flexibility comes from Tableau creating a 'logical data model' that it then flexes. If you've created Joins and Unions before then you will have built a 'physical data model'. 

This all sounds great but when exploring data sets I've found that many people struggle to lock on to exactly what is going on in the data. You might not even have access to Tableau to make use of the relationship model. This leads us to this week's challenge where you will build the physical data model to help you understand what is happening in Tableau's main example data set for relationships, Bookshop. 

Input

One Excel workbook with 13 worksheets of data. 


Requirements

You aim is to build a data set that is one row per item sold for each order made at the Book Shop.
  • Input data
  • Union all the Sales data together to form one row per item in a sale
    • This is the granularity of the data set throughout the whole challenge (56,350 rows)
  • Join all other data sets in the workbook on to this data
    • Never let the number of rows change
      • You may need to disregard incomplete records or summarise useful data into a metric instead of including all the detail
  • Remove any duplicate fields
  • Remove the two fields created (in Prep at least) as the result of the Union:
    • Table Names
    • Sheet Names
  • Output your resulting single table

Output



38 Fields (calculations / aggregations in Italics):
  • Book ID
  • Sales Date
  • ISBN
  • Discount
  • Item ID
  • Order ID
  • First Name
  • Last Name
  • Birthday
  • Country of Residence
  • Hrs Writing per Day
  • Title 
  • Auth ID
  • Format
  • PubID
  • Publication Date
  • Pages
  • Print Run Size (k)
  • Price
  • Publishing House
  • City
  • State
  • Country
  • Year Established
  • Marketing Spend
  • Number of Awards (avg only)
  • Number of Months Checked Out
  • Total Checkouts
  • Genre
  • SeriesID
  • Volume Number
  • Staff Comment
  • Series Name
  • Planned Volumes
  • Book Tour Events
  • Average Rating
  • Number of Reviewers
  • Number of Reviews

56,350 rows (56,351 rows including headers)

You can download the full output here

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text