2021: Week 9 - Working with Strings

Challenge By: Owen Barnes

We have a guest contributor this week! Owen has just finished up his training at the Data School and is a regular Preppin' Data participator. So here's his challenge:

This challenge will be useful for anyone trying to improve their knowledge of Tableau Prep functions, and their data parsing in general. There is an opportunity to use ReGex in this challenge, but there is a longer workaround with other steps available. There will also be the chance to use LODs in this workflow.

We have been given a set of messy strings, which contain useful information that we need to connect to other datasets to eventually find out how much revenue we have generated by selling different products. This string provides us with information such as the quantity of items sold, the product ID code, the phone number of the buyer, and the area code which will let us find out where they are purchasing from. There will also be some small calculations needed to join certain datasets together!

Inputs

There are three inputs for this week's challenge.

Customer Information: 


Area Code Lookup:

Product Lookup:

Requirements

  • Input the Customer Information file, split the values and reshape the data so there is a separate ID on each row. (help1,help2)
  • Each ID field contains the following information we need to extract: (help)
    • The first 6 digits present in each ID is the customers phone number
    • The first 2 digits after the ‘,’ is the last 2 digits of the area code 
    • The letter following this is the first letter of the name of the area that they are calling from
    • The digits after this letter resemble the quantity of products ordered
    • The letters after the ‘-‘ are the product ID codes 
  • Rename these fields appropriately, and remove any unwanted columns – leaving only these 5 columns in the workflow. 
  • Input the Area Code Lookup Table – find a way to join it to the Customer information file (help)
  • We don’t actually sell products in Clevedon, Fakenham, or Stornoway. Exclude these from our dataset (help)
  • In some cases, the ID field does not provide accurate enough conditions to know where the customer is from. Exclude any phone numbers where the join has produced duplicated records.
  • Remove any unwanted fields created from the join. 
  • Join this dataset to our product lookup table. 
  • For each area, and product, find the total sales values, rounded to zero decimal places (help)
  • Rank how well each product sold in each area. (help)
  • For each area, work out the percent of total that each different product contributes to the overall revenue of that Area, rounded to 2 decimal places. (help)
  • Output the data (updated 03/03/21 17:45 GMT)

Output


  • 5 fields
    • Rank
    • Area
    • Product Name
    • Revenue
    • % of Total – Product
  • 144 rows (145 including headers)

The full output can be downloaded here.

After you finish the challenge make sure to fill in the participation tracker, then share your solution on Twitter using #PreppinData and tagging @OwenBData@Datajedininja@JennyMartinDS14 & @TomProwse1

You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help! 

Popular posts from this blog

2023: Week 1 The Data Source Bank

2023: Week 2 - International Bank Account Numbers

How to...Handle Free Text