2021: Week 9 - Working with Strings
Challenge By: Owen Barnes
We have a guest contributor this week! Owen has just finished up his training at the Data School and is a regular Preppin' Data participator. So here's his challenge:
This challenge will be useful for anyone trying to improve their knowledge of Tableau Prep functions, and their data parsing in general. There is an opportunity to use ReGex in this challenge, but there is a longer workaround with other steps available. There will also be the chance to use LODs in this workflow.
We have been given a set of messy strings, which contain useful information that we need to connect to other datasets to eventually find out how much revenue we have generated by selling different products. This string provides us with information such as the quantity of items sold, the product ID code, the phone number of the buyer, and the area code which will let us find out where they are purchasing from. There will also be some small calculations needed to join certain datasets together!
Inputs
There are three inputs for this week's challenge.
Customer Information:
Area Code Lookup:
Product Lookup:
Requirements
- Input the Customer Information file, split the values and reshape the data so there is a separate ID on each row. (help1,help2)
- Each ID field contains the following information we need to extract: (help)
- The first 6 digits present in each ID is the customers phone number
- The first 2 digits after the ‘,’ is the last 2 digits of the area code
- The letter following this is the first letter of the name of the area that they are calling from
- The digits after this letter resemble the quantity of products ordered
- The letters after the ‘-‘ are the product ID codes
- Rename these fields appropriately, and remove any unwanted columns – leaving only these 5 columns in the workflow.
- Input the Area Code Lookup Table – find a way to join it to the Customer information file (help)
- We don’t actually sell products in Clevedon, Fakenham, or Stornoway. Exclude these from our dataset (help)
- In some cases, the ID field does not provide accurate enough conditions to know where the customer is from. Exclude any phone numbers where the join has produced duplicated records.
- Remove any unwanted fields created from the join.
- Join this dataset to our product lookup table.
- For each area, and product, find the total sales values, rounded to zero decimal places (help)
- Rank how well each product sold in each area. (help)
- For each area, work out the percent of total that each different product contributes to the overall revenue of that Area, rounded to 2 decimal places. (help)
- Output the data (updated 03/03/21 17:45 GMT)
Output
- 5 fields
- Rank
- Area
- Product Name
- Revenue
- % of Total – Product
- 144 rows (145 including headers)