The challenge this week is to turn a data set that is easy for humans to read into one that is easy for Tableau to read. Changing the structure is vital to interrogating the data more effectively, and it is an important step when learning what the optimal shape of your data should be.
Step 1 - Split Year
When we first looked at this data set, our initial thought was to use the Data Interpreter; however, this didn't quite work as expected, so we are going to have to go down a different route.
First we want to split off the year from the first row. To do this we can filter the F1 field so that we keep only 'Table 1: Store Sales 2022'.
From here we only need to keep the F1 field, and then use the following calculation to extract the year:
Year
RIGHT([F1],4)
Then we can remove F1 and change our Year field to a whole number. We should now have a single field containing the year 2022.
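If you're following along in Python rather than Prep, the year extraction is just a string slice and a type conversion. This is a minimal sketch; the header text is taken from the filter above, and the variable names are illustrative:

```python
# The single remaining row after the filter step.
header = "Table 1: Store Sales 2022"

# RIGHT([F1], 4) keeps the last four characters; converting to a
# whole number mirrors the change of data type in Prep.
year = int(header[-4:])
print(year)  # 2022
```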
Step 2 - Pivot Rows
Now that we have extracted the year, we can turn our attention to the other rows. We create a new branch and exclude the first row (where F2 is null). This leaves all of the other rows remaining, which we can pivot using a columns-to-rows pivot.
We want to pivot all of our fields apart from F1, which contains the information about the stores. Therefore, we drag in all of the other fields apart from F1:
After the pivot we should have 3 fields
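In Python, a columns-to-rows pivot is a "melt": each wide row becomes one (field name, value) pair per pivoted column. A minimal sketch, assuming illustrative column names and sample values:

```python
# One wide row from the source; F1 holds the store info and is not pivoted.
rows = [
    {"F1": "Store A", "F2": "Jan", "F3": 100, "F4": "Feb", "F5": 120},
]

long_rows = []
for row in rows:
    for field, value in row.items():
        if field == "F1":  # keep F1 as-is on every output row
            continue
        long_rows.append({"F1": row["F1"],
                          "Pivot1 Names": field,
                          "Pivot1 Values": value})

# After the pivot we have 3 fields: F1, Pivot1 Names, Pivot1 Values.
```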
Step 3 - Row Numbers
Next we want to remove the 'F' from the Pivot1 Names field so that we have just a row number for each row.
We can use the Remove Letters functionality to remove the 'F' from each value, then change the field to a whole number.
We want to use these row numbers to group together each of the different months. If you look at the data table you can see that each row is split into either Sales or Profit for a given month. Therefore, we want to 'group' these together by changing each odd number to the previous even number, e.g. 3 becomes 2, 5 becomes 4, etc.
We can do this using the Modulo (%) operator, which returns the remainder after division.
Grouping
IF [Pivot1 Names]%2=1
THEN [Pivot1 Names]-1
ELSE [Pivot1 Names]
END
As a result we now have all even numbers and have grouped together each month.
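The same odd-to-even grouping logic can be sketched in Python. This mirrors the calculation above; the function name is illustrative:

```python
def grouping(pivot_name_number: int) -> int:
    """Map each odd row number to the even number before it (3 -> 2, 5 -> 4),
    mirroring IF [Pivot1 Names] % 2 = 1 THEN [Pivot1 Names] - 1 ELSE ... END."""
    if pivot_name_number % 2 == 1:  # odd: remainder of 1 when divided by 2
        return pivot_name_number - 1
    return pivot_name_number        # even numbers are left as they are

print([grouping(n) for n in [2, 3, 4, 5, 6, 7]])  # [2, 2, 4, 4, 6, 6]
```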
Step 4 - Reshape Months
Now we have the months associated with the different row number groupings, we can start to reshape the data so it becomes more Tableau friendly.
First we just want to look at the months, so we filter F1 to keep only the null values; then we can remove this field.
Next we want to identify the max from the pivot values field for each of the groupings. This will help to replace the null values with the given month. We can do this by using a Fixed LOD:
Month
{FIXED [Grouping] : MAX([Pivot1 Values])}
This has now provided us with a list of different months and their associated row numbers.
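The FIXED LOD is essentially a per-group maximum. A Python sketch of the same idea, using a few illustrative rows (the real data has one month label and one null per grouping):

```python
# Taking MAX(Pivot1 Values) per Grouping surfaces the month label
# for each pair of rows, ignoring the nulls.
rows = [
    {"Grouping": 2, "Pivot1 Values": "Jan"},
    {"Grouping": 2, "Pivot1 Values": None},
    {"Grouping": 4, "Pivot1 Values": "Feb"},
    {"Grouping": 4, "Pivot1 Values": None},
]

months = {}
for row in rows:
    value = row["Pivot1 Values"]
    if value is not None:  # nulls never win a MAX over a string
        current = months.get(row["Grouping"])
        months[row["Grouping"]] = value if current is None else max(current, value)

print(months)  # {2: 'Jan', 4: 'Feb'}
```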
Next we need to join the rest of the data back to the corresponding month. Therefore, we can create a new branch and then exclude the null values from F1. This returns all of the rows apart from the month rows.
We can then join both tables together using an inner join where Grouping = Grouping and Pivot1 Names = Pivot1 Names.
After this we can remove the duplicated Pivot1 Names-1 and Grouping-1 fields so our table looks like this:
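An inner join on Grouping and Pivot1 Names can be sketched in Python with a dictionary lookup. The sample rows and values here are illustrative, not the real challenge data:

```python
# The months table built in the previous step.
months = [{"Grouping": 2, "Pivot1 Names": 2, "Month": "Jan"}]

# The data branch (F1 nulls excluded).
data = [
    {"F1": "Store A", "Grouping": 2, "Pivot1 Names": 2, "Pivot1 Values": 100},
    {"F1": "Store A", "Grouping": 4, "Pivot1 Names": 4, "Pivot1 Values": 120},
]

# Key the months table on the two join fields.
lookup = {(m["Grouping"], m["Pivot1 Names"]): m["Month"] for m in months}

# Inner join: only rows whose key exists on both sides survive.
joined = [
    {**row, "Month": lookup[(row["Grouping"], row["Pivot1 Names"])]}
    for row in data
    if (row["Grouping"], row["Pivot1 Names"]) in lookup
]
```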
Step 5 - Reshape Headers
Now that we have reshaped the months, we can focus on the headers (Sales & Profit).
We want to do a similar technique here as we want to isolate the headers in a new clean step by keeping only the 'Store' from the F1 field.
We can then remove the Month and F1 fields and rename Pivot1 Values to Headers. Luckily we don't have any null values here, so we don't need to use the LOD calculation.
Our table should look like this:
Now, on a separate branch from the previous step, we want to bring through all of the other rows. Therefore, we can exclude 'Store' from the F1 field, rename F1 to Store and Pivot1 Values to Values, then make sure Values is a whole number.
This gives us the values table that we can use to join back to our headers table that we have just created. We can use an inner join on Grouping and Pivot1 Names again:
Then we can remove the duplicated fields as a result of the join and our table should look like this:
Now that we have both the months and the headers in a better shape, we can use the pivot tool to make the final shape adjustment that we require. This time we want a Rows to Columns pivot, where we pivot the Headers field and aggregate the Sum of Values.
This brings our Profit and Sales into separate columns and provides a better structure for use within Tableau.
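A rows-to-columns pivot can be sketched in Python too: the values in Headers (Sales / Profit) become columns, summing the Values for each Store and Month. The sample rows are illustrative:

```python
rows = [
    {"Store": "Store A", "Month": "Jan", "Headers": "Sales", "Values": 100},
    {"Store": "Store A", "Month": "Jan", "Headers": "Profit", "Values": 20},
]

wide = {}
for row in rows:
    key = (row["Store"], row["Month"])
    # Start each output row with zeroed Sales and Profit columns.
    wide.setdefault(key, {"Store": row["Store"], "Month": row["Month"],
                          "Sales": 0, "Profit": 0})
    # SUM aggregation: accumulate Values into the matching column.
    wide[key][row["Headers"]] += row["Values"]

print(list(wide.values()))
# [{'Store': 'Store A', 'Month': 'Jan', 'Sales': 100, 'Profit': 20}]
```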
Step 6 - Create Date Field
The last task for this week is to create a Date field using the month and year from the start of our flow.
From the pivot step we can join the year (2022) onto every row within our data set. We can use the join condition Year != Grouping; since 2022 never equals a Grouping value, this keeps every row. If there were more rows of data (groupings greater than 2022) we might want to think about a different join condition, but for now this works just fine.
Now we can remove the Grouping field, and then use the following calculation to create the Date:
Date
(IF [Month] = 'Sep' THEN 'Sept'
ELSE [Month] END)
+ ' '
+ STR([Year])
Then we can change this to a date field and remove the Month and Year fields. Our table should now look like this and is ready to output:
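The Sep-to-Sept swap is needed because of how Prep parses the month string; in Python you can skip it by mapping the abbreviation straight to a month number. A small sketch, with the function name as an illustration:

```python
import datetime

# Map three-letter month abbreviations to month numbers (Jan -> 1, ...).
month_numbers = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def make_date(month: str, year: int) -> datetime.date:
    """Build the first-of-month Date field from the Month and Year fields."""
    return datetime.date(year, month_numbers[month], 1)

print(make_date("Sep", 2022))  # 2022-09-01
```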
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!