There have been some big changes to our lives in the past couple of years, so this week we looked back at what people have been searching for on Google and how this has changed.
Step 1 - Average Index
First we are going to focus on the Timeline data table, where we want to calculate the overall average index for each search term. Before we can calculate the average, we need to transform the table by pivoting the main search terms from columns to rows. We can use a Wildcard pivot for this with the matching pattern 'World':
After the pivot, we have the search terms in a single column but we need to do some tidying to remove the 'Worldwide' part. This can be easily done with an automatic split, or by using a custom split to return the first occurrence before the ':' separator.
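If you'd rather write the calculation yourself than use the split options, Tableau's SPLIT function can return the text before the first ':' (the input field name below is an assumption; Prep usually names the pivoted column something like [Pivot1 Names]):
// keep only the search term, i.e. everything before the ':' separator
TRIM(SPLIT([Pivot1 Names], ':', 1))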
Once we have tidied and renamed some of the fields, our table should look like this:
Now that we have the table in this format, we can calculate the overall average index by using a Fixed LOD. We use an LOD here instead of an aggregation tool because we want to keep all of the rows of data.
Avg Index
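A FIXED LOD along these lines should do the job, assuming the pivoted fields are named [Search Term] and [Index]:
// overall average index per search term, kept on every row
{FIXED [Search Term] : AVG([Index])}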
Then we can round the averages to 1 decimal place so that they are more manageable:
Avg Index
ROUND([Avg Index],1)
This is step 1 complete as we have calculated the overall average index for each search term:
Step 2 - Earliest Peak
Next we want to find the earliest week at which each of the search topics hit its peak. We are going to assume the 'peak' is when the index is at its highest, so we first want to identify each of the max index values and then find the earliest week that this occurred.
1. Max index
The first calculation we need is to find the maximum index for each search term. We can use the following LOD to find this:
Index Peak
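Assuming the same [Search Term] and [Index] field names as before, the LOD looks something like this:
// highest index value each search term ever reached
{FIXED [Search Term] : MAX([Index])}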
Now that we have each of the 'peak' values on each row, we can use a filter calculation ([Index Peak]=[Index]) to identify the weeks where the index was the highest.
This leaves us with a single week for Online Streamer and Pet Adoption; however, there are various weeks where Staycation hit its 'peak'. Therefore, we need to use a similar technique to identify the first (min) week and then filter to return only that week.
First Peak
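Because we have already filtered to the rows where [Index Peak]=[Index], a FIXED LOD over the remaining rows gives us the earliest of those weeks. As a sketch, assuming the date field is called [Week]:
// earliest week among the remaining peak weeks
{FIXED [Search Term] : MIN([Week])}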
Then we can use the filter to remove any weeks that aren't the first one ([First Peak]=[Week]).
After these changes our table now has a single row for each of the search terms, and the first week that they hit their highest index:
Step 3 - Yearly Average Index
We now want to calculate the average index for each year. However, the year starts in September, so we need to classify the dates before we can group them by year. On a new branch, taken from the step before we calculated the Avg Index, we classify each of the dates using the following IF statement:
Year
IF [Week]<DATE("2017-09-01")
THEN '2016/17'
ELSEIF [Week]<DATE("2018-09-01")
THEN '2017/18'
ELSEIF [Week]<DATE("2019-09-01")
THEN '2018/19'
ELSEIF [Week]<DATE("2020-09-01")
THEN '2019/20'
ELSE '2020/21'
END
Then, using this Year field, we can calculate the yearly average with an aggregation tool:
And then we can round the averages to 1 decimal place:
Yearly Avg Index
ROUND([Yearly Avg Index],1)
As we are only focused on comparing this year to last year, we can filter out the remaining years so that only 2019/20 and 2020/21 remain. Our table now looks like this:
Step 4 - Lockdown Fad or Still Trendy
We can now see whether each trend has increased or decreased since last year by comparing this year's average index with last year's.
The first step is to bring each year's average onto the same row by using a Rows to Columns pivot:
Now that they are on the same row, we can use the following calculation to classify each search term as a lockdown fad or still trendy:
Status
IF [2020/21]>[2019/20]
THEN "Still trendy"
ELSE "Lockdown Fad"
END
Our table should now look like this:
Step 5 - Country Breakdown
It's now time to focus on the country breakdown to see if these trends are similar across various countries. Using the Country Breakdown table as a new input, we start by cleaning the table a little, removing any null values from the Pet adoption: (01/09/2016 - 01/09/2021) field.
We then want to pivot each of the search terms so that they are in a single column, by using a Columns to Rows pivot:
Then we can tidy the data by using an automatic split and renaming the fields so that we have the following table:
Finally, we can filter so that for each search term we only keep the country with the highest percentage. This uses a similar technique to earlier: we identify the highest percentage using an LOD and then use a filter to keep only that row.
Highest %
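As with the earlier peaks, a FIXED LOD should do the job here, assuming the pivoted fields are named [Search Term] and [Percent]:
// highest percentage recorded for each search term across the countries
{FIXED [Search Term] : MAX([Percent])}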
Then use this filter to retain only the highest % - [Highest %]=[Percent].
Our countries should look like this:
Step 6 - Bringing Everything Together
The final step this week is to bring each of the three branches together into a single data source. First we want to join the Trend Peak branch with the Yearly Avg branch by joining on the Search Term field:
Then after this join we can join the country breakdown branch, again using the Search Term field:
After these joins we have our final output that looks like this:
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!