This is a follow-on challenge from last week, where we take a further look at analysing the admissions data for our schools.
If you haven't completed Week 25 then go back and do that one first, as we use its output as one of the inputs in this challenge.
Step 1 - Combine Additional Information
We want to combine both of our tables so that we're working from a single table. To do this we need to extract the initials of the first and last names from the Full Name field. You can do this by splitting the name into separate fields and then using the LEFT() function, or you can do it in a single calculation:
Initials
LEFT([Full Name],1)
+
LEFT(SPLIT([Full Name],' ',2),1)
We can then join the tables using an inner join on the Initials, Date of Birth, School Name, English, Maths, and Science fields.
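The Initials calculation above can be sketched in plain Python. This is a minimal, illustrative version assuming Full Name is a single string with a space between first and last name (as in the Tableau calculation):

```python
def initials(full_name: str) -> str:
    """First letter of the first name plus first letter of the last name.

    Mirrors LEFT([Full Name],1) + LEFT(SPLIT([Full Name],' ',2),1):
    the first character of the whole string, then the first character
    of the second space-separated token.
    """
    parts = full_name.split(" ")
    return full_name[:1] + parts[1][:1]
```

For example, `initials("Carl Allchin")` returns `"CA"`, which can then be used as one of the join keys.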
Step 2 - Ranking
Next we want to rank the students based on their grades, subject selections, and region. For this rank we want to group by Subject Selection and Region, then order by Grade Score (Desc) and Distance from School (Asc).
Rank A
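As a sketch of what Rank A does, here is the same grouping and ordering in plain Python. Note this assigns a simple row-number style rank within each group; Tableau's RANK() treats ties differently, so treat this as illustrative only:

```python
from itertools import groupby

def rank_students(students):
    """Assign Rank A: number rows 1, 2, 3, ... within each
    (Subject Selection, Region) group, ordered by Grade Score
    descending then Distance from School ascending."""
    key = lambda s: (s["Subject Selection"], s["Region"])
    ranked = []
    for _, group in groupby(sorted(students, key=key), key=key):
        ordered = sorted(
            group,
            key=lambda s: (-s["Grade Score"], s["Distance from School"]),
        )
        for i, s in enumerate(ordered, start=1):
            ranked.append({**s, "Rank A": i})
    return ranked
```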
We can then use this rank to create an Accepted or Rejected flag:
Accepted or Rejected
IF [Rank A] <= 15 AND [Region] = "EAST" THEN "Accepted"
ELSEIF [Rank A] <= 5 AND [Region] = "WEST" THEN "Accepted"
ELSE "Rejected"
END
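The same flag as a plain Python function, mirroring the calculation above (EAST takes the top 15 ranked students, WEST the top 5):

```python
def accepted_or_rejected(rank_a: int, region: str) -> str:
    """Accepted or Rejected flag based on Rank A and Region."""
    if rank_a <= 15 and region == "EAST":
        return "Accepted"
    elif rank_a <= 5 and region == "WEST":
        return "Accepted"
    return "Rejected"
```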
Then in our table we want to filter to keep only the Accepted students and the table should look like this:
Step 3 - Region % of Totals
We now want to identify the top students and then ensure that they are split 75% / 25% across the given regions.
First we need to calculate how many students are from each school. We can do this using an aggregate step where we group by Region and School Name, then Sum Number of Rows:
Next we want to calculate the total spaces that are available within each region. We can use a Fixed LOD to calculate this, grouping by Region and summing Number of Rows:
Spaces by Region
{FIXED [Region] : SUM([Number of Rows])}
We rename the Number of Rows field to Total Accepted, then we can calculate the % of total each school takes up within the region:
% of Total within Region
100 * ([Total Accepted]/[Spaces by Region])
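The aggregations in this step can be sketched in plain Python: count accepted students per (Region, School Name), total them per Region, then compute each school's % of Total within Region:

```python
from collections import Counter

def pct_of_total_within_region(accepted):
    """Return {(region, school): % of the region's accepted students}."""
    # Total Accepted: accepted students per (Region, School Name)
    per_school = Counter((s["Region"], s["School Name"]) for s in accepted)
    # Spaces by Region: accepted students per Region
    per_region = Counter(s["Region"] for s in accepted)
    return {
        (region, school): 100 * count / per_region[region]
        for (region, school), count in per_school.items()
    }
```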
At this stage we can rename some fields so our table looks like this:
Step 4 - School Performance
Next we want to identify the high-performing schools and label them with a School Status. To do this we first need to identify the min and max % for each region. We can do this with another LOD calculation where we group by Region and find the Min & Max of the % of Total within Region field:
Min per Region
{FIXED [Region] : MIN([% of Total within Region])}
Max per Region
{FIXED [Region] : MAX([% of Total within Region])}
We can then use these fields to identify the highest and lowest performing schools and give them a flag using the following calculation:
School Status
IF [Max per Region] = [% of Total within Region]
THEN 'High Performing'
ELSEIF [Min per Region] = [% of Total within Region]
THEN 'Low Performing'
ELSE 'Average Performing'
END
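A sketch of the School Status logic in plain Python, taking the {(region, school): %} mapping from the previous step: within each region the school with the highest % of total is 'High Performing', the lowest is 'Low Performing', and everything else is 'Average Performing':

```python
def school_status(pct_by_school):
    """Label each (region, school) as High/Low/Average Performing."""
    # Collect each region's percentages to find its min and max
    by_region = {}
    for (region, _school), pct in pct_by_school.items():
        by_region.setdefault(region, []).append(pct)
    status = {}
    for (region, school), pct in pct_by_school.items():
        if pct == max(by_region[region]):
            status[(region, school)] = "High Performing"
        elif pct == min(by_region[region]):
            status[(region, school)] = "Low Performing"
        else:
            status[(region, school)] = "Average Performing"
    return status
```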
From this table we only need the School Status and School Name fields and can remove all of the others so our table looks like this:
Finally we can add the school status onto each of the students names by joining the workflow back to the step before the aggregation by using an inner join on School Name. This will bring back all of the student information whilst adding the school rating onto each of them:
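The join-back step can be sketched in plain Python as an inner join keyed on School Name, attaching each school's status to the full student records:

```python
def add_school_status(students, status_by_school):
    """Inner join on School Name: keep students whose school has a
    status, and add the School Status field to each record."""
    return [
        {**s, "School Status": status_by_school[s["School Name"]]}
        for s in students
        if s["School Name"] in status_by_school
    ]
```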
We are now ready to output the table that should look like this:
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!
Created by: Carl Allchin

Welcome to a New Year of Preppin' Data challenges. For anyone new to the challenges, let us give you an overview of how the weekly challenge works. Each Wednesday the Preppin' crew (Jenny, myself, or a guest contributor) drops a data set (or sets) that requires some reshaping and/or cleaning to get it ready for analysis. You can use any tool or language you want to do the reshaping (we build the challenges in Tableau Prep but love seeing different tools being learnt / tried).

Share your solution on LinkedIn, Twitter/X, GitHub or the Tableau Forums
Fill out our tracker so you can monitor your progress and involvement
The following Tuesday we will post a written solution in Tableau Prep (thanks Tom) and a video walkthrough too (thanks Jenny)

As with each January for the last few years, we'll set a number of challenges aimed at beginners. This is a great way to learn a number of fundamental data preparation skills, or a chance to learn a new tool.