2019: Week 44 Solution
In this week’s
Preppin’ Data we continued with some end of year tasks, and this week we looked
at how the team performed over the year.
Our first task is to
find out how many staff worked in each store on each day. To do this we input
the Team Member table, then use an aggregate tool to group by Store & Date,
then sum the No. of Rows within the table. As there is one row for each member
of staff who worked on that given day, this aggregation allows us to total how
many staff worked in each store on each individual day. We have then renamed
the No. of Rows field to something more familiar such as Staff Numbers.
Next up we need to
join the Team Member and Store Sales tables together. To do this we need to
prepare the Store Sales table so that the store field is in the same format as
the Team Member table. Firstly, we need to use the pivot tool to pivot the two
store fields so the values are in a singular column, therefore we bring the
‘Lewisham’ & ‘Wimbledon’ fields into a Columns to Rows pivot. Next, we need
to rename the pivoted fields to Store & Sales so that they are ready to be
joined to the Team Members table.
Now we have prepared
the Sales table, we are ready to join both tables together. We can do this with
an inner join, joining on Store and Date. Once both tables have been joined, we
are able to calculate the Estimated Staff Sales per Day. This is calculated by
dividing the sales field from the Store Sales table, by the number of staff
worked in each store, each day.
The final step to
this week’s challenge is to determine the rank of each team member within their
store. We need to rank each member on their estimated average sales per day,
this is important, as we need to calculate how many days each member of staff
has worked within the period first. (We noticed that a few people were
getting different numbers in the final output, and this could be an area where
you were going wrong!)
Firstly, we need to
join the original Team Members data set onto our current flow. This will allow
us to calculate the number of days that each member of staff worked throughout
the year. We have decided to join from the first clean step and our current flow,
as seen below. Within this inner join, we are joining on the Store & Date
fields.
After completing this
join, we have removed any duplicated fields, then used the aggregate tool. This
tool is used to total the number of days worked, and also the estimated sales
per staff member. The aggregation tool has the following setup - Group by Store
& Team Member. Sum Estimate Sales per Staff Member & Number of Rows. We
have then renamed the Number of Rows to Days Worked.
After the
aggregation, we can now calculate the estimated avg sales per day for each
staff member. This will be used in the final ranking, so is an important
calculation for the overall output. To calculate avg sales, we divide the
Estimated Sales per Staff Member/Days Worked.
The next step to
complete the ranking, is a self-join on the current flow. Therefore, we create
an additional step then join this back onto itself. Within this join we want
the following setup:
Store = Store
Avg Sales Per Member <= Avg Sales
Per Member
This join matches the
records where the Store is the same, and also where the team member had greater
than or equal sales. Meaning, that as Bernadette had the highest sales then
they would only join to herself and have a row count of 1. This logic continues
down the ranking, so the 2nd ranked member will have a row count of 2, 3rd will
have 3 rows and so on.
Now we have the
amount of rows for each member of staff, we can use an aggregation tool to
count the number of rows each team member has. Counting these number of rows
will allow us to calculate their final ranking. The aggregation tool is setup
as follows:
The final step is to
prepare the data for the output, within this clean step we have renamed any
fields, and also used the ROUND() function to round the sales to 2 decimal
places.