This week's challenge was brought to us by Simon Evans. He used to work as an analyst at a professional sports team, so we looked at a data set similar to one a football analyst would have to work their way through. There were a lot of fields, and much of the information wasn't important to the questions we wanted to answer, so this week was good practice in dealing with these larger data sets.
Step 1 - Input All Files
The first step is to input all of the files for the challenge. We are given data from 5 different seasons, and all of these CSV files have the same structure, meaning we can use the wildcard union input to stack all of the tables on top of each other.
Notice how we have left the matching pattern section blank as we want to include all of the files.
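If you're following along in Python rather than Tableau Prep, the wildcard union can be sketched with `glob` and pandas. The file pattern and column names here are assumptions for illustration only:

```python
import glob
import pandas as pd

def wildcard_union(pattern: str) -> pd.DataFrame:
    """Read every CSV matching the pattern and stack the tables
    on top of each other (all files are assumed to share a schema)."""
    frames = [pd.read_csv(path) for path in sorted(glob.glob(pattern))]
    return pd.concat(frames, ignore_index=True)

# e.g. wildcard_union("inputs/*.csv") to union all five season files
```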
Step 2 - Filter & Calculate Open Play Goals
Now that we have all of the files, we can start to filter and clean our data. First we want to filter out the rows we aren't interested in: this means excluding 'Goalkeeper' from the Position field and excluding 0 from the Appearances field.
Next we want to replace any null values in the Penalties Scored and Freekicks Scored fields with a 0. This can be done by double clicking on the 'null' value and typing 0 as a replacement.
Finally we can calculate the Open Play Goals with the following calculation:
Open Play Goals
[Goals] - [Penalties Scored] - [Freekicks Scored]
Then we rename the Goals field to Total Goals Scored.
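The filtering, null replacement, and calculation above can be sketched in pandas as one function. The field names ("Position", "Appearances", "Penalties Scored", "Freekicks Scored", "Goals") are taken from the steps above; if your source uses different capitalisation, adjust accordingly:

```python
import pandas as pd

def clean_and_calculate(df: pd.DataFrame) -> pd.DataFrame:
    # Exclude goalkeepers and anyone with zero appearances
    df = df[(df["Position"] != "Goalkeeper") & (df["Appearances"] != 0)].copy()
    # Replace nulls in the set-piece fields with 0
    set_pieces = ["Penalties Scored", "Freekicks Scored"]
    df[set_pieces] = df[set_pieces].fillna(0)
    # Open Play Goals = Goals - Penalties Scored - Freekicks Scored
    df["Open Play Goals"] = df["Goals"] - df["Penalties Scored"] - df["Freekicks Scored"]
    # Rename Goals to Total Goals Scored
    return df.rename(columns={"Goals": "Total Goals Scored"})
```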
As our table has so many fields, it can be hard to navigate and find the fields that you are looking for. I'd recommend using the 'Search' feature, as this is a great way to limit the fields that you are viewing and helps to minimise the amount of scrolling that you need to do!
Step 3 - Aggregate Key Metrics
Now we have our calculation ready, we want to limit our fields and focus on the information that will help us answer the required questions.
As we are looking at 5 seasons' worth of data, we want to aggregate these so that we have a total for each player over this period. Our aggregation tool looks like this:
Note, this time I'm using the metadata view so again it's easier to see all of the fields without having to scroll.
After the aggregate our data should now look like this:
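In pandas, this aggregation is a groupby-sum. The grouping keys and summed metrics below are an assumption about which fields were kept; note that Position is included in the grouping so it survives for Output 2:

```python
import pandas as pd

def aggregate_players(df: pd.DataFrame) -> pd.DataFrame:
    """One row per player, summed across all five seasons."""
    metrics = ["Appearances", "Total Goals Scored", "Open Play Goals"]
    # Group on Name and Position so Position is still available later
    return df.groupby(["Name", "Position"], as_index=False)[metrics].sum()
```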
Step 4 - Output 1
In the first output we want to look for the top performing players in terms of goals scored and how many goals they have scored per appearance made.
To calculate the goals per appearance we use:
Open Play Goals per Appearance
[Open Play Goals]/[Appearances]
We then want to rank our players on how many Open Play Goals they have scored so we use a rank calculation:
Then finally we only want to keep the top 20 players, so we can use a filter on the Rank field.
Our table for the first output looks like this, and as you can see, Dele Alli is the only non-forward to make the list:
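Output 1 can be sketched in pandas as follows: the per-appearance calculation, a rank on Open Play Goals, then a filter to the top 20. The tie handling (`method="min"`) is an assumption about how Prep's rank behaves:

```python
import pandas as pd

def output_one(df: pd.DataFrame, top_n: int = 20) -> pd.DataFrame:
    out = df.copy()
    # Open Play Goals per Appearance = [Open Play Goals] / [Appearances]
    out["Open Play Goals per Appearance"] = out["Open Play Goals"] / out["Appearances"]
    # Rank players by Open Play Goals, highest first
    out["Rank"] = out["Open Play Goals"].rank(method="min", ascending=False)
    # Keep only the top N players
    return out[out["Rank"] <= top_n].sort_values("Rank")
```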
Step 5 - Output 2
For the second output we want to focus on the top players in each position. We start with another rank calculation, but this time we include Position in the group by:
Then we filter for the top 20 again on the Rank field. This time our table should look like this:
You can also post your solution on the Tableau Forum where we have a Preppin' Data community page. Post your solutions and ask questions if you need any help!