2020: Week 24

Many a battle has been fought on British soil over the years, and this website provides a lot of information about many of the battlefields. It's a great site for practising our web scraping skills, but web scraping doesn't always produce the cleanest of datasets. So this week's focus is on cleaning up a subset of our scraped data.

Input


Requirements

  • Input the data.
  • Find a natural way to split the data into different fields.
  • Remove rows which are incomplete.
    • i.e. if they do not have information in each field.
  • Clean battle names.
    • Ensure each row has a unique battle name.
  • Clean the dates.
    • For those dates with a date range, just use the start date.
    • The dateparse function may be useful here.
  • Clean the Victors, War and Description fields.
  • Output the data.
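
For anyone working outside Tableau Prep, two of the steps above (removing incomplete rows and keeping only the start of a date range) could be sketched in Python roughly as follows. This is only an illustration: the date formats it handles ("14 October 1066", "22-23 June 1314", "30 June - 2 July 1643") are assumptions about what the scraped text looks like, the helper names start_date and clean_rows are made up for this example, and Python's strptime stands in for the dateparse function mentioned above.

```python
import re
from datetime import date, datetime

# Output field names taken from the challenge's expected output.
FIELDS = ["Date", "Battle", "War", "Victors", "Description"]

# Assumed range separators: a hyphen or en dash, with optional spaces.
RANGE_SEPARATOR = re.compile(r"\s*[-\u2013]\s*")


def start_date(raw: str) -> date:
    """Return the first day of a single date or a date range.

    Assumes day-month-year text with English month names.
    """
    parts = RANGE_SEPARATOR.split(raw.strip(), maxsplit=1)
    start_tokens = parts[0].split()
    end_tokens = parts[-1].split()
    # A range start such as "22" or "30 June" omits the month and/or year,
    # so borrow the missing trailing tokens from the end of the range.
    tokens = start_tokens + end_tokens[len(start_tokens):]
    return datetime.strptime(" ".join(tokens), "%d %B %Y").date()


def clean_rows(rows):
    """Drop rows with any empty field and reduce date ranges to start dates."""
    cleaned = []
    for row in rows:
        values = {field: (row.get(field) or "").strip() for field in FIELDS}
        if not all(values.values()):
            continue  # incomplete row: at least one field has no information
        values["Date"] = start_date(values["Date"])
        cleaned.append(values)
    return cleaned
```

Calling clean_rows on a list of dictionaries keyed by the five field names returns only the complete rows, each with its Date converted to a date object. Real scraped dates will likely need more patterns than the three assumed here.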

Output

  • 5 fields
    • Date
    • Battle
    • War
    • Victors
    • Description
  • 63 rows (64 including headers)
The full output can be found here for comparison.

Make sure to fill in the participation tracker, share using #PreppinData on Twitter and post your solutions on our Tableau Forums community page so that we can compare our workflows!
