2020: Week 24
There has been many a battle fought on British soil across the years and this website provides a lot of information about many of the battlefields. It's a great website to practice our webscraping skills, but webscraping doesn't always lead to the cleanest of datasets. So this week's focus will be on cleaning up a subset of our webscraped data.
Input
Requirements
- Input the data.
- Find a natural way to split the data into different fields.
- Remove rows which are incomplete.
  - i.e. rows that are missing a value in any field.
- Clean battle names.
  - Ensure each row has a unique battle name.
- Clean the dates.
  - Where a date is given as a range, use the start date.
  - The DATEPARSE function may be useful here.
- Clean the Victors, War and Description fields.
- Output the data.
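The challenge doesn't spell out the shape of the scraped input, so the splitting step is easiest to show under an assumption. The Python sketch below assumes (hypothetically) that each raw record looks like "Battle: ... | Date: ... | War: ... | Victors: ... | Description: ...". It splits on the "|" separator, uses the "Label:" prefix as the field name, and then drops any record that is missing one of the five output fields or has an empty value. The function names `split_record`, `is_complete` and `split_and_filter` are illustrative only.

```python
# ASSUMPTION: hypothetical raw format, not the real challenge input:
#   "Battle: ... | Date: ... | War: ... | Victors: ... | Description: ..."
FIELDS = ["Date", "Battle", "War", "Victors", "Description"]


def split_record(raw: str) -> dict:
    """Split one scraped record into a {label: value} dict."""
    record = {}
    for part in raw.split("|"):
        # partition() splits on the first colon only, so colons inside values survive.
        label, sep, value = part.partition(":")
        if sep:
            record[label.strip()] = value.strip()
    return record


def is_complete(record: dict) -> bool:
    """True only when every output field is present and non-empty."""
    return all(record.get(field) for field in FIELDS)


def split_and_filter(raw_rows):
    """Split every raw row, keeping only the complete records."""
    records = (split_record(raw) for raw in raw_rows)
    return [r for r in records if is_complete(r)]
```

In Tableau Prep the equivalent would be a custom split on the separator followed by a filter on null or empty values; the Python version just makes the "complete means every field has a value" rule explicit.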
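What "clean battle names" involves depends on the actual input. This Python sketch assumes the main problem is scraping debris in the form of stray whitespace (tabs, newlines, doubled spaces). It also assumes that repeated names are duplicate rows which can be dropped, keeping the first occurrence, so that each remaining row has a unique battle name. Both assumptions are ours, and the helper names are hypothetical.

```python
import re


def clean_battle_name(name: str) -> str:
    """Collapse runs of whitespace (tabs, newlines, doubled spaces) and trim the ends."""
    return re.sub(r"\s+", " ", name).strip()


def unique_battles(rows):
    """Keep the first row for each cleaned battle name; later repeats are dropped.

    ASSUMPTION: rows sharing a cleaned name are duplicates of the same battle.
    The comparison is case-sensitive.
    """
    seen = set()
    result = []
    for row in rows:
        name = clean_battle_name(row["Battle"])
        if name not in seen:
            seen.add(name)
            result.append({**row, "Battle": name})
    return result
```

Cleaning the name before checking for duplicates matters: " Battle of A" and "Battle  of A" only match once the whitespace has been normalised.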
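DATEPARSE is the Tableau Prep route. In Python, `datetime.strptime` plays a similar role. This sketch assumes day-month-year dates with full English month names, such as "14 October 1066", and ranges written with a hyphen or en dash, such as "22-23 June 1314" or "28 September - 14 October 1066". It keeps the start of the range and borrows any missing month or year from the end. `strptime`'s `%Y` expects a four-digit year, so battles dated before 1000 would need extra handling.

```python
import re
from datetime import date, datetime


def parse_start_date(text: str) -> date:
    """Return the start date of a single date or a date range.

    ASSUMPTION: day-month-year order, full English month names, four-digit years.
    """
    # Split on a hyphen or en dash, ignoring surrounding spaces.
    parts = [p.strip() for p in re.split(r"\s*[-\u2013]\s*", text.strip())]
    start_tokens = parts[0].split()
    end_tokens = parts[-1].split()
    # "22-23 June 1314": the start "22" lacks month and year, so take them
    # from the end of the range ("June 1314").
    if len(start_tokens) < 3:
        start_tokens = start_tokens + end_tokens[len(start_tokens) - 3:]
    return datetime.strptime(" ".join(start_tokens), "%d %B %Y").date()
```

For example, `parse_start_date("22-23 June 1314")` gives `date(1314, 6, 22)`, and a single date passes through unchanged.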
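The challenge doesn't list what is wrong with the Victors, War and Description fields. The Python sketch below covers the kinds of debris scraped text often carries: HTML entities such as "&amp;", leftover tags such as "<br/>", bracketed footnote markers such as "[1]" and stray whitespace. Which of these appear in the real data is an assumption, and the helpers `clean_text` and `clean_fields` are hypothetical.

```python
import html
import re


def clean_text(value: str) -> str:
    """Strip common web-scraping debris from a text field."""
    text = html.unescape(value)            # e.g. "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", "", text)    # drop leftover HTML tags such as "<br/>"
    text = re.sub(r"\[\d+\]", "", text)    # drop footnote markers such as "[1]"
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace and trim


def clean_fields(row: dict, fields=("Victors", "War", "Description")) -> dict:
    """Return a copy of row with the named text fields cleaned; other fields untouched."""
    return {**row, **{f: clean_text(row[f]) for f in fields}}
```

Each rule is a separate, ordered step, so any that don't apply to the real input can simply be removed.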
Output
- 5 fields
  - Date
  - Battle
  - War
  - Victors
  - Description
- 63 rows (64 including headers)
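A quick way to confirm a result matches this spec is to write it with the five fields in the stated order and compare the data-row count against 63. The Python sketch below does both using the standard `csv` module. Writing to CSV, and the names `write_output` and `check_row_count`, are our assumptions rather than part of the challenge.

```python
import csv
import io

OUTPUT_FIELDS = ["Date", "Battle", "War", "Victors", "Description"]
EXPECTED_ROWS = 63  # from the Output spec (64 including the header row)


def write_output(rows, handle) -> int:
    """Write rows as CSV with the five output fields in order; return the data-row count."""
    rows = list(rows)
    # extrasaction="ignore" silently drops any intermediate columns left on the rows.
    writer = csv.DictWriter(handle, fieldnames=OUTPUT_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


def check_row_count(n: int) -> bool:
    """True when the number of data rows matches the expected output."""
    return n == EXPECTED_ROWS


if __name__ == "__main__":
    buffer = io.StringIO()
    n = write_output([{"Date": "1200-05-01", "Battle": "B", "War": "W",
                       "Victors": "V", "Description": "D"}], buffer)
    print(buffer.getvalue())
    print("row count matches:", check_row_count(n))
```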
The full output can be found here for comparison.
Make sure to fill in the participation tracker, share using #PreppinData on Twitter, and post your solutions to our Tableau Forums community page so that we can compare our workflows!