2019: Week 30
Data is everywhere, and when a colleague finds a fun dataset, it instantly needs cleaning. That is what happened when Andy Kriebel found the Serpentine Swimming Club tweets, which record the temperature and a fun comment about what happened that day.
So which words get used more as the temperature rises? What about those 'nippy' 10°C days? I've produced a simple Tableau Public view so you can analyse the output and check your results.
Requirements
Serpentine Swim Club tweets
Common Words file (same as Preppin' Data Week 6)
- Input the Serpentine Tweets
- Only keep tweets that give water / air temperatures
- Extract Water and Air Temperatures as separate columns
- Remove common English words by joining to the second input (Common English words)
- Remove unneeded fields and strip punctuation from the words in the tweets
- Output a CSV (or another file type of your choice) if you want to build the view
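If you want to cross-check your Tableau Prep flow in code, the steps above can be sketched in Python. This is a minimal sketch, not the official solution: the tweet layout ("Water 14.5C/58.1F Air 17C/62.6F. Comment...") and the field names `id`, `created_at` and `text` are assumptions, the Common Words file is assumed to be already loaded as a list of words, and `clean_tweets` / `write_output` are hypothetical helper names.

```python
import csv
import re
import string

# ASSUMPTION: tweets look like "Water 14.5C/58.1F Air 17C/62.6F. Nippy but glorious!"
# i.e. a category word, a Celsius reading, then a Fahrenheit reading.
TEMP_PATTERN = re.compile(
    r"\b(water|air)\b\D*?(-?\d+(?:\.\d+)?)\s*°?C\s*/?\s*(-?\d+(?:\.\d+)?)\s*°?F",
    re.IGNORECASE,
)

COLUMNS = ["Comment Split", "Category", "TempF", "TempC", "Comment", "Tweet Id", "Created At"]


def clean_tweets(tweets, common_words):
    """Return one row per (temperature reading, remaining word) for each tweet."""
    common = {w.lower() for w in common_words}
    strip_punct = str.maketrans("", "", string.punctuation)
    rows = []
    for tweet in tweets:
        readings = list(TEMP_PATTERN.finditer(tweet["text"]))
        if not readings:
            continue  # only keep tweets that report water / air temperatures
        # The comment is whatever is left once the temperature readings are cut out.
        comment = TEMP_PATTERN.sub("", tweet["text"]).strip(" ,.;:-")
        # Split into words, drop punctuation, lowercase so the common-word match is case-insensitive.
        words = [w.translate(strip_punct).lower() for w in comment.split()]
        words = [w for w in words if w and w not in common]
        for m in readings:
            for word in words:
                rows.append({
                    "Comment Split": word,
                    "Category": m.group(1).capitalize(),
                    "TempF": float(m.group(3)),
                    "TempC": float(m.group(2)),
                    "Comment": comment,
                    "Tweet Id": tweet["id"],
                    "Created At": tweet["created_at"],
                })
    return rows


def write_output(rows, path):
    """Write the cleaned rows to a CSV with the seven output columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

In this sketch every remaining word is repeated once per temperature reading, so a tweet with both a water and an air temperature contributes two rows per word; that long shape is what lets a single Category field drive the view.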
Output
- 7 Columns (Comment Split, Category, TempF, TempC, Comment, Tweet Id, Created At)
- 15,684 Rows (15,685 including headers)
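To compare your file against the expected shape, a small check like the one below may help. It is a hedged sketch: `check_output` is a hypothetical helper, it accepts the seven columns in any order, and it counts CSV records rather than raw lines, so a tweet comment containing a line break still counts as one row.

```python
import csv

EXPECTED_COLUMNS = ["Comment Split", "Category", "TempF", "TempC", "Comment", "Tweet Id", "Created At"]
EXPECTED_ROWS = 15684  # data rows, excluding the header


def check_output(path):
    """Return a list of problems; an empty list means the shape matches."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)  # counts records, not physical lines
    problems = []
    if sorted(header) != sorted(EXPECTED_COLUMNS):
        problems.append(f"columns {header} do not match {EXPECTED_COLUMNS}")
    if n_rows != EXPECTED_ROWS:
        problems.append(f"{n_rows} data rows, expected {EXPECTED_ROWS}")
    return problems
```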
The output can be found here for comparison. Don't forget to fill in our participation tracker and share your solutions with us using #PreppinData on Twitter!