2019: Week 30

Data is everywhere, and when a colleague finds a fun dataset, it instantly needs cleaning. That is what happened when Andy Kriebel found the Serpentine Swimming Club tweets, which record the temperature along with a fun comment about what happened that day.

So what words get used more as the temperatures increase? How about on those 'nippy' 10°C days? Well, I've produced a simple Tableau Public view to let you analyse the output and check your results.


Requirements

Serpentine Swimming Club tweets

Common Words file (same as PD week 6)

  • Input the Serpentine Tweets
  • Only keep tweets that give water / air temperatures
  • Extract Water and Air Temperatures as separate columns
  • Remove common English words by joining to the 2nd Input (Common English words)
  • Remove unrequired fields and remove punctuation from the words in the tweets
  • Output a csv, or a file type of your choice, if you want to build the view
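
For anyone working outside Tableau Prep, the steps above can be sketched roughly in Python. This is only an illustration, not the challenge's official solution. The tweet layout ("Water: 64F / 18C. Air: 59F / 15C. comment") is an assumption, so TEMP_PATTERN will need adjusting to the real tweets. The helper names (parse_tweet, split_comment, tweet_to_rows) are hypothetical. It is also a guess that each kept word produces one row per temperature category.

```python
import re
import string

# ASSUMPTION: tweets look like "Water: 64F / 18C. Air: 59F / 15C. <comment>".
# The real Serpentine tweets may be laid out differently.
TEMP_PATTERN = re.compile(
    r"\b(?P<category>water|air)\b\D*?(?P<temp_f>\d+(?:\.\d+)?)\s*F"
    r"\D*?(?P<temp_c>\d+(?:\.\d+)?)\s*C",
    re.IGNORECASE,
)

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def parse_tweet(text):
    """Return (readings, comment), or None when no temperatures are found."""
    matches = list(TEMP_PATTERN.finditer(text))
    if not matches:
        return None
    readings = [
        (m.group("category").title(), float(m.group("temp_f")), float(m.group("temp_c")))
        for m in matches
    ]
    # Treat everything after the last temperature reading as the comment.
    comment = text[matches[-1].end():].strip(" .")
    return readings, comment


def split_comment(comment, common_words):
    """Split into lower-case words, remove punctuation, drop common words."""
    words = []
    for raw in comment.split():
        word = raw.translate(STRIP_PUNCTUATION).lower()
        if word and word not in common_words:
            words.append(word)
    return words


def tweet_to_rows(tweet_id, created_at, text, common_words):
    """Build one output row per (category, word) pair for a single tweet."""
    parsed = parse_tweet(text)
    if parsed is None:
        return []  # tweet has no water / air temperatures, so it is dropped
    readings, comment = parsed
    words = split_comment(comment, common_words)
    return [
        {
            "Comment Split": word,
            "Category": category,
            "TempF": temp_f,
            "TempC": temp_c,
            "Comment": comment,
            "Tweet Id": tweet_id,
            "Created At": created_at,
        }
        for category, temp_f, temp_c in readings
        for word in words
    ]
```

The rows returned by tweet_to_rows can be written out with csv.DictWriter. The common_words argument is expected to be a set of lower-case words loaded from the Common Words file.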

Output


  • 7 Columns (Comment Split, Category, TempF, TempC, Comment, Tweet Id, Created At)
  • 15,684 Rows (15,685 including headers)
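
If you export to csv, a quick way to compare your result against the figures above is to check the header and count the data rows. The sketch below assumes the column names are spelled exactly as listed above. check_output is a hypothetical helper name.

```python
import csv

EXPECTED_COLUMNS = [
    "Comment Split", "Category", "TempF", "TempC",
    "Comment", "Tweet Id", "Created At",
]
EXPECTED_ROWS = 15684  # data rows, excluding the header


def check_output(lines, expected_rows=EXPECTED_ROWS):
    """Return (columns_ok, rows_ok) for an open csv file or iterable of lines."""
    reader = csv.reader(lines)
    header = next(reader)
    row_count = sum(1 for _ in reader)
    # Column order is ignored; only the set of names is compared.
    columns_ok = sorted(header) == sorted(EXPECTED_COLUMNS)
    return columns_ok, row_count == expected_rows
```

To use it on a file, open the file with open(path, newline="") and pass the handle to check_output.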

The output can be found here for comparison. Don't forget to fill in our participation tracker and share your solutions with us using #PreppinData on Twitter!
