2020: Week 40
Challenge by: Jenny Martin

I often see dashboards and wonder about the data prep behind them. Sometimes the most beautiful dashboards can be hiding the most horrendous data preparation. Let's take this Viz of the Day from dataschooler Matthew Armstrong. The visualisation itself is fairly simple, but how did the data start off?

Explore Matthew's viz here

Inputs

There are three inputs this week:
- The poems, scraped from everypoet.com
- The Scrabble scores for each letter
- (Optional) Scaffolding list

Requirements

- Input the data
- Lines of the poem will not contain any HTML, CSS or JS, e.g. <head>, e9=new Object() etc. Filter out any rows which are not lines of the poem
- Wordsworth is very original, so there shouldn't be any duplicate lines in our data set. Filter out any repeated rows
- The first line of each poem is also the title of the poem. Ensure this is the case and number the lines of each poem
- Split the data out so there is a line for each word and assign a word ...
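
For anyone working outside a visual prep tool, the filtering, de-duplication, line-numbering and word-splitting requirements above could be sketched in Python roughly as follows. This is a minimal illustration and not the official solution. The markup heuristic (`MARKUP_PATTERN`), the function names and the sample rows are all assumptions made for the example, and it handles the rows of a single poem only, since the layout of the scraped input is not shown here.

```python
import re

# Assumed heuristic: a row is treated as HTML/CSS/JS if it contains a tag,
# a curly brace, or an equals sign. Real scraped data may need more rules.
MARKUP_PATTERN = re.compile(r"<[^>]+>|[{}=]")


def is_poem_line(row):
    """Return True for non-empty rows that do not look like markup or script."""
    text = row.strip()
    return bool(text) and not MARKUP_PATTERN.search(text)


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def prepare_poem(rows):
    """Filter, de-duplicate, number the lines, and split them into words."""
    poem_rows = drop_duplicates([r.strip() for r in rows if is_poem_line(r)])
    # Number the lines from 1; the first remaining line doubles as the title.
    numbered = list(enumerate(poem_rows, start=1))
    title = poem_rows[0] if poem_rows else None
    # One record per word, keeping the line number it came from.
    words = [(line_no, word) for line_no, text in numbered for word in text.split()]
    return title, numbered, words


# Hypothetical sample rows standing in for the scraped input.
sample_rows = [
    "<head>",
    "e9=new Object()",
    "I wandered lonely as a cloud",
    "That floats on high o'er vales and hills,",
    "I wandered lonely as a cloud",
    "",
]

title, numbered, words = prepare_poem(sample_rows)
print(title)
print(numbered)
print(words[:3])
```

Words are split on whitespace only, so punctuation stays attached to each word. Whether to strip it before scoring is left open here.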