I was asked at the end of last week to produce a graphic for the Opinion page today – the idea was to compare the texts of various ‘state of the union’ addresses from around the world. The final result (pictured above) is not extraordinarily data-heavy. It worked quite nicely in the printed layout, where the individual ‘tentacles’ trailed to the text of the speeches that they index.
The process behind this piece was relatively simple. Each speech was indexed using a Processing application that I wrote, which counts the frequency of individual words (the program ignores commonly used or unimportant words). The words for each speech were then ranked by mentions per thousand words (you can see a version of the piece with numbers here).
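To make the indexing step concrete, here is a minimal sketch in Python rather than Processing. It is not the original application: the stop-word list, the tokenizer, and the function names are all assumptions, and the rate is assumed to be computed against the total word count of the speech, including the ignored words.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used by the original program is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that",
              "for", "on", "with", "we", "our", "it", "this"}

def tokenize(text):
    """Lowercase the text and pull out alphabetic words (keeping inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def mentions_per_thousand(text, stop_words=STOP_WORDS):
    """Return (word, mentions per 1000 words) pairs, highest rate first."""
    words = tokenize(text)
    total = len(words)  # assumption: rate is relative to all words in the speech
    if total == 0:
        return []
    counts = Counter(w for w in words if w not in stop_words)
    rates = {w: 1000.0 * c / total for w, c in counts.items()}
    # Sort by rate (descending), breaking ties alphabetically for stable output.
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))

ranked = mentions_per_thousand("The people of the country and the people")
```

For the eight-word sample string, "people" appears twice and "country" once, so `ranked` comes out as `[("people", 250.0), ("country", 125.0)]`.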
Almost every project I work on involves a period of ‘data exploration’ in which I try to shake as many interesting things out of the information as I can. Even though this piece had a short turn-around, I did a fair amount of poking around, generating some simple bar graphs:
Another avenue I explored was to use the word weights to determine a ‘score’ for each sentence. By doing this, I can try to find the ‘kernel’ of the speech – the sentence that sums up the entire text in the most succinct way. This, I think, was fairly successful. Here are the ‘power sentences’ for the UK:
The Netherlands:
And Botswana:
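A minimal Python sketch of this kind of sentence scoring follows. It is not the original Processing code, and the exact scoring rule is not described in this post: the sketch assumes a sentence's score is simply the sum of the weights of its words, uses a naive punctuation-based sentence splitter, and takes a hypothetical `weights` dictionary (for example, mentions per thousand words) as input.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def score_sentence(sentence, weights):
    """Assumed rule: add up the weight of every word; unknown words count as 0."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    return sum(weights.get(w, 0.0) for w in words)

def power_sentence(text, weights):
    """Return the highest-scoring sentence, or None for empty text."""
    sentences = split_sentences(text)
    if not sentences:
        return None
    return max(sentences, key=lambda s: score_sentence(s, weights))

# Hypothetical weights and text, for illustration only.
weights = {"people": 250.0, "country": 125.0}
speech = "We meet today. The people of this country expect more. Thank you!"
best = power_sentence(speech, weights)
```

A plain sum favours long sentences; dividing by sentence length would instead favour short, dense ones, so the choice of rule changes which sentence wins.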
Which brings us to tonight’s State of the Union Address by Barack Obama. What was the ‘power sentence’ from this speech? I ran the weighting algorithm on the address and this is what it came up with:

4 Comments
Here's a tag cloud of Obama's 2010 State of the Union Address:
http://robvstate.com/2010/01/27/tag-cloud-of-obam...
Beautiful visualization. I really love the way the tentacles float off into space and fold over themselves. That effect is completely lost in the cropping on the Times' site.
Looking at the results, it would also have been interesting to see words that trended only in particular countries. It's not surprising that words like "government", "people" and "country" were popular across many of the speeches.
Jer,
When you're poking around to find interesting patterns in data, as you did with this project, are you doing so in Processing and writing scripts from scratch? Or are there other tools that you work with (spreadsheets, pre-made scripts, etc.) to play and experiment before settling on an approach that you take all the way to a finished product?
Nice work! Can you tell us a little more about how you performed your word weighting and score calculations?