I was asked at the end of last week to produce a graphic for the Opinion page today – the idea was to compare the texts of various ‘state of the union’ addresses from around the world. The final result (pictured above) is not extraordinarily data-heavy. It worked quite nicely in the printed layout, where the individual ‘tentacles’ trailed to the text of the speeches that they index.
The process behind this piece was relatively simple. Each speech was indexed using a Processing application I wrote that counts the frequency of individual words (the program ignores commonly used or unimportant words). The words for each speech were then ranked by mentions per thousand words (you can see a version of the piece with numbers here).
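For readers who want to try the indexing step, here is a minimal sketch in Python. It is not the original Processing code, which isn't shown here. The stop-word list, the tokenizing regex and the function name `mentions_per_thousand` are all assumptions made for illustration, and the rate is computed against the total word count of the speech, stop words included.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used by the original program is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "that", "we", "our", "for", "it", "this", "will"}

def mentions_per_thousand(text, stop_words=STOP_WORDS):
    """Rank the non-stop words of a speech by mentions per thousand words."""
    # Lowercase the text and pull out runs of letters (with an optional inner apostrophe).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(words)
    if total == 0:
        return []
    # Count every word that is not in the stop-word list.
    counts = Counter(w for w in words if w not in stop_words)
    # Normalise by speech length so long and short speeches are comparable.
    ranked = [(w, 1000.0 * c / total) for w, c in counts.items()]
    # Highest rate first; ties broken alphabetically.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Calling `mentions_per_thousand(speech_text)[:10]` would give the ten most heavily used words of a speech along with their rates.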
Almost every project I work on involves a period of ‘data exploration’ in which I try to shake as many interesting things out of the information as I can. Even though this piece had a short turn-around, I did a fair amount of poking around, generating some simple bar graphs:
Another avenue I explored was to use the word weights to determine a ‘score’ for each sentence. By doing this, I can try to find the ‘kernel’ of the speech – the sentence that sums up the entire text in the most succinct way. This, I think, was fairly successful. Here are the ‘power sentences’ for the UK:
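(A side note on the scoring itself: the exact weighting formula isn't spelled out here, so the following Python sketch is only one plausible reading of it. It assumes a word's weight is its raw count in the speech, that stop words weigh nothing, and that a sentence's score is the average weight of its words, so long sentences don't win on length alone. The stop-word list, the sentence splitter and the name `power_sentence` are hypothetical.)

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used by the original program is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "that", "we", "our", "for", "it", "this", "will"}

def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def power_sentence(text, stop_words=STOP_WORDS):
    """Return (sentence, score) for the highest-scoring sentence in the text."""
    # Word weight = number of times the word appears in the whole speech.
    weights = Counter(w for w in tokenize(text) if w not in stop_words)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    best, best_score = None, -1.0
    for sentence in sentences:
        words = tokenize(sentence)
        if not words:
            continue
        # Assumed scoring rule: average weight per word (stop words count as 0).
        score = sum(weights[w] for w in words) / len(words)
        if score > best_score:
            best, best_score = sentence, score
    return best, best_score
```

Averaging is a design choice, not a given: summing the weights instead would favour the longest sentences, which may or may not be what you want from a ‘kernel’.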
Which brings us to tonight’s State of the Union Address by Barack Obama. What was the ‘power sentence’ from this speech? I ran the weighting algorithm on the address and this is what it came up with: