State of the Union(s)

New York Times, 01/27/10 - State of the Union Graphic

I was asked at the end of last week to produce a graphic for today’s Opinion page. The idea was to compare the texts of various ‘state of the union’ addresses from around the world. The final result (pictured above) is not extraordinarily data-heavy, but it worked quite nicely in the printed layout, where the individual ‘tentacles’ trailed to the text of the speeches that they index.

The process behind this piece was relatively simple. Each speech was indexed using a Processing application I wrote that counts the frequency of individual words (the program ignores commonly used or unimportant words). The words for each speech were then ranked by mentions per thousand words (you can see a version of the piece with numbers here).
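
The Processing code itself isn’t published here, so what follows is only a minimal Python sketch of the counting-and-ranking step. It assumes a simple regex tokenizer and a small, hypothetical stop-word list (`STOP_WORDS`); the function names are illustrative too. Each word’s count is divided by the total length of the speech and scaled to mentions per thousand words, so speeches of different lengths can be compared directly.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the original program's list is not published.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "we", "our", "is",
              "for", "it", "this", "will", "be", "are", "have", "with", "as", "on"}

def tokenize(text):
    """Lower-case the text and pull out runs of letters (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def mentions_per_thousand(text, stop_words=STOP_WORDS):
    """Rank non-stop words by occurrences per 1,000 words of the whole speech."""
    words = tokenize(text)
    total = len(words)  # stop words still count toward the speech length
    if total == 0:
        return []
    counts = Counter(w for w in words if w not in stop_words)
    rates = {w: 1000 * c / total for w, c in counts.items()}
    # Highest rate first; ties broken alphabetically so the order is stable.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = mentions_per_thousand("Jobs and growth. Growth for families, jobs for workers, jobs now.")
print([w for w, _ in ranking])
```

Normalizing by the full word count, stop words included, ties each rate to how much the speaker actually said; dividing by non-stop words only would also work, as long as every speech is treated the same way.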

Almost every project I work on involves a period of ‘data exploration’ in which I try to shake as many interesting things out of the information as I can. Even though this piece had a short turnaround, I did a fair amount of poking around and generated some simple bar graphs:

State of the Union Graphs

Another avenue I explored was to use the word weights to determine a ‘score’ for each sentence. By doing this, I can try to find the ‘kernel’ of the speech: the sentence that sums up the entire text in the most succinct way. This, I think, was fairly successful. Here are the ‘power sentences’ for the UK:
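
The exact scoring behind these images isn’t described, so here is one plausible version, offered as an assumption rather than the method actually used. Each word’s weight is its per-thousand rate across the whole speech, and each sentence is scored by the mean weight of its words. Averaging instead of summing favors short sentences packed with the speech’s key words, which is one reading of ‘most succinct’. `STOP_WORDS`, `word_weights` and `power_sentences` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "for", "we", "our", "is", "will", "be"}

def words(text):
    """Lower-cased runs of letters (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def word_weights(text):
    """Per-thousand-word rate for every non-stop word in the whole speech."""
    tokens = words(text)
    counts = Counter(w for w in tokens if w not in STOP_WORDS)
    return {w: 1000 * c / len(tokens) for w, c in counts.items()}

def power_sentences(text, top=3):
    """Score each sentence by the mean weight of its words; return the best (score, sentence) pairs.

    Stop words have no weight and count as 0, so they dilute a sentence's score.
    """
    weights = word_weights(text)
    # Naive split on '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for s in sentences:
        tokens = words(s)
        if tokens:
            score = sum(weights.get(w, 0.0) for w in tokens) / len(tokens)
            scored.append((score, s))
    scored.sort(key=lambda pair: -pair[0])
    return scored[:top]

for score, sentence in power_sentences("We will create jobs. Jobs and growth for families. The weather is nice."):
    print(round(score, 2), sentence)
```

The sentence splitter is deliberately naive and will break on abbreviations such as ‘Mr.’; a full speech would need more careful segmentation.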

SOTU analysis - Sentence Weighting - UK

The Netherlands:

SOTU analysis - Sentence Weighting - Netherlands

And Botswana:

SOTU analysis - Sentence Weighting - Botswana

Which brings us to tonight’s State of the Union Address by Barack Obama. What was the ‘power sentence’ from this speech? I ran the weighting algorithm on the address and this is what it came up with:

The Most Important Sentence From Obama's State of the Union Address?

Tokyo | Cairo: Comparing Obama’s Foreign Policy Speeches

I spent a little bit of time today working on my text comparison tool, which I built last weekend to satisfy my curiosity about the similarities and differences between two very similar articles published on head injuries in the NFL (you can read the post here). I wanted to test out the tool with a different kind of content, and settled on something more political: two high profile foreign policy speeches by US President Barack Obama.

The first speech is Obama’s famous open address to the Muslim world, given in June at the University of Cairo. The second is much more recent: yesterday’s speech delivered at Suntory Hall in Tokyo. As you might expect, the two speeches share a lot of common language. Here is the big picture, showing the top 100 words:
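
A minimal sketch of how a top-100 list could be drawn from two texts, assuming per-thousand-word rates and a hypothetical `top_words` helper (the tool’s real selection rule isn’t stated): the rates from both speeches are added together and the highest combined totals are kept. Summing rates rather than raw counts stops the longer speech from dominating the list.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "we", "our", "is", "for"}

def rates(text):
    """Mentions per 1,000 words for each non-stop word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in tokens if w not in STOP_WORDS)
    return {w: 1000 * c / len(tokens) for w, c in counts.items()}

def top_words(text_a, text_b, n=100):
    """The n words with the highest combined rate across both speeches."""
    ra, rb = rates(text_a), rates(text_b)
    combined = {w: ra.get(w, 0.0) + rb.get(w, 0.0) for w in ra.keys() | rb.keys()}
    # Highest combined rate first; alphabetical order breaks ties.
    return sorted(combined, key=lambda w: (-combined[w], w))[:n]

print(top_words("peace and faith and peace", "security and growth and security and peace"))
```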

The shared words (‘america’, ‘world’, ‘common’, ‘human’, ‘responsibility’, etc.) don’t offer much in the way of analysis. Things start to get interesting, though, when we look towards the edges:

At the far extremes, the speech in Cairo was about Islam, about Palestinians, about peace, faith, and communities. The Tokyo address was about China, North Korea, security, agreement and growth. If we look at some of the common words that were used in both speeches, we can see some more interesting patterns emerge.
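
One way to place words along the axis between the two speeches, so that the ‘edges’ fall out naturally, is a normalized difference of rates. This is an assumed scoring, not necessarily what the tool does: skew = (b - a) / (a + b), where a and b are a word’s per-thousand rates in each speech. That gives -1 for words used only in the first speech, +1 for words used only in the second, and values in between for shared words. `rates`, `skew` and `extremes` are hypothetical helpers.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "we", "our", "is", "for"}

def rates(text):
    """Mentions per 1,000 words for each non-stop word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in tokens if w not in STOP_WORDS)
    return {w: 1000 * c / len(tokens) for w, c in counts.items()}

def skew(word, ra, rb):
    """-1.0 if `word` appears only in speech A, +1.0 if only in B, 0.0 if balanced or absent."""
    a, b = ra.get(word, 0.0), rb.get(word, 0.0)
    return (b - a) / (a + b) if a + b else 0.0

def extremes(text_a, text_b, k=5):
    """The k words leaning furthest toward speech A and toward speech B."""
    ra, rb = rates(text_a), rates(text_b)
    vocab = ra.keys() | rb.keys()
    total = {w: ra.get(w, 0.0) + rb.get(w, 0.0) for w in vocab}
    # Most lopsided first; among equally lopsided words, the more frequent first.
    lean_a = sorted(vocab, key=lambda w: (skew(w, ra, rb), -total[w], w))[:k]
    lean_b = sorted(vocab, key=lambda w: (-skew(w, ra, rb), -total[w], w))[:k]
    return lean_a, lean_b

print(extremes("islam peace faith peace and islam peace",
               "china security growth and china security peace", k=2))
```

Breaking ties by overall rate matters because every word exclusive to one speech lands at exactly -1 or +1; without it, the extremes would be ordered arbitrarily.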

It seems, for instance, that the Egyptian address was more about people, whereas the speech in Tokyo was directed towards nations:

Obama makes many more mentions of peace in Cairo (in Japan, this word seems to have been replaced by ‘security’), and far more mentions of prosperity in Tokyo:

There was a lot of speculation before Obama’s speech in Asia about how much focus the President would put on human rights. In the speech, Obama mentions ‘rights’ only five times: once at the beginning and four times near the end. This distribution is interesting when we compare it to Obama’s references to China, which are heavily concentrated at the beginning (China is not mentioned at all past the halfway point of the speech):
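
Where a word falls within a speech can be measured by its relative position: the index of each occurrence divided by the speech’s total word count. The sketch below, using hypothetical helper names, bins those positions into equal slices of the speech and reports the share of mentions in the second half, which is enough to check a claim like ‘not mentioned past the halfway point’.

```python
import re

def positions(text, word):
    """Relative position (0.0 = first word, approaching 1.0 = last word) of each occurrence of `word`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [i / len(tokens) for i, t in enumerate(tokens) if t == word.lower()]

def distribution(text, word, bins=10):
    """Count occurrences of `word` in each of `bins` equal slices of the text."""
    counts = [0] * bins
    for p in positions(text, word):
        # min() guards against float rounding pushing a position into a nonexistent bin.
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

def share_in_second_half(text, word):
    """Fraction of the word's occurrences at or past the halfway point (0.0 if the word never appears)."""
    pos = positions(text, word)
    return sum(p >= 0.5 for p in pos) / len(pos) if pos else 0.0

speech = "China rights China trade growth jobs rights rights"
print(distribution(speech, "china", bins=4), distribution(speech, "rights", bins=4))
```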

Though one of the five occurrences of ‘rights’ is in reference to China, it appears from this analysis that there may have been a deliberate plan to keep the ‘human rights part’ of the speech separated from the ‘China part’.

There are likely many interesting things in this data set, a lot of which are open to interpretation. While it’s doubtful that one can steer entirely clear of political bias in this kind of comparison, the quantitative nature of the data makes it a little easier to attempt a nonpartisan analysis. I will be including these speeches as sample texts when I release the tool to the public (hopefully next week).