I spent a little bit of time today working on my text comparison tool, which I built last weekend to satisfy my curiosity about the similarities and differences between two very similar articles published on head injuries in the NFL (you can read the post here). I wanted to test out the tool with a different kind of content, and settled on something more political: two high-profile foreign policy speeches by US President Barack Obama.
The first speech is Obama’s famous open address to the Muslim world, given in June at Cairo University. The second is much more recent – yesterday’s speech delivered at the Suntory Hall in Tokyo. As you might expect, the two speeches share a lot of common language. Here is the big picture, showing the top 100 words:
The shared words – ‘america’, ‘world’, ‘common’, ‘human’, ‘responsibility’, etc. – don’t offer much in the way of analysis. Things start to get interesting, though, when we look towards the edges (click on the images to see larger versions):
At the far extremes, the speech in Cairo was about Islam, about Palestinians, about peace, faith, and communities. The Tokyo address was about China, North Korea, security, agreement and growth. If we look at some of the common words that were used in both speeches, we can see some more interesting patterns emerge.
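For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of one way to find the words at the "edges". It is not the code behind my tool; the function names (`word_counts`, `compare`) and the choice to rank words by the difference in their share of each text are assumptions made for illustration. It also does no stop-word filtering, so on real speeches words like "the" and "and" would need to be removed first.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count runs of letters (apostrophes allowed)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def compare(text_a, text_b, top_n=100):
    """Take the top_n most frequent words across both texts and return
    (word, share_in_a, share_in_b) tuples, sorted from the words that
    lean most towards text_a to the words that lean most towards text_b."""
    counts_a, counts_b = word_counts(text_a), word_counts(text_b)
    # Guard against division by zero for empty input.
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    combined = counts_a + counts_b
    rows = [
        (word, counts_a[word] / total_a, counts_b[word] / total_b)
        for word, _ in combined.most_common(top_n)
    ]
    # A large positive difference means the word is far more typical of text_a.
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows
```

Calling `compare(cairo_text, tokyo_text)` (with the two transcripts loaded as strings) would put the Cairo-leaning words at the start of the list and the Tokyo-leaning words at the end, with the shared vocabulary in the middle.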
It seems, for instance, that the Egyptian address was more about people, whereas the speech in Tokyo was directed towards nations:
Obama makes many more mentions of peace in Cairo (in Japan, this word seems to have been replaced by ‘security’), and far more mentions of prosperity in Tokyo:
There was a lot of speculation prior to Obama’s speech in Asia about how much focus the President would put on human rights. In the speech, Obama mentions ‘rights’ only five times – once at the beginning of the speech and four times near the end. This weighting is interesting when we compare it to Obama’s references to China during the speech, which are heavily concentrated at the beginning (China is not mentioned at all past the halfway point of the speech):
Though one of the five occurrences of ‘rights’ is in reference to China, it appears from this analysis that there may have been a deliberate plan to keep the ‘human rights part’ of the speech separated from the ‘China part’.
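The positional comparison can be sketched in a few lines of Python as well. Again, this is an illustrative approximation rather than the tool's actual code: `mention_positions` is a hypothetical helper that reports where each occurrence of a word falls in the text, on a scale from 0.0 (first word) to 1.0 (last word).

```python
import re

def mention_positions(text, word):
    """Return the relative position (0.0 = start, 1.0 = end) of every
    occurrence of `word` in `text`, matching case-insensitively."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return []
    # Index of the last token; max() avoids dividing by zero for one-word texts.
    last = max(len(tokens) - 1, 1)
    target = word.lower()
    return [i / last for i, token in enumerate(tokens) if token == target]
```

Running `mention_positions(tokyo_text, "china")` and `mention_positions(tokyo_text, "rights")` on the transcript and comparing the two lists gives a rough numeric version of the picture above: values below 0.5 fall in the first half of the speech, values above 0.5 in the second.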
There are likely many interesting things in this data set, a lot of which are open to interpretation. While it’s doubtful that one can steer entirely clear of political biases during this kind of comparison, the quantitative nature of the data makes it a little bit easier to make an attempt at nonpartisan analysis. I will be including these speeches as sample texts when I release the tool to the public (hopefully next week).