The New York Times recently released a fairly thorough API which allows anyone to access a huge database of articles, as well as APIs for movie reviews, congressional vote data, campaign finance data, and more.
The API is very easy to use – you send it a custom-built query along with your API key, and it returns the requested data in JSON format. It’s so easy, in fact, that it took me only a few hours to build a simple applet in Processing for examining the frequency of words in NYT articles over time.

The image at the top of the post compares the occurrence of the word ‘iran’ (in red) with the word ‘iraq’ (in yellow). The result can be read like a clock: you can see the Iran-Contra affair at about 2:30, and the first Gulf War at about 4 o’clock. The second Iraq invasion is the biggest spike, continuing up to the present day. Interestingly, ‘iran’ shows a large increase in frequency in the months leading up to the end of 2008.
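The query-and-count loop described above can be sketched in Python. This is not the author’s Processing code. The endpoint `BASE_URL`, the parameter names (`query`, `begin_date`, `end_date`, `api-key`) and the top-level `total` field in the response are assumptions for illustration and should be checked against the NYT developer documentation. The helper names are hypothetical too. The sketch sends no network requests. It builds one URL per term per month, reads a hit count from a JSON response body, and maps each month onto a clock face, with time running clockwise from 12 o’clock and distance from the centre proportional to frequency. That layout is a guess at how the applet draws its data.

```python
import calendar
import json
import math
from urllib.parse import urlencode

# Hypothetical endpoint for illustration; check the real URL and parameter
# names in the NYT developer documentation.
BASE_URL = "https://api.nytimes.com/svc/search/v1/article"


def build_query_url(term, begin_date, end_date, api_key):
    """Build a search URL for one term over one date range (YYYYMMDD strings)."""
    params = {
        "query": term,  # assumed parameter names throughout
        "begin_date": begin_date,
        "end_date": end_date,
        "api-key": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def hit_count(response_text):
    """Read the number of matching articles from a JSON response body.

    Assumes a top-level "total" field; adjust to the real response format.
    """
    return int(json.loads(response_text)["total"])


def month_ranges(start_year, end_year):
    """Yield (begin, end) YYYYMMDD pairs, one per calendar month, years inclusive."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]  # handles leap years
            yield (f"{year}{month:02d}01", f"{year}{month:02d}{last_day:02d}")


def clock_position(index, n_bins, value, max_value, radius):
    """Map time bin `index` (0..n_bins-1) to an (x, y) offset from the clock centre.

    Time runs clockwise from 12 o'clock; the distance from the centre is
    proportional to the bin's count. y grows downward, as in Processing.
    """
    angle = 2 * math.pi * index / n_bins  # 0 rad = 12 o'clock
    r = radius * value / max_value if max_value else 0.0
    return (r * math.sin(angle), -r * math.cos(angle))


if __name__ == "__main__":
    # Print the first two monthly queries for 'iran' in 1986 (nothing is sent).
    for begin, end in list(month_ranges(1986, 1986))[:2]:
        print(build_query_url("iran", begin, end, "YOUR_API_KEY"))
    # Parse an illustrative response body (not real data).
    print(hit_count('{"total": 42}'))
```

Under this mapping, and assuming the ‘iran’/‘iraq’ chart spans 1981 to 2008 (336 monthly bins), each clock hour covers 28 months. A point at 2:30 would then fall about 70 months after January 1981, around late 1986.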
Here is the same data shown in a more traditional format:
Here is a frequency chart for the words ‘internet’, ‘web’, and ‘twitter’, to remind us how recent all of this fancy technology is (this chart runs from 1981 to 2008). It’s also interesting to see the term ‘web’ overtake the term ‘internet’:
This one compares the terms ‘sex’ and ‘scandal’ – see how many times you can find the term ‘catholic church’. The sheer number of peaks in this chart gives you some idea of the mad times that the last two decades have held:
You can see the full-size versions of these charts and a few others in this Flickr set. Individual developers are limited to 10,000 API calls per day, so I won’t be able to run any more until tomorrow. I would love to hear suggestions about possible word (and time) combinations to run. I will be releasing the source for this very simple app soon – in the meantime, feel free to get in touch if you have any questions or suggestions.
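To get a feel for how far 10,000 calls go, here is a back-of-the-envelope calculation in Python. It assumes one API call per term per month. That is a guess about how the applet queries the API, not something stated above, and the function names are hypothetical.

```python
def calls_needed(n_terms, start_year, end_year, bins_per_year=12):
    """API calls for one chart, assuming one call per term per time bin."""
    n_years = end_year - start_year + 1  # inclusive range of years
    return n_terms * n_years * bins_per_year


def charts_per_day(calls_per_chart, daily_limit=10_000):
    """How many complete charts fit within a daily call limit."""
    return daily_limit // calls_per_chart


if __name__ == "__main__":
    cost = calls_needed(3, 1981, 2008)
    print(cost, charts_per_day(cost))  # 1008 9
```

Under that assumption, a three-term chart spanning 1981 to 2008 (like the ‘internet’/‘web’/‘twitter’ one) needs 3 × 28 × 12 = 1,008 calls, so roughly nine such charts fit in a single day’s quota. Weekly bins would cut that to two.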