Category Archives: Processing

NYT: This was 1984

NYTimes: 365/360 - 1985

This series of images uses the faceted searching abilities of the NYTimes Article Search API to construct maps of the top organizations & people mentioned in articles for a given news year. Connections between these entities are drawn, so that relationships can be found and followed.

NYTimes: 365/360 - 2001

NYTimes: 365/360 - 2009

The maps posted so far in the Flickr set are general ones – but these can also be generated for any refined keyword search. Similarly, while the current maps are for individual years, a map could be made for any given period of time (the Bush presidency, the Gulf War, September 2001), or indeed for the whole period of time available through the API (1981-present).

One of the best things about using Processing for these types of projects is that the final result can be output to different formats. These, for example, can be output very easily as .PDFs, and I do think they’d look particularly striking as wall-sized prints. 25 years * 10 feet… does anyone have a 250-foot long wall they can lend me?

Multi-faceted Searching with the NYTimes APIs

In my last post, I walked through the process of making simple requests to the New York Times Article Search API. Near the end of the post, I mentioned that the API allows for something called ‘faceted searching’ which can make the whole process a bit easier, and which can also allow users to uncover more interesting data and data relationships. In this post, I’ll show how to use facets in searches with the Article Search API and Processing – I’ll also share some code that I’ve written which makes the whole process a lot easier.

NYTimes: Going Organic 1981-2009

The image above makes use of the API to find out how often the word ‘organic’ appears in NYT articles between the years 1981 and 2009. I’ve used a more abstract visualization style here, but essentially it’s a bar graph, showing the increasing popularity of the term in the media, particularly in the last decade. Each blade of grass represents a month – 336 for the time period that is shown. We know from the last tutorial that we can find out the total number of articles in a certain month (here, January of 2004) using a call to the API that looks something like this:

http://api.nytimes.com/svc/search/v1/article?query=organic&begin_date=20040101&end_date=20040201&api-key=1af81d#######################:##:########

This call can be made easier if we use a search for a publication_month and a publication_year facet, instead of the begin_date/end_date:

http://api.nytimes.com/svc/search/v1/article?query=organic publication_year:[2004] publication_month:[01]&api-key=1af81d#######################:##:########

This search works because every article in the NYT database is associated with facets – pieces of information that are related to the article. So, every article published in the year 2004 has a publication_year facet with the value 2004. Similarly, every article published in the month of January (of any year) has a publication_month facet with a value of 01. There are a bunch of different facets available to search with – facets for people and organizations associated with the article (per_facet, org_facet), facets for page number in the paper (page_facet), and many more.
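As a rough sketch of how a faceted query string like the one above could be assembled in code, here is a small helper in plain Java (the language Processing is built on). The class and method names are made up for this example, and percent-encoding the whole query with URLEncoder is an assumption about what the API will accept, not something taken from the API documentation:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the NYT_API_Classes download.
class FacetQueryBuilder {

  // Formats one facet restriction in the facet_name:[value] form used above.
  static String facetTerm(String facet, String value) {
    return facet + ":[" + value + "]";
  }

  // Joins the keyword and the facet terms with spaces, then URL-encodes the result
  // so that spaces, colons and square brackets are safe to place in a URL.
  static String buildQuery(String keyword, String... facetTerms) {
    StringBuilder query = new StringBuilder(keyword);
    for (String term : facetTerms) {
      query.append(' ').append(term);
    }
    try {
      return URLEncoder.encode(query.toString(), "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required on every Java platform, so this should never happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String query = buildQuery("organic",
      facetTerm("publication_year", "2004"),
      facetTerm("publication_month", "01"));
    // The encoded string would follow "?query=" in the request URL.
    System.out.println("query=" + query);
  }
}
```

Keeping the facet formatting in one place means a new restriction (say, an org_facet) is one more facetTerm call rather than more string concatenation.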

In Processing, I have built a set of simple classes to make using the Article Search API a bit easier. You can download them and see how they work here: NYT_API_Classes.zip . For example, to work with the above search, we could do this:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("organic");
s.addFacetQuery("publication_month", "01");
s.addFacetQuery("publication_year", "2004");
TimesArticleSearchResult r = s.doSearch();
println ("There were " + r.total + " articles with the word organic in January 2004.");

This search will return the first 10 articles that match our search – we can access their body text, titles, author names, etc. very easily:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("organic");
s.addFacetQuery("publication_month", "01");
s.addFacetQuery("publication_year", "2004");
TimesArticleSearchResult r = s.doSearch();
println ("There were " + r.total + " articles with the word organic in January 2004.");
println ("The first article was titled: " + r.results[0].title);
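A graph like the ‘Going Organic’ image needs one search per month. As a minimal sketch, the year and month facet values for a whole range can be generated in a loop. This is plain Java, and the class and method names are my own invention for this example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the NYT_API_Classes download.
class MonthFacets {

  // Returns a {year, month} pair of strings for every month from January of
  // startYear to December of endYear, with months zero-padded ("01" to "12").
  static List<String[]> monthsBetween(int startYear, int endYear) {
    List<String[]> months = new ArrayList<String[]>();
    for (int year = startYear; year <= endYear; year++) {
      for (int month = 1; month <= 12; month++) {
        months.add(new String[] { String.valueOf(year), String.format("%02d", month) });
      }
    }
    return months;
  }

  public static void main(String[] args) {
    // 28 years of 12 months each: 336 searches
    List<String[]> months = monthsBetween(1981, 2008);
    System.out.println(months.size() + " months");

    for (String[] ym : months) {
      // In a sketch, each pair would feed one search, for example:
      //   s.addFacetQuery("publication_year", ym[0]);
      //   s.addFacetQuery("publication_month", ym[1]);
    }
  }
}
```

That many requests in a row may run into the API's usage limits, so it is probably worth saving the counts to a file rather than re-running the searches every time the sketch starts.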

By using facets in our queries, we can conduct all sorts of interesting searches. For instance, if I wanted to find articles that mentioned profit and talked about Apple, I could run this search:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("profit");
s.addFacetQuery("org_facet", "APPLE COMPUTER INC");
TimesArticleSearchResult r = s.doSearch();

NYTimes: Switching Enemies - Communism & Terrorism 1981-2009

The image above shows the frequency of the words ‘communism’ (bottom) and ‘terrorism’ (top) in New York Times articles since 1981. The yellow bars indicate the occurrences that were on the front page – dimmer yellow bars give a picture of where in the paper the word was found. We get this information by requesting that facets be included in our search results. An example query for a single month would look like this:

http://api.nytimes.com/svc/search/v1/article?query=communism,terrorism publication_year:[2004] publication_month:[01]&facets=page_facet&api-key=1af81d#######################:##:########

With this query, we’ve asked the API to return us the page facet associated with each article. Again, I’ve tried to make it a bit easier to access this information. Here, we ask how many articles with the word ‘terrorism’ were on the front page in 2001:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("terrorism");
s.addFacets("page_facet");
s.addFacetQuery("publication_year", "2001");
TimesArticleSearchResult r = s.doSearch();
TimesFacetObject[] pageFacets = r.getFacetList("page_facet");
println ("There were " + r.total + " articles.");
println ("There were " + pageFacets[0].count + " articles on the front page");

Hopefully you can see that facets can be very helpful indeed when we are digging through the NYT data. I am still finding new ways to search myself, and rounding up data that can be useful for visualizations and other projects.
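The dimmer bars in the communism/terrorism image boil down to turning a page number into a weight. Here is a minimal sketch of one way to do that mapping, assuming a simple linear fade. It is written as plain Java, and the function name, the 40-page cutoff and the brightness range are all invented for illustration rather than taken from the visualization's source:

```java
// Hypothetical helper for illustration only.
class PageWeight {

  // Maps a page number to a brightness between minBrightness and 255.
  // Page 1 is brightest; pages at or beyond maxPage get minBrightness.
  // maxPage must be greater than 1.
  static int brightnessForPage(int page, int maxPage, int minBrightness) {
    int clamped = Math.max(1, Math.min(page, maxPage));
    float t = (clamped - 1) / (float) (maxPage - 1); // 0.0 on the front page, 1.0 at maxPage
    return Math.round(255 - t * (255 - minBrightness));
  }

  public static void main(String[] args) {
    int[] pages = { 1, 10, 40, 120 };
    for (int page : pages) {
      System.out.println("page " + page + " -> brightness " + brightnessForPage(page, 40, 55));
    }
  }
}
```

In a sketch, the returned value could be used as the alpha argument to fill() before drawing each bar, so that front-page mentions stand out and mentions deep in the paper fade back.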

As always, I’d love to hear from you if you are using this code, or if you can suggest ways to make it better. I am planning on completing the code base and releasing it as a real Processing library, but that may take a while (given that I’ve never made a library before!)

Processing, JSON & The New York Times

NYTimes: Regulation and Innovation since 1981 (Radial)

Processing is my tool of choice for building visualizations for a number of reasons. First, it’s easy to start projects, quick to piece together code and simple to share the results – it’s an ideal rapid-prototyping tool. Second, the wide variety of libraries available means that it’s easy to plug into data in various formats and process the data into an easily-usable form. Finally, output from Processing sketches can be exported as .TIFFs, .JPGs, .PNGs, .PDFs, .MOVs and more, making the end project easy to deploy.

I was reminded of all of these advantages last week when I sat down to build some simple visualizations from the newly-opened New York Times APIs. Within a few hours I was up and running – and within a few more hours I had some results that I was quite happy with. Since then, a lot of people have contacted me asking for the Processing code. In response, I’ve decided to share the code and to write this brief tutorial on connecting to the NYTimes APIs with Processing.

Here is the link to the source for my New York Times visualization tool: NYTimes.zip (168k)

For those of you who are interested in learning a bit more, here is a step-by-step guide to getting a very simple graph visualization working with Processing and the NYT Article Search API. If you don’t already have it, you’ll need to download Processing. It’s free, and works on all platforms – download it here.

Step One: Set-up

I mentioned earlier that Processing has a bunch of libraries available to extend the basic codebase to handle all kinds of tasks. There are libraries for video, OpenGL, network communication, sound, hardware interface, and even linguistics. Strangely, though, there is no library for handling JSON – the lightweight data format that the NYTimes API uses to send the information that we request. Luckily, there is a JSON Java library, which can be very easily compiled into a .jar file and used in any sketch. For convenience’s sake, I’ve packaged this into a library that can be dropped into your Processing libraries folder: JSON Processing Library. (I am assuming here that you are using the Processing 1.0 release or higher. In these new releases, libraries are added to your Processing sketchbook folder (~user/Documents/Processing on a Mac) in a directory called ‘libraries’. If it doesn’t already exist, create it, and drop the unzipped ‘json’ folder inside.) Once you’ve done that, open Processing and create a new sketch.

So, let’s review:

1. Download and install Processing.

2. Download the json library and drop it into your ‘libraries’ folder.

Now we’re ready to get started.

3. Open Processing and create a new Sketch.

4. From the ‘Sketch’ menu, select ‘Import Library…’. This will give you a drop-down list of the libraries available. For this example, we only want to import the json library.

Your sketch should just have one line, which should look like this:

import org.json.*;

Step Two: Plugging In

The New York Times API uses a pretty simple system. We send their server a request for some information, and they send us that information back, in JSON format. But, to control access to the information and to prevent server overload, the Times asks everyone who will be using the system to sign up for an API Key – a unique value that lets them track who is using the system and for what. The first thing we’re going to do in this section is to get an API Key and store it for use a bit later on. The NYT offers a set of different APIs to access different information, and each of them requires a separate key.

1. Visit The New York Times Developer site and get an API key for the Article Search API.

2. Store this API key as a String (a string is simply a sequence of characters). We’re going to give this string the identifier ‘apiKey’. Your API key shouldn’t have the # characters!:

String apiKey = "1af81d#######################:##:########";

3. While we’re at it, we’ll store the URL that is used to access the article search API. This way, if the NYT happens to change this URL sometime down the road, we won’t have to hunt through our code to replace it:

String baseURL = "http://api.nytimes.com/svc/search/v1/article";

4. Processing offers a set of built-in methods that are called automatically when our program is executed. We’re going to use the setup method in our example, and I’ll build the draw method as well just to get in the habit of doing this. Code inside the setup method will run once, when the program starts. Code inside the ‘draw’ method will execute once per frame for as long as our program is running:

void setup() {
};

void draw() {
};

Your code should now look something like this:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";

String apiKey = "1af81d#######################:##:########";

void setup() {

};

void draw() {

};

5. Now we’re going to write our own method to access the Article Search API and find out how many times a certain keyword was found in articles spanning a specific time period. For the sake of this example, let’s find out how many articles contained the phrase ‘O.J. Simpson’ in 1994 & 1995. To do this, we make a request that looks like this:

http://api.nytimes.com/svc/search/v1/article?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=1af81d#######################:##:########

Remember that we’ve stored the values for the base URL and our API key at the beginning of our code, so we can replace the actual values with the property identifiers:

baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;

A method to find out how many articles mentioned O.J. in 1994 & 1995 might then look like this:

void getOJArticles() {

String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;

String result = join( loadStrings( request ), "");

println( result );

};

Go ahead and type or copy/paste that method into the code window. We can call this new method in the setup() wrapper to see the result when we run our program:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";
String apiKey = "1af81d#######################:##:########";

void setup() {
getOJArticles(); //THE METHOD GETS CALLED HERE
};

void draw() {
};

void getOJArticles() {
String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;
String result = join( loadStrings( request ), "");
println( result );
};

If you run this code, you should see some text, wrapped up in a peculiar structure, appear in the output panel at the bottom of your Processing window. This is the data that is being returned from the Article Search API. Now we have to figure out how to get what we want out of the data.

Step Three: Digging through the JSON Data

The structure for the JSON data that is returned from this call to the NYTimes Article Search API looks something like this:


{
  "offset" : "0",
  "results" : [
    {
      "body" : "Article Body",
      "date" : "Article Date",
      "title" : "Article Title",
      "url" : "Article URL"
    },
    { article 2 },
    { article 3 },
    ...
  ],
  "tokens" : ["O", "J", "Simpson"],
  "total" : 2218
}

Whenever we see a set of curly braces, the data inside those braces is going to be parsed into a JSONObject. Whenever we see square brackets, the data inside of those brackets is going to be parsed into a JSONArray. So, we’re going to end up with one big JSONObject that contains a string, two arrays, and an integer. Let’s get that information out.

Any code block that creates a JSONObject has to be ready to catch an exception if something goes wrong, so we wrap it in a try/catch statement:

void getOJArticles() {

  String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;
  // Fetch the response once and reuse the string below, rather than hitting the API twice
  String result = join( loadStrings( request ), "");

  try {
    JSONObject nytData = new JSONObject(result);
    // The matching articles themselves (not used yet)
    JSONArray results = nytData.getJSONArray("results");
    int total = nytData.getInt("total");
    println ("There were " + total + " occurrences of the term O.J. Simpson in 1994 & 1995");
  }
  catch (JSONException e) {
    println ("There was an error parsing the JSONObject.");
  };

};

We now have a function that will tell us how many times the term ‘O.J. Simpson’ was used by the New York Times in 1994 & 1995. Which is good, if you are making the world’s most limited, O.J.-based visualization. In a real project, of course, we’d want to find out the occurrence of any phrase, in any time segment. With that in mind, let’s re-write our function to include some arguments, and to return an integer back to us (the ‘void’ at the beginning of the previous version of the method indicated that the method returned nothing):

int getArticleKeywordCount(String word, String beginDate, String endDate) {
  String request = baseURL + "?query=" + word + "&begin_date=" + beginDate + "&end_date=" + endDate + "&api-key=" + apiKey;
  // Fetch the response once and reuse the string below, rather than hitting the API twice
  String result = join( loadStrings( request ), "");

  int total = 0;

  try {
    JSONObject nytData = new JSONObject(result);
    // The matching articles themselves (not used yet)
    JSONArray results = nytData.getJSONArray("results");
    total = nytData.getInt("total");
    println ("There were " + total + " occurrences of the term " + word + " between " + beginDate + " and " + endDate);
  }
  catch (JSONException e) {
    println ("There was an error parsing the JSONObject.");
  };

  return(total);
};

Now we can send a whole pile of requests, with different keywords and different time periods. This is really useful when we want to compare the occurrence of different keywords over the same time period, or one keyword over many time periods. For instance, we might want to see the relative occurrences of other newsworthy items during the O.J. period:

void setup() {
getArticleKeywordCount("O.J.+Simpson", "19940101", "19960101" );
getArticleKeywordCount("Olympics", "19940101", "19960101" );
getArticleKeywordCount("Rwanda", "19940101", "19960101" );
};
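The keywords above have their spaces replaced with ‘+’ by hand. As a small sketch, Java's standard URLEncoder class can do that step instead, so that ordinary phrases can be passed around in the code. The helper name here is my own:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration.
class KeywordEncoder {

  // URL-encodes a search phrase: spaces become '+', and characters that are
  // not safe in a URL are percent-encoded. Letters, digits and '.' are left alone.
  static String encodeKeyword(String keyword) {
    try {
      return URLEncoder.encode(keyword, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required on every Java platform, so this should never happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(encodeKeyword("O.J. Simpson")); // prints O.J.+Simpson
    System.out.println(encodeKeyword("Super Bowl"));   // prints Super+Bowl
  }
}
```

With something like this in place, getArticleKeywordCount could encode its word argument itself, and the calls in setup() could use plain phrases such as "O.J. Simpson".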

Fishing out the other data from the JSONObject is reasonably easy, and follows the same general approach.

Step Four: Visualizing

The tricky business in this process is already over. Now all we have to do is use the numbers that we’ve retrieved to draw something to the screen. For the purposes of this tutorial, I’m just going to draw a set of coloured bars to indicate the relative frequency of use of each keyword during the time period. To avoid repetition, I am first building an array of words, and colours, then running through a for loop to draw the bars:

void setup() {

size(500,300);

String[] words = {"O.J.+Simpson", "Olympics", "South+Africa", "Super+Bowl", "Rwanda"};
color[] colors = {#FF0000, #00FF00, #0000FF, #FF3300, #FF9900};

int barSize = 25;
int startY = 80;

String start = "19940101";
String end = "19960101";

for (int i = 0; i < words.length; i++) {
    int freq = getArticleKeywordCount( words[i], start, end);
    fill(colors[i]);
    rect(0, startY + (barSize * i), freq/5, barSize);
};
};

You should get an image that looks like this:

One more review of the full code:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";
String apiKey = "1af81d#######################:##:########";

void setup() {

size(500,300);

String[] words = {"O.J.+Simpson", "Olympics", "South+Africa", "Super+Bowl", "Rwanda"};
color[] colors = {#FF0000, #00FF00, #0000FF, #FF3300, #FF9900};

int barSize = 25;
int startY = 80;

String start = "19940101";
String end = "19960101";

for (int i = 0; i < words.length; i++) {
   int freq = getArticleKeywordCount( words[i], start, end);
   fill(colors[i]);
   rect(0, startY + (barSize * i), freq/5, barSize);
};
};

void draw() {

};

int getArticleKeywordCount(String word, String beginDate, String endDate) {
  String request = baseURL + "?query=" + word + "&begin_date=" + beginDate + "&end_date=" + endDate + "&api-key=" + apiKey;
  // Fetch the response once and reuse the string below, rather than hitting the API twice
  String result = join( loadStrings( request ), "");

  int total = 0;

  try {
    JSONObject nytData = new JSONObject(result);
    // The matching articles themselves (not used yet)
    JSONArray results = nytData.getJSONArray("results");
    total = nytData.getInt("total");
    println ("There were " + total + " occurrences of the term " + word + " between " + beginDate + " and " + endDate);
  }
  catch (JSONException e) {
    println ("There was an error parsing the JSONObject.");
  };

  return(total);
};

And there we have it – a very simple graph visualization working from NYTimes data. Of course, this particular visualization is about as simple as it gets. Still, it gives us an idea of how easy it is to dig through the information available from this API and make some visual sense out of it.
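One small refinement: the sketch divides every count by a fixed 5 to get a bar width, which only fits the window if the counts happen to be small enough. A minimal alternative, sketched here in plain Java with made-up names, is to scale all of the counts so that the largest one fills the available width. Apart from 2218 (the O.J. total shown in the JSON earlier), the counts in the example are invented:

```java
// Hypothetical helper for illustration.
class BarScaler {

  // Scales raw article counts so that the largest count maps to maxWidth pixels.
  // If every count is zero, all widths are zero.
  static int[] scaleToWidth(int[] counts, int maxWidth) {
    int largest = 0;
    for (int count : counts) {
      largest = Math.max(largest, count);
    }
    int[] widths = new int[counts.length];
    if (largest == 0) {
      return widths; // nothing to draw
    }
    for (int i = 0; i < counts.length; i++) {
      widths[i] = (int) Math.round(counts[i] * (double) maxWidth / largest);
    }
    return widths;
  }

  public static void main(String[] args) {
    int[] counts = { 2218, 1109, 0 };            // 2218 from the example above; the rest invented
    int[] widths = scaleToWidth(counts, 500);    // 500 matches the sketch's window width
    for (int i = 0; i < widths.length; i++) {
      System.out.println(counts[i] + " articles -> " + widths[i] + " px");
    }
  }
}
```

The catch is that all of the counts have to be fetched before any bar is drawn, so the loop in setup() would need to be split into a fetching pass and a drawing pass.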

What now?

Download the more complex example of NYT visualization to get a look at how to take things a bit further. One of the most interesting aspects of the NYT APIs is that they allow for faceted searching – we can get many different types of data from the same search query. For example, my ‘Sex & Scandal’ visualization not only graphs the frequencies of the words ‘sex’ and ‘scandal’ over time; it also displays the organizations that were associated with each article (not surprisingly, the most popular organization associated with both of these keywords was the Catholic Church). We can also ask the API to tell us what page each article appeared on – giving us a very useful way to weight our search results beyond the normal ‘how many times the word appeared’ method. For more information on faceted searching, visit the NYTimes Developer’s page.

My hope is that we will see a lot more people digging into this data and using it in interesting ways. It seems to me that this information could be a boon to sociologists, artists, and data junkies everywhere. If you end up using the APIs yourself, I would certainly love to see what you create.