
The Truth is In There: Research & Discovery with The Guardian Content API

Mulder & Scully

An article I wrote for The Guardian’s Open Platform Blog was published earlier this week. It looks at some simple ways to use Processing to access information from the Guardian’s Content API. You can read the whole article and follow along with a short tutorial here.

Processing, JSON & The New York Times

NYTimes: Regulation and Innovation since 1981 (Radial)

Processing is my tool of choice for building visualizations for a number of reasons. First, it’s easy to start projects, quick to piece together code and simple to share the results – it’s an ideal rapid-prototyping tool. Second, the wide variety of available libraries means that it’s easy to plug into data in various formats and process it into an easily usable form. Finally, output from Processing sketches can be exported as .TIFFs, .JPGs, .PNGs, .PDFs, .MOVs and more, making the end project easy to deploy.

I was reminded of all of these advantages last week when I sat down to build some simple visualizations from the newly-opened New York Times APIs. Within a few hours I was up and running – and within a few more hours I had some results that I was quite happy with. Since then, a lot of people have contacted me asking for the Processing code. In response, I’ve decided to share the code and to write this brief tutorial on connecting to the NYTimes APIs with Processing.

Here is the link to the source for my New York Times visualization tool: NYTimes.zip (168k)

For those of you who are interested in learning a bit more, here is a step-by-step guide to getting a very simple graph visualization working with Processing and the NYT Article Search API. If you don’t already have Processing, you’ll need to download it. It’s free, and works on all platforms – download it here.

Step One: Set-up

I mentioned earlier that Processing has a bunch of libraries available to extend the basic codebase to handle all kinds of tasks. There are libraries for video, OpenGL, network communication, sound, hardware interfaces, and even linguistics. Strangely, though, there is no library for handling JSON, the lightweight data format that the NYTimes API uses to send back the information we request. Luckily, there is a JSON library for Java, which can easily be compiled into a .jar file and used in any sketch. For convenience’s sake, I’ve packaged it into a library that can be dropped into your Processing libraries folder: JSON Processing Library.

(I am assuming here that you are using Processing 1.0 or higher. In these releases, libraries live in a directory called ‘libraries’ inside your Processing sketchbook folder (~/Documents/Processing on a Mac). If that directory doesn’t already exist, create it, and drop the unzipped ‘json’ folder inside.) Once you’ve done that, open Processing and create a new sketch.

So, let’s review:

1. Download and install Processing.

2. Download the json library and drop it into your ‘libraries’ folder.

Now we’re ready to get started.

3. Open Processing and create a new Sketch.

4. From the ‘Sketch’ menu, select ‘Import Library…’. This will give you a drop-down list of the libraries available. For this example, we only want to import the json library.

Your sketch should now contain a single line that looks like this:

import org.json.*;

Step Two: Plugging In

The New York Times API uses a pretty simple system. We send their server a request for some information, and they send that information back in JSON format. To control access to the information and to prevent server overload, though, the Times asks everyone who will be using the system to sign up for an API key: a unique value that lets them track who is using the system and for what. The first thing we’re going to do in this section is get an API key and store it for use a bit later on. The NYT offers a set of different APIs for different kinds of information, and each of them requires a separate key.

1. Visit The New York Times Developer site and get an API key for the Article Search API.

2. Store this API key as a String (a string is simply a sequence of characters). We’re going to give this string the identifier ‘apiKey’. Your real API key won’t contain any # characters; they just stand in for the rest of my key here:

String apiKey = "1af81d#######################:##:########";

3. While we’re at it, we’ll store the URL that is used to access the article search API. This way, if the NYT happens to change this URL sometime down the road, we won’t have to hunt through our code to replace it:

String baseURL = "http://api.nytimes.com/svc/search/v1/article";

4. Processing offers a set of built-in methods that are called automatically when our program is executed. We’re going to use the setup method in our example, and I’ll build the draw method as well just to get in the habit of doing this. Code inside the setup method will run once, when the program starts. Code inside the ‘draw’ method will execute once per frame for as long as our program is running:

void setup() {
};

void draw() {
};

Your code should now look something like this:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";

String apiKey = "1af81d#######################:##:########";

void setup() {

};

void draw() {

};

5. Now we’re going to write our own method to access the Article Search API and find out how many articles containing a certain keyword were published during a specific time period. For the sake of this example, let’s find out how many articles contained the phrase ‘O.J. Simpson’ in 1994 & 1995. To do this, we make a request that looks like this:

http://api.nytimes.com/svc/search/v1/article?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=1af81d#######################:##:########

Remember that we’ve stored the base URL and our API key at the beginning of our code, so we can replace the literal values with those variables:

baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;
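Typing the + by hand works for these examples, but it gets fiddly once keywords contain spaces or punctuation. As a minimal sketch, here is the same request built in plain Java (the language Processing sketches are compiled to), assuming you want keywords encoded automatically with the standard java.net.URLEncoder class. RequestBuilder and buildRequest are hypothetical names chosen for this example, and the placeholder key needs replacing with your own:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class RequestBuilder {
    // Hypothetical helper: builds an Article Search request in the same shape as above.
    static String buildRequest(String baseURL, String query, String beginDate,
                               String endDate, String apiKey) {
        try {
            // URLEncoder turns spaces into '+' and leaves '.', '-', '*' and '_' alone,
            // so "O.J. Simpson" becomes "O.J.+Simpson".
            String q = URLEncoder.encode(query, "UTF-8");
            return baseURL + "?query=" + q
                + "&begin_date=" + beginDate
                + "&end_date=" + endDate
                + "&api-key=" + apiKey;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = buildRequest("http://api.nytimes.com/svc/search/v1/article",
                                  "O.J. Simpson", "19940101", "19960101", "YOUR-API-KEY");
        System.out.println(url);
    }
}
```

URLEncoder also turns an ampersand into %26, so a keyword like ‘Sex & Scandal’ can’t accidentally end the query parameter early. The same URLEncoder.encode call should also work inside a Processing sketch, since sketches can use the standard Java classes.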

A method that sends this request for articles mentioning O.J. in 1994 & 1995 and prints the raw response might look like this:

void getOJArticles() {

String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;

String result = join( loadStrings( request ), "");

println( result );

};

Go ahead and type or copy/paste that method into the code window. We can call this new method from setup() to see the result when we run our program:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";
String apiKey = "1af81d#######################:##:########";

void setup() {
getOJArticles(); //THE METHOD GETS CALLED HERE
};

void draw() {
};

void getOJArticles() {
String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;
String result = join( loadStrings( request ), "");
println( result );
};

If you run this code, you should see some text, wrapped up in a peculiar structure, appear in the output panel at the bottom of your Processing window. This is the data that is being returned from the Article Search API. Now we have to figure out how to get what we want out of the data.
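Under the hood, join(loadStrings(request), "") just reads the response line by line and glues the lines back together with nothing in between. A rough plain-Java equivalent, assuming only the standard library, looks like this; LoadAndJoin, readAll and fetch are hypothetical names, and the example feeds readAll an in-memory string rather than a live request:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class LoadAndJoin {
    // Reads every line from the source and concatenates them with no separator,
    // roughly what join(loadStrings(...), "") does in a Processing sketch.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line); // readLine() has already stripped the line terminator
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Hypothetical wrapper for a live request; needs network access and a valid URL.
    static String fetch(String url) {
        try {
            return readAll(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An in-memory stand-in for a response that arrives split over several lines.
        String sample = "{\n\"offset\" : \"0\",\n\"total\": 2218\n}";
        System.out.println(readAll(new StringReader(sample)));
    }
}
```

Gluing the lines together with an empty separator is safe for JSON: line breaks between JSON tokens are only whitespace, and a JSON string value can’t contain a raw line break.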

Step Three: Digging through the JSON Data

The structure for the JSON data that is returned from this call to the NYTimes Article Search API looks something like this:


{
"offset" : "0",
"results" : [
{
"body": "Article Body",
"date" : "Article Date",
"title" : "Article Title",
"url" :"Article URL"
},
{ article 2 },
{article 3},
...
],
"tokens" : ["O", "J", "Simpson"],
"total": 2218
}

Whenever we see a set of curly braces, the data inside those braces is going to be parsed into a JSONObject. Whenever we see square brackets, the data inside those brackets is going to be parsed into a JSONArray. So we’re going to end up with one big JSONObject that contains a string (offset), two arrays (results and tokens), and an integer (total). Let’s get that information out.

Any code block that creates a JSONObject has to be ready to catch an exception if something goes wrong, so we wrap it in a try/catch statement:

void getOJArticles() {

String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;
String result = join( loadStrings( request ), ""); // fetch the response once and reuse it below

try {
JSONObject nytData = new JSONObject(result); // parse the text we already downloaded
JSONArray results = nytData.getJSONArray("results");
int total = nytData.getInt("total");
println ("There were " + total + " articles mentioning O.J. Simpson in 1994 & 1995");
}
catch (JSONException e) {
println ("There was an error parsing the JSONObject.");
};

};

We now have a function that tells us how many New York Times articles mentioned ‘O.J. Simpson’ in 1994 & 1995. Which is good, if you are making the world’s most limited, O.J.-based visualization. In a real project, of course, we’d want to find out the occurrence of any phrase, in any time period. With that in mind, let’s rewrite our function to take some arguments and to return an integer (the ‘void’ at the beginning of the previous version indicated that the method returned nothing):

int getArticleKeywordCount(String word, String beginDate, String endDate) {
String request = baseURL + "?query=" + word + "&begin_date=" + beginDate + "&end_date=" + endDate + "&api-key=" + apiKey;
String result = join( loadStrings( request ), ""); // fetch the response once and reuse it below

int total = 0;

try {
JSONObject nytData = new JSONObject(result); // parse the text we already downloaded
JSONArray results = nytData.getJSONArray("results");
total = nytData.getInt("total");
println ("There were " + total + " articles mentioning " + word + " between " + beginDate + " and " + endDate);
}
catch (JSONException e) {
println ("There was an error parsing the JSONObject.");
};

return(total);
};

Now we can send a whole pile of requests with different keywords and different time periods. This is really useful when we want to compare the occurrence of different keywords over the same time period, or one keyword over many time periods. For instance, we might want to see how often other newsworthy topics came up during the O.J. period (note that spaces in multi-word keywords are written as + signs, because the keywords go straight into the request URL):

void setup() {
getArticleKeywordCount("O.J.+Simpson", "19940101", "19960101" );
getArticleKeywordCount("Olympics", "19940101", "19960101" );
getArticleKeywordCount("Rwanda", "19940101", "19960101" );
};
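Comparing one keyword over many time periods mostly comes down to generating the right begin_date/end_date pairs. Here is a minimal sketch in plain Java, assuming Java 8 or later for the java.time classes, that produces one YYYYMMDD pair per calendar year; YearRanges and yearlyRanges are hypothetical names:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class YearRanges {
    // Returns one {beginDate, endDate} pair per year, formatted as YYYYMMDD
    // like the begin_date/end_date values in the requests above.
    static List<String[]> yearlyRanges(int firstYear, int lastYear) {
        DateTimeFormatter fmt = DateTimeFormatter.BASIC_ISO_DATE; // e.g. 19940101
        List<String[]> ranges = new ArrayList<>();
        for (int year = firstYear; year <= lastYear; year++) {
            LocalDate begin = LocalDate.of(year, 1, 1);
            LocalDate end = LocalDate.of(year, 12, 31);
            ranges.add(new String[] { begin.format(fmt), end.format(fmt) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (String[] r : yearlyRanges(1994, 1996)) {
            // In a sketch, each pair could be passed on, e.g.
            // getArticleKeywordCount("Rwanda", r[0], r[1]).
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

Each range ends on December 31 rather than on January 1 of the following year (as the 19940101 to 19960101 range above does), so that consecutive years don’t share a day; that assumes the API treats end_date as inclusive, which is worth checking against the API documentation.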

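Fishing out the other data from the JSONObject is reasonably easy and follows the same general approach. For example, inside the same try block, results.getJSONObject(0).getString("title") would return the title of the first article in the results array.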

Step 4: Visualizing

The tricky business in this process is already over. Now all we have to do is use the numbers we’ve retrieved to draw something to the screen. For the purposes of this tutorial, I’m just going to draw a set of coloured bars to indicate how often each keyword was used during the time period. To avoid repetition, I first build an array of words and an array of colours, then run through a for loop to draw the bars:

void setup() {

size(500,300);

String[] words = {"O.J.+Simpson", "Olympics", "South+Africa", "Super+Bowl", "Rwanda"};
color[] colors = {#FF0000, #00FF00, #0000FF, #FF3300, #FF9900};

int barSize = 25;
int startY = 80;

String start = "19940101";
String end = "19960101";

for (int i = 0; i < words.length; i++) {
    int freq = getArticleKeywordCount( words[i], start, end);
    fill(colors[i]);
    rect(0, startY + (barSize * i), freq/5, barSize);
};
};

You should get an image that looks like this:

Here is the full code, one more time:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";
String apiKey = "1af81d#######################:##:########";

void setup() {

size(500,300);

String[] words = {"O.J.+Simpson", "Olympics", "South+Africa", "Super+Bowl", "Rwanda"};
color[] colors = {#FF0000, #00FF00, #0000FF, #FF3300, #FF9900};

int barSize = 25;
int startY = 80;

String start = "19940101";
String end = "19960101";

for (int i = 0; i < words.length; i++) {
   int freq = getArticleKeywordCount( words[i], start, end);
   fill(colors[i]);
   rect(0, startY + (barSize * i), freq/5, barSize);
};
};

void draw() {

};

int getArticleKeywordCount(String word, String beginDate, String endDate) {
String request = baseURL + "?query=" + word + "&begin_date=" + beginDate + "&end_date=" + endDate + "&api-key=" + apiKey;
String result = join( loadStrings( request ), ""); // fetch the response once and reuse it below

int total = 0;

try {
JSONObject nytData = new JSONObject(result); // parse the text we already downloaded
JSONArray results = nytData.getJSONArray("results");
total = nytData.getInt("total");
println ("There were " + total + " articles mentioning " + word + " between " + beginDate + " and " + endDate);
}
catch (JSONException e) {
println ("There was an error parsing the JSONObject.");
};

return(total);
};

And there we have it – a very simple graph visualization working from NYTimes data. Of course, this particular visualization is about as simple as it gets. Still, it gives us an idea of how easy it is to dig through the information available from this API and make some visual sense out of it.
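One easy improvement is the bar length. The sketch divides each count by a fixed 5, which only keeps the bars on screen while the largest count stays under 2,500 (500 pixels times 5). As a minimal sketch of an alternative, written in plain Java with a hypothetical barWidths helper, every bar can be scaled against the largest count so the longest one exactly fills the available width:

```java
import java.util.Arrays;

public class BarScale {
    // Maps each count to a bar width so the largest count spans maxWidth pixels,
    // instead of the fixed freq/5 divisor used in the sketch above.
    static int[] barWidths(int[] counts, int maxWidth) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        int[] widths = new int[counts.length];
        if (max == 0) {
            return widths; // nothing to draw: every bar stays at width 0
        }
        for (int i = 0; i < counts.length; i++) {
            // Multiply before dividing (in long arithmetic) so integer division
            // doesn't throw away precision or overflow for large counts.
            widths[i] = (int) ((long) counts[i] * maxWidth / max);
        }
        return widths;
    }

    public static void main(String[] args) {
        int[] widths = barWidths(new int[] {400, 100, 0}, 500);
        System.out.println(Arrays.toString(widths));
    }
}
```

Inside a Processing sketch you could get a similar effect with the built-in max() and map() functions, e.g. map(freq, 0, max(counts), 0, width), after first collecting all the frequencies into an int array (counts is a hypothetical name here).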

What now?

Download the more complex NYT visualization example to see how to take things a bit further. One of the most interesting aspects of the NYT APIs is that they allow for faceted searching – we can get many different types of data from the same search query. For example, my ‘Sex & Scandal’ visualization not only graphs the frequencies of the words ‘sex’ and ‘scandal’ over time; it also displays the organizations that were associated with each article (not surprisingly, the most popular organization associated with both of these keywords was the Catholic Church). We can also ask the API to tell us what page each article appeared on, giving us a very useful way to weight our search results beyond the usual ‘how many times the word appeared’ method. For more information on faceted searching, visit the NYTimes Developer’s page.

My hope is that we will see a lot more people digging into this data and using it in interesting ways. It seems to me that this information could be a boon to sociologists, artists, and data junkies everywhere. If you end up using the APIs yourself, I would certainly love to see what you create.