Processing, JSON & The New York Times

NYTimes: Regulation and Innovation since 1981 (Radial)

Processing is my tool of choice for building visualizations for a number of reasons. First, it’s easy to start projects, quick to piece together code and simple to share the results – it’s an ideal rapid-prototyping tool. Second, the wide variety of libraries available means that it’s easy to plug into data in various formats and process the data into an easily-usable form. Finally, output from Processing sketches can be exported as .TIFFs, .JPGs, .PNGs, .PDFs, .MOVs and more, making the end project easy to deploy.

I was reminded of all of these advantages last week when I sat down to build some simple visualizations from the newly-opened New York Times APIs. Within a few hours I was up and running – and within a few more hours I had some results that I was quite happy with. Since then, a lot of people have contacted me asking for the Processing code. In response, I’ve decided to share the code and to write this brief tutorial on connecting to the NYTimes APIs with Processing.

Here is the link to the source for my New York Times visualization tool: NYTimes.zip (168k)

For those of you who are interested in learning a bit more, here is a step-by-step guide to getting a very simple graph visualization working with Processing and the NYT Article Search API. If you don’t already have it, you’ll need to download Processing. It’s free, and works on all platforms – download it here.

Step One: Set-up

I mentioned earlier that Processing has a bunch of libraries available to extend the basic codebase to handle all kinds of tasks. There are libraries for video, OpenGL, network communication, sound, hardware interface, and even linguistics. Strangely, though, there is no library for handling JSON – the lightweight data format that the NYTimes API uses to send the information that we request. Luckily, there is a JSON Java library, which can be very easily compiled into a .jar file and used in any sketch. For convenience’s sake, I’ve packaged this into a library that can be dropped into your Processing libraries folder: JSON Processing Library. (I am assuming here that you are using the Processing 1.0 release or higher. In these new releases, libraries are added to your Processing sketchbook folder (~user/Documents/Processing on a Mac) in a directory called ‘libraries’. If it doesn’t already exist, create it, and drop the unzipped ‘json’ folder inside.) Once you’ve done that, open Processing and create a new sketch.

So, let’s review:

1. Download and install Processing.

2. Download the json library and drop it into your ‘libraries’ folder.

Now we’re ready to get started.

3. Open Processing and create a new Sketch.

4. From the ‘Sketch’ menu, select ‘Import Library…’. This will give you a drop-down list of the libraries available. For this example, we only want to import the json library.

Your sketch should just have one line, which should look like this:

import org.json.*;

Step Two: Plugging In

The New York Times API uses a pretty simple system. We send their server a request for some information, and they send us that information back, in JSON format. But, to control access to the information and to prevent server over-load, the Times asks everyone who will be using the system to sign up for an API Key – a unique value that lets them track who is using the system and for what. The first thing we’re going to do in this section is to get an API Key and store it for use a bit later on. The NYT offers a set of different APIs to access different information, and each of them requires a separate key.

1. Visit The New York Times Developer site and get an API key for the Article Search API.

2. Store this API key as a String (a string is simply a sequence of characters). We’re going to give this string the identifier ‘apiKey’. The value needs to be wrapped in double quotes, and your own key won’t have the # characters:

String apiKey = "1af81d#######################:##:########";

3. While we’re at it, we’ll store the URL that is used to access the article search API. This way, if the NYT happens to change this URL sometime down the road, we won’t have to hunt through our code to replace it:

String baseURL = "http://api.nytimes.com/svc/search/v1/article";

4. Processing offers a set of built-in methods that are called automatically when our program is executed. We’re going to use the setup method in our example, and I’ll build the draw method as well just to get in the habit of doing this. Code inside the setup method will run once, when the program starts. Code inside the ‘draw’ method will execute once per frame for as long as our program is running:

void setup() {
};

void draw() {
};

Your code should now look something like this:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";

String apiKey = "1af81d#######################:##:########";

void setup() {

};

void draw() {

};

5. Now we’re going to write our own method to access the Article Search API and find out how many times a certain keyword was found in articles spanning a specific time period. For the sake of this example, let’s find out how many articles contained the phrase ‘O.J. Simpson’ in 1994 & 1995. To do this, we make a request that looks like this:

http://api.nytimes.com/svc/search/v1/article?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=1af81d#######################:##:########

Remember that we’ve stored the values for the base URL and our API key at the beginning of our code, so we can replace the actual values with the property identifiers:

baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;

A method to find out how many articles that mention O.J. in 1994 & 1995 might then look like this:

void getOJArticles() {

String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;

String result = join( loadStrings( request ), "");

println( result );

};

Go ahead and type or copy/paste that method into the code window. We can call this new method in the setup() wrapper to see the result when we run our program:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";
String apiKey = "1af81d#######################:##:########";

void setup() {
getOJArticles(); //THE METHOD GETS CALLED HERE
};

void draw() {
};

void getOJArticles() {
String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;
String result = join( loadStrings( request ), "");
println( result );
};

If you run this code, you should see some text, wrapped up in a peculiar structure, appear in the output panel at the bottom of your Processing window. This is the data that is being returned from the Article Search API. Now we have to figure out how to get what we want out of the data.
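A side note on the request string: in the example above the space in the search phrase was typed as a + by hand. A request URL can't contain a raw space, so a phrase with spaces in it needs to be encoded first. Here is a minimal sketch in plain Java (which Processing is built on) that does the encoding for us. The class and method names are my own invention for illustration, and YOUR_API_KEY is a placeholder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of the original sketch): builds an
// Article Search request and URL-encodes the search phrase.
class ArticleSearchRequest {

    // Encodes a value for use in a query string (a space becomes "+").
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    // Assembles the same request string used in the tutorial.
    static String build(String baseURL, String query, String beginDate,
                        String endDate, String apiKey) {
        return baseURL + "?query=" + encode(query)
            + "&begin_date=" + beginDate
            + "&end_date=" + endDate
            + "&api-key=" + apiKey;
    }

    public static void main(String[] args) {
        System.out.println(build("http://api.nytimes.com/svc/search/v1/article",
            "O.J. Simpson", "19940101", "19960101", "YOUR_API_KEY"));
    }
}
```

With this in place the phrase can be written naturally ("O.J. Simpson") and the encoded form ("O.J.+Simpson") is produced for us.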

Step Three: Digging through the JSON Data

The structure for the JSON data that is returned from this call to the NYTimes Article Search API looks something like this:


{
"offset" : "0",
"results" : [
{
"body": "Article Body",
"date" : "Article Date",
"title" : "Article Title",
"url" :"Article URL"
},
{ article 2 },
{article 3},
...
],
"tokens" : ["O", "J", "Simpson"],
"total": 2218
}

Whenever we see a set of curly braces, the data inside those braces is going to be parsed into a JSONObject. Whenever we see square brackets, the data inside of those brackets is going to be parsed into a JSONArray. So, we’re going to end up with one big JSONObject that contains a string, two arrays, and an integer. Let’s get that information out.

Any code block that creates a JSONObject has to be ready to catch an exception if something goes wrong, so we wrap it in a try/catch statement:

void getOJArticles() {

String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;

// loadStrings() returns null if the request fails, so check before joining
String[] lines = loadStrings( request );
if (lines == null) {
println ("The request failed.");
return;
}
String result = join( lines, "");

try {
// Parse the text we already downloaded rather than requesting it a second time
JSONObject nytData = new JSONObject(result);
JSONArray results = nytData.getJSONArray("results");
int total = nytData.getInt("total");
println ("There were " + total + " occurrences of the term O.J. Simpson in 1994 & 1995");
}
catch (JSONException e) {
println ("There was an error parsing the JSONObject.");
};

};

We now have a function that will tell us how many times the term ‘O.J. Simpson’ was used by the New York Times in 1994 & 1995. Which is good, if you are making the world’s most limited, O.J.-based visualization. In a real project, of course, we’d want to find out the occurrence of any phrase, in any time segment. With that in mind, let’s re-write our function to include some arguments, and to return an integer back to us (the ‘void’ at the beginning of the previous version of the method indicated that the method returned nothing):

int getArticleKeywordCount(String word, String beginDate, String endDate) {
String request = baseURL + "?query=" + word + "&begin_date=" + beginDate + "&end_date=" + endDate + "&api-key=" + apiKey;

int total = 0;

// loadStrings() returns null if the request fails, so check before joining
String[] lines = loadStrings( request );
if (lines == null) {
println ("The request for " + word + " failed.");
return(total);
}
String result = join( lines, "");

try {
// Parse the text we already downloaded rather than requesting it a second time
JSONObject nytData = new JSONObject(result);
JSONArray results = nytData.getJSONArray("results");
total = nytData.getInt("total");
println ("There were " + total + " occurrences of the term " + word + " between " + beginDate + " and " + endDate);
}
catch (JSONException e) {
println ("There was an error parsing the JSONObject.");
};

return(total);
};

Now we can send a whole pile of requests, with different keywords and different time periods. This is really useful when we want to compare the occurrence of different keywords over the same time period, or one keyword over many time periods. For instance, we might want to see the relative occurrences of other newsworthy items during the O.J. period:

void setup() {
getArticleKeywordCount("O.J.+Simpson", "19940101", "19960101" );
getArticleKeywordCount("Olympics", "19940101", "19960101" );
getArticleKeywordCount("Rwanda", "19940101", "19960101" );
};
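One caution when sending a pile of requests: several readers have reported 403 errors, along with an email from the Times saying their key had exceeded the queries-per-second limit. One way around this is to spread the requests out, sending at most one per frame from draw() instead of firing them all from a loop. Below is a minimal sketch of that idea in plain Java. The PacedQueries class and the one-second gap are assumptions for illustration, not something from the NYT documentation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: holds keywords waiting to be requested and
// hands them out no faster than one per minGapMillis.
class PacedQueries {
    private final Deque<String> pending = new ArrayDeque<String>();
    private final long minGapMillis;
    private long lastSentMillis;
    private boolean sentAny = false;

    PacedQueries(long minGapMillis) {
        this.minGapMillis = minGapMillis;
    }

    void add(String keyword) {
        pending.add(keyword);
    }

    // Returns the next keyword to request, or null if the queue is
    // empty or the previous request was sent too recently.
    String next(long nowMillis) {
        if (pending.isEmpty()) return null;
        if (sentAny && nowMillis - lastSentMillis < minGapMillis) return null;
        sentAny = true;
        lastSentMillis = nowMillis;
        return pending.poll();
    }

    public static void main(String[] args) {
        PacedQueries queue = new PacedQueries(1000); // assumed gap of one second
        queue.add("Olympics");
        queue.add("Rwanda");
        System.out.println(queue.next(0));    // first request goes out right away
        System.out.println(queue.next(500));  // too soon, so nothing is returned
        System.out.println(queue.next(1000)); // gap has passed, next keyword
    }
}
```

In a sketch, draw() would call next(millis()) once per frame and only call getArticleKeywordCount() when the result isn't null.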

Fishing out the other data from the JSONObject is reasonably easy, and follows the same general approach.

Step Four: Visualizing

The tricky business in this process is already over. Now all we have to do is use the numbers that we’ve retrieved to draw something to the screen. For the purposes of this tutorial, I’m just going to draw a set of coloured bars to indicate the relative frequency of use of each keyword during the time period. To avoid repetition, I am first building an array of words, and colours, then running through a for loop to draw the bars:

void setup() {

size(500,300);

String[] words = {"O.J.+Simpson", "Olympics", "South+Africa", "Super+Bowl", "Rwanda"};
color[] colors = {#FF0000, #00FF00, #0000FF, #FF3300, #FF9900};

int barSize = 25;
int startY = 80;

String start = "19940101";
String end = "19960101";

for (int i = 0; i < words.length; i++) {
    int freq = getArticleKeywordCount( words[i], start, end);
    fill(colors[i]);
    rect(0, startY + (barSize * i), freq/5, barSize);
};
};

You should get an image that looks like this:

One more review of the full code:

import org.json.*;

String baseURL = "http://api.nytimes.com/svc/search/v1/article";
String apiKey = "1af81d#######################:##:########";

void setup() {

size(500,300);

String[] words = {"O.J.+Simpson", "Olympics", "South+Africa", "Super+Bowl", "Rwanda"};
color[] colors = {#FF0000, #00FF00, #0000FF, #FF3300, #FF9900};

int barSize = 25;
int startY = 80;

String start = "19940101";
String end = "19960101";

for (int i = 0; i < words.length; i++) {
   int freq = getArticleKeywordCount( words[i], start, end);
   fill(colors[i]);
   rect(0, startY + (barSize * i), freq/5, barSize);
};
};

void draw() {

};

int getArticleKeywordCount(String word, String beginDate, String endDate) {
String request = baseURL + "?query=" + word + "&begin_date=" + beginDate + "&end_date=" + endDate + "&api-key=" + apiKey;

int total = 0;

// loadStrings() returns null if the request fails, so check before joining
String[] lines = loadStrings( request );
if (lines == null) {
println ("The request for " + word + " failed.");
return(total);
}
String result = join( lines, "");

try {
// Parse the text we already downloaded rather than requesting it a second time
JSONObject nytData = new JSONObject(result);
JSONArray results = nytData.getJSONArray("results");
total = nytData.getInt("total");
println ("There were " + total + " occurrences of the term " + word + " between " + beginDate + " and " + endDate);
}
catch (JSONException e) {
println ("There was an error parsing the JSONObject.");
};

return(total);
};

And there we have it – a very simple graph visualization working from NYTimes data. Of course, this particular visualization is about as simple as it gets. Still, it gives us an idea of how easy it is to dig through the information available from this API and make some visual sense out of it.
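One small refinement worth considering: the bars above are drawn with a width of freq/5, which happens to fit these particular counts into a 500-pixel window. A more popular keyword would run off the edge. Here is a minimal sketch, in plain Java, of scaling the bars so that the largest count always spans the full width. The BarScale class is hypothetical and not part of the downloadable code.

```java
// Hypothetical helper: converts raw counts into bar widths, scaled so
// that the largest count is exactly maxWidth pixels wide.
class BarScale {

    static int[] widths(int[] counts, int maxWidth) {
        int largest = 0;
        for (int count : counts) {
            largest = Math.max(largest, count);
        }

        int[] result = new int[counts.length];
        if (largest == 0) {
            return result; // nothing to draw; every width stays 0
        }
        for (int i = 0; i < counts.length; i++) {
            // Use floating point so small counts aren't lost to integer division
            result[i] = (int) Math.round((double) counts[i] * maxWidth / largest);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] widths = widths(new int[] {2218, 1109, 0}, 500);
        for (int w : widths) {
            System.out.println(w);
        }
    }
}
```

To use it, collect the counts into an array first, compute the widths, and then pass each width to rect() in place of freq/5.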

What now?

Download the more complex example of NYT visualization to get a look at how to take things a bit further. One of the most interesting aspects of the NYT APIs is that they allow for faceted searching – we can get many different types of data from the same search query. For example, my ‘Sex & Scandal’ visualization not only graphs the frequencies of the words ‘sex’ and ‘scandal’ over time; it also displays the organizations that were associated with each article (not surprisingly, the most popular organization associated with both of these keywords was the Catholic Church). We can also ask the API to tell us what page each article appeared on – giving us a very useful way to weight our search results beyond the normal ‘how many times the word appeared’ method. For more information on faceted searching, visit the NYTimes Developer’s page.
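To give a rough idea of what a faceted request looks like, here is a minimal sketch in plain Java that builds one for a single keyword and month. It is modelled on a request URL quoted in the comments below, so treat the parameter names (publication_year, publication_month, facets) and the month format as assumptions to check against the NYTimes documentation. The FacetRequest class itself is hypothetical.

```java
// Hypothetical helper: builds a request for one keyword in one month,
// asking the API to include a facet (e.g. org_facet) in the response.
class FacetRequest {

    static String build(String baseURL, String keyword, int year, int month,
                        String facet, String apiKey) {
        // Limit the search to a single month using the date fields
        String query = keyword + " publication_year:" + year
            + " publication_month:" + month;

        // A URL can't contain raw spaces; %20 is the encoded form of a space
        return baseURL + "?query=" + query.replace(" ", "%20")
            + "&facets=" + facet
            + "&api-key=" + apiKey;
    }

    public static void main(String[] args) {
        System.out.println(build("http://api.nytimes.com/svc/search/v1/article",
            "japan", 1981, 12, "org_facet", "YOUR_API_KEY"));
    }
}
```

The response to a request like this should carry the facet data alongside the usual results, which can be dug out of the JSONObject in the same way as ‘total’.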

My hope is that we will see a lot more people digging into this data and using it in interesting ways. It seems to me that this information could be a boon to sociologists, artists, and data junkies everywhere. If you end up using the APIs yourself, I would certainly love to see what you create.

44 thoughts on “Processing, JSON & The New York Times”

  1. thank you very much.
    I will try and install it.
    Do you know where you can get more data from to use?
    i.e. not NYtimes.
    If i get it up and running i will use it for my school,
    i study graphic design in the Netherlands.
    I will show you the results.

    thank you cheers

  2. Hi Styn,

    Are you on a Mac or a PC? On a mac, the un-zipped json folder needs to go in the directory ~/Documents/Processing/libraries . To find the Processing sketchbook location on your computer, open the Preferences window from the Processing application and look for the “Sketchbook location” item at the top. Please pay attention to capitalization! If the folder isn’t there, make it yourself.

    Hope that helps.

    -Jer

  3. Great stuff. Thanks for the tutorial and the code. One thing I noticed is that you are not leveraging the publication_year, publication_month facets which could greatly simplify and reduce the # of calls you are making. If you set a date range for a year and ask for the facet publication_month – you will get a facet for every month that has results in it. This will greatly reduce the # of queries required and even possibly allow for some real time stuff.

    Even though publication_year,publication_month and publication_day – don’t have _facet tagged at the end of their field, they are in fact facets. You can also use them to limit your search – so instead of setting a date range you can also add publication_year:[1996] to your query and it will limit results to 1996. Two ways of doing the same thing. thanks again.

  4. Thanks, Derek.

    I figured there was a better way to do this – I will give your suggestions a try when I get a spare minute.

    I am excited to dig a bit further into the facets. From what I’ve read there seems to be a lot of interesting information that can be foraged from the data.

  5. can’t find the sketchbook, i make a new directory in Documents/Processing/libraries
    named JSON…., i drag the downloaded file there, then i open the program Processing. and go to the tab Sketch but the file is not there, only: dxf, javascript, minim, net, opengl, pdf, serial, video.

    ?

  6. Hi Styn,

    The library (the folder called ‘json’) should be put in the libraries directory that you created.

    If this isn’t working (it should), you can also use the Sketch > Add File menu option and select the json.jar file that is in the json/library directory.

    Jer

  7. Derek:

    I went in and re-built taking advantage of the publication_month facet, and you’re right: if the goal is simply to chart the occurrences of a specific keyword over a year, this is a MUCH faster and simpler method. However, with my visualizations I also attach the org_facet results for each individual month (these are the branching text pieces that you see). So, I think I am still stuck running an individual query for each month. Is this right?

    Jer

  8. This is an awesome tutorial! Thank You. I’m pretty new to processing and programming and have a question…

    How would I go about answering the question – what is the name of each author for each article written from 1981-1986 in some particular Times desk.

    Thanks.

  9. This is amazing! Thanks so much for putting it together and sharing it so openly with us. I have an API key, but whenever I run the program, I get a null pointer exception for this line:
    JSONObject nytData = new JSONObject(join(loadStrings(url), ""));

    Any idea what that might be?

  10. SA728 asked:

    How would I go about answering the question – what is the name of each author for each article written from 1981-1986 in some particular Times desk?

    Using my current Classes (see the next post), you’d have to construct your query something like this:

    TimesArticleSearch search = new TimesArticleSearch(); //create a new search object
    search.addFacetQuery("desk_facet", "Financial Desk"); //ask for articles from this desk
    search.addExtra("begin_date", "19810101"); //begin and end dates
    search.addExtra("end_date", "19860101");
    TimesArticleSearchResult r = search.doSearch();

    This would give us a search result for our specific request. You’d have to then go through each content item and pull out the author name:

    ie:

    String firstAuthor = r.results[0].author;
    String secondAuthor = r.results[1].author;

    Hope that helps.

    -Jer

  11. Aaron – could be that your query has spaces in it? The newest version of the classes that I’ve written handles this, but for now you can replace them with the encoded character for a space – %20

  12. I realize you may not be following this thread anymore, but when I run the program I quickly run into an error:

    java.io.IOException: Server returned HTTP response code: 403 for URL: http://api.nytimes.com/svc/search/v1/article?quer
    at sun.net.www.protocol.http.HttpURLConnection.getInputStrea...
    at java.net.URL.openStream(URL.java:1007)
    at processing.core.PApplet.createInputRaw(PApplet.java:3919)
    at processing.core.PApplet.createInput(PApplet.java:3888)
    at processing.core.PApplet.loadStrings(PApplet.java:4119)
    at NYT_Bar2.getChunk(NYT_Bar2.java:214)
    at NYT_Bar2.buildMonthArray(NYT_Bar2.java:193)
    at NYT_Bar2.setup(NYT_Bar2.java:95)
    at processing.core.PApplet.handleDraw(PApplet.java:1400)
    at processing.core.PApplet.run(PApplet.java:1328)
    at java.lang.Thread.run(Thread.java:613)

    The file "http://api.nytimes.com/svc/search/v1/article?query=japan%20publication_year:1981%20publication_month:12&fields=+&facets=org_facet&api-key=##APIKEY##" is missing or inaccessible, make sure the URL is valid or that the file has been added to your sketch and is readable.

    Exception in thread "Animation Thread" java.lang.NullPointerException
    at processing.core.PApplet.join(PApplet.java:5153)
    at NYT_Bar2.getChunk(NYT_Bar2.java:214)
    at NYT_Bar2.buildMonthArray(NYT_Bar2.java:193)
    at NYT_Bar2.setup(NYT_Bar2.java:95)
    at processing.core.PApplet.handleDraw(PApplet.java:1400)
    at processing.core.PApplet.run(PApplet.java:1328)
    at java.lang.Thread.run(Thread.java:613)

    The program will get through 10 or so out of 336 requests before spitting out the error. I also received an email from the NYTimes telling me that I had exceeded their max number of requests per second. I tried adjusting it by inserting a few delay(1000);s in a few of the for loops, but that didn’t solve it. Any ideas?

    Thanks Jer, and thanks for putting up such a great tutorial.

    1. Hi TimesFan,

      I will try to look into this over the next couple of days – I haven't played with the NYT code for a while!

      The error is definitely coming from hitting the API too frequently. I know the delay() calls don't work because I remember trying them when I was having the same problem myself!

      -Jer

  13. Thanks for posting this! I'm getting the same error as Aaron and Ryan (at end of post); my query is:

    String request = baseURL + "?query=Dan+Brown&begin_date=19900101&end_date=20090101&api-key=" + apiKey;
    String result = join(loadStrings(request), "");

    Any help would be great!

    //errors:
    java.io.IOException: Server returned HTTP response code: 403 for URL: http://api.nytimes.com/svc/search/v1/article?quer
    at sun.net.www.protocol.http.HttpURLConnection.getInputStrea...
    at java.net.URL.openStream(URL.java:1007)
    at processing.core.PApplet.createInputRaw(PApplet.java:3901)
    at processing.core.PApplet.createInput(PApplet.java:3870)
    at processing.core.PApplet.loadStrings(PApplet.java:4101)
    at nyTimesServerTest.getDanBrownBestSellers(nyTimesServerTest.java:38)
    at nyTimesServerTest.setup(nyTimesServerTest.java:29)
    at processing.core.PApplet.handleDraw(PApplet.java:1383)
    at processing.core.PApplet.run(PApplet.java:1311)
    at java.lang.Thread.run(Thread.java:613)
    The file "http://api.nytimes.com/svc/search/v1/article?query=Dan+Brown&begin_date=19900101&end_date=20090101&api-key=60117643a309bbb7e80e276511e734af:8:59492943" is missing or inaccessible, make sure the URL is valid or that the file has been added to your sketch and is readable.
    Exception in thread "Animation Thread" java.lang.NullPointerException
    at processing.core.PApplet.join(PApplet.java:5129)
    at nyTimesServerTest.getDanBrownBestSellers(nyTimesServerTest.java:38)
    at nyTimesServerTest.setup(nyTimesServerTest.java:29)
    at processing.core.PApplet.handleDraw(PApplet.java:1383)
    at processing.core.PApplet.run(PApplet.java:1311)
    at java.lang.Thread.run(Thread.java:613)

    1. Hi Emily, I will try to look at this tomorrow – I don't get the error since my API key is whitelisted for tonnes of requests. If some of your queries are working before you get the error, the 403 is almost definitely coming because you are hitting the API too often. It may be that rather than using a loop, it would be better to hit the queries once per frame. I will try to rebuild this and see what I can figure out. Cheers, -Jer

  14. Hi Jer, I've tried all other user groups (NYT, Processing, Google) and to no avail, so I am bothering you with this. I'm on a Mac OSX (10.5.8) and I've followed your tutorial up to Step Two, and this is what happens. I get the red code with the error: " processing.app.debug.RunnerException: unexpected char: '' " I've re-checked the code and my api key a million times. Is it because I'm on a Mac I can't run it?

  15. I am so grateful for your swift reply. Yes this is what I have and it did not run.
    Let me also say that I never get these built-in methods " void setup() { }; " to happen automatically

    At Step Two, this is what I run (I copied exactly what the tutorial has, then add my apiKey):

    import org.json.*;

    String baseURL = “http://api.nytimes.com/svc/search/v1/article&rdqu

    String apiKey = “820151880ac5d2227033d9d1a48451cf:13:34125204”;

    void setup() {

    };

    void draw() {

    };

    };

    As a result I get (I have truncated below the "unexpected" part):
    processing.app.debug.RunnerException: unexpected char: ''
    at processing.app.Sketch.preprocess(Sketch.java:1395)
    at processing.app.Sketch.preprocess(Sketch.java:1194)
    at processing.app.Sketch.build(Sketch.java:1480)
    at processing.app.Sketch.compile(Sketch.java:1174)
    at processing.app.Editor.handleRun(Editor.java:1644) ………

  16. Yes, baseURL seems to have changed once I posted.
    I tried putting in each line. Your baseURL has new characters in it "" I put them in but nothing changed.

    The problem line is:
    String baseURL = "http://api.nytimes.com/svc/search/v1/article&quot;;
    or
    String baseURL = "http://api.nytimes.com/svc/search/v1/article&quot;;

    But WOW – I tried something else which worked.
    I hand typed EVERY CHARACTER back in and it solved the problem!!
    It has something to do with me cutting and pasting from your tutorial!! Must be extra invisible characters.
    I really appreciate your time and attention.
    I am so excited about your work. Thank you for your tutorial. Best, Jilly

  17. Just got this working, had to reduce the date range as I believe I was hitting the request limits set by the API key. Try:
    int s = 2000; // – Start year (this can go as far back as 1981)
    int e = 2009;

    and it should start working if you were getting the java.io.IOException: Server returned HTTP response code: 403 error

  18. Hello —

    Thanks for another great tutorial. I posted this question in the NYT API forum as well.

    Do you know if there is any way to get the API to return a few of the words surrounding the keyword?

  19. I need to parse some json data I got from an API in Processing and got here. Would be really nice to see some sort of documentation for your neat little library ! Can you please direct me to more learning resources for the JSON library ? Thanks

    [Offtopic] Great talk at decoded'11 btw!

  20. JSONObject nytData = new JSONObject(join(loadStrings(url), ""));

    I am getting a null pointer exception at this line.

    Also 403 error.

    1. For anyone getting the Null Pointer Exception at that line, try changing the code in the url:

      It works for me when formatted like this:
      String url = baseURL + "?query=" + word + "&publication_year:[" + yr + "]publication_month:[" + m + "]&fields=+&facets=" + facetString + "&api-key=" + apiKey;

  21. Great outline and detail to get up and running; We tried this recently, the tutorial works fine from ground up; working with the .zip files however it returns 403 error after a few example runs with shorter return dates (2005-20012); then I get an email from NYT – According to our log files, the following API key registered to you has exceeded the queries-per-second limit for the NYT Article Search API:
