Multi-faceted Searching with the NYTimes APIs

In my last post, I walked through the process of making simple requests to the New York Times Article Search API. Near the end of the post, I mentioned that the API allows for something called ‘faceted searching’ which can make the whole process a bit easier, and which can also allow users to uncover more interesting data and data relationships. In this post, I’ll show how to use facets in searches with the Article Search API and Processing – I’ll also share some code that I’ve written which makes the whole process a lot easier.

NYTimes: Going Organic 1981-2009

The image above makes use of the API to find out how often the word ‘organic’ appears in NYT articles between the years 1981 and 2009. I’ve used a more abstract visualization style here, but essentially it’s a bar graph, showing the increasing popularity of the term in the media, particularly in the last decade. Each blade of grass represents a month – 336 for the time period that is shown. We know from the last tutorial that we can find out the total number of articles in a certain month (here, January of 2004) using a call to the API that looks something like this:

http://api.nytimes.com/svc/search/v1/article?query=organic&begin_date=20040101&end_date=20040201&api-key=1af81d#######################:##:########

This call can be made easier if we use a search for a publication_month and a publication_year facet, instead of the begin_date/end_date:

http://api.nytimes.com/svc/search/v1/article?query=organic publication_year:[2004] publication_month:[01]&api-key=1af81d#######################:##:########

This search works because every article in the NYT database is associated with facets – pieces of information that are related to the article. So, every article published in the year 2004 has a publication_year facet with the value 2004. Similarly, every article published in the month of January (of any year) has a publication_month facet with a value of 01. There are a bunch of different facets available to search with – facets for people and organizations associated with the article (per_facet, org_facet), facets for page number in the paper (page_facet), and many more.
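If you are building these URLs by hand, note that a raw space is not valid inside a URL, so the spaces between the keyword and the facet terms need to be escaped before the request is sent. Here is a rough sketch of how that might look in plain Java. The FacetQueryUrl class and its method names are my own inventions for illustration and are not part of the downloadable classes; the endpoint and the facet:[value] syntax are copied from the example URLs above, and YOUR_API_KEY is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of NYT_API_Classes.zip.
class FacetQueryUrl {
    // Endpoint copied from the example URLs in this post.
    static final String BASE = "http://api.nytimes.com/svc/search/v1/article";

    // Formats one facet restriction, e.g. publication_year:[2004].
    static String facetTerm(String facet, String value) {
        return facet + ":[" + value + "]";
    }

    // Joins the keyword and the facet terms with spaces, then escapes the spaces.
    static String buildUrl(String keyword, List<String> facetTerms, String apiKey) {
        StringBuilder query = new StringBuilder(keyword);
        for (String term : facetTerms) {
            query.append(' ').append(term);
        }
        // Only whitespace is escaped here. Treating that as sufficient is an
        // assumption; other characters may need escaping in other setups.
        String escaped = query.toString().replaceAll("\\s", "%20");
        return BASE + "?query=" + escaped + "&api-key=" + apiKey;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<String>();
        terms.add(facetTerm("publication_year", "2004"));
        terms.add(facetTerm("publication_month", "01"));
        // YOUR_API_KEY is a placeholder for a real key.
        System.out.println(buildUrl("organic", terms, "YOUR_API_KEY"));
    }
}
```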

In Processing, I have built a set of simple classes to make using the Article Search API a bit easier. You can download them and see how they work here: NYT_API_Classes.zip. For example, to work with the above search, we could do this:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("organic");
s.addFacetQuery("publication_month", "01");
s.addFacetQuery("publication_year", "2004");
TimesArticleSearchResult r = s.doSearch();
println ("There were " + r.total + " articles with the word organic in January 2004.");

This search will return the first 10 articles that match our search – we can access their body text, titles, author names, etc very easily:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("organic");
s.addFacetQuery("publication_month", "01");
s.addFacetQuery("publication_year", "2004");
TimesArticleSearchResult r = s.doSearch();
println ("There were " + r.total + " articles with the word organic in January 2004.");
println ("The first article was titled: " + r.results[0].title);
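A month-by-month graph like the grass image needs one search per month, so the facet values have to be generated in a loop. Here is a minimal sketch of that step in plain Java. MonthlyFacets and monthsBetween are hypothetical names of my own, not part of the downloadable classes; the only thing taken from the post is that publication_month values are zero-padded ("01" rather than "1").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of NYT_API_Classes.zip.
class MonthlyFacets {
    // Returns {year, month} string pairs for every month from startYear
    // through endYear, inclusive, with the month zero-padded to two digits.
    static List<String[]> monthsBetween(int startYear, int endYear) {
        List<String[]> months = new ArrayList<String[]>();
        for (int year = startYear; year <= endYear; year++) {
            for (int month = 1; month <= 12; month++) {
                String padded = (month < 10 ? "0" : "") + month;
                months.add(new String[] { String.valueOf(year), padded });
            }
        }
        return months;
    }

    public static void main(String[] args) {
        for (String[] ym : monthsBetween(2004, 2004)) {
            // With the classes from this post, each pair would presumably go
            // into a fresh search via addFacetQuery("publication_year", ym[0])
            // and addFacetQuery("publication_month", ym[1]).
            System.out.println(ym[0] + "-" + ym[1]);
        }
    }
}
```

Each pair can then drive one search, with r.total from each result giving the height of one bar.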

By using facets in our queries, we can conduct all sorts of interesting searches. For instance, if I wanted to find articles that mentioned profit and talked about Apple, I could run this search:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("profit");
s.addFacetQuery("org_facet", "APPLE COMPUTER INC");
TimesArticleSearchResult r = s.doSearch();

NYTimes: Switching Enemies - Communism & Terrorism 1981-2009

The image above shows the frequency of the words ‘communism’ (bottom) and ‘terrorism’ (top) in New York Times articles since 1981. The yellow bars indicate the occurrences that were on the front page – dimmer yellow bars give a picture of where in the paper the word was found. We get this information by requesting that facets be included in our search results. An example query for a single month would look like this:

http://api.nytimes.com/svc/search/v1/article?query=communism,terrorism publication_year:[2004] publication_month:[01]&facets=page_facet&api-key=1af81d#######################:##:########

With this query, we’ve asked the API to return the page facet associated with each article. Again, I’ve tried to make it a bit easier to access this information. Here, we ask how many articles with the word ‘terrorism’ were on the front page in 2001:

TimesArticleSearch s = new TimesArticleSearch();
s.addQueries("terrorism");
s.addFacets("page_facet");
s.addFacetQuery("publication_year", "2001");
TimesArticleSearchResult r = s.doSearch();
TimesFacetObject[] pageFacets = r.getFacetList("page_facet");
println ("There were " + r.total + " articles.");
println ("There were " + pageFacets[0].count + " articles on the front page");
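Reading pageFacets[0] relies on the first entry in the list being the front page. If you would rather not depend on the ordering, one option is to look the page up by its label. The sketch below shows the idea in plain Java with a stand-in class: PageFacet and its term field are assumptions of mine (the only field of the real TimesFacetObject used in this post is count), the label "1" for the front page is also an assumption, and the counts in main are made up purely for illustration.

```java
// Hypothetical sketch for illustration only; not part of NYT_API_Classes.zip.
class PageFacetLookup {
    // Stand-in for one facet entry. The "term" field name is an assumption.
    static class PageFacet {
        final String term;  // page label, assumed to be "1" for the front page
        final int count;    // number of matching articles on that page

        PageFacet(String term, int count) {
            this.term = term;
            this.count = count;
        }
    }

    // Returns the count for the given page label, or 0 if it is not listed.
    static int countForPage(PageFacet[] facets, String page) {
        for (PageFacet facet : facets) {
            if (facet.term.equals(page)) {
                return facet.count;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Made-up counts, not real NYT data.
        PageFacet[] sample = {
            new PageFacet("12", 7),
            new PageFacet("1", 3),
            new PageFacet("4", 5)
        };
        System.out.println("Front page: " + countForPage(sample, "1"));
    }
}
```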

Hopefully you can see that facets can be very helpful indeed when we are digging through the NYT data. I am still finding new ways to search myself, and rounding up data that can be useful for visualizations and other projects.

As always, I’d love to hear from you if you are using this code, or if you can suggest ways to make it better. I am planning on completing the code base and releasing it as a real Processing library, but that may take a while (given that I’ve never made a library before!)

15 thoughts on “Multi-faceted Searching with the NYTimes APIs”

  1. this is incredible. hopefully i’ll have some time this week to try your code and get familiar, with the processing code. do you know if there are any libraries/bindings for ruby on rails? i would love to see this implemented in websites, that anyone can search the new york times and get some beautiful graphics. anyway i guess this api gives tremendous opportunities.

    1. Hi Manuel,

      I’m not sure about ruby – I’d check over at the NYTimes developer site.

      The code is pretty rudimentary right now. I’m hoping I’ll have some time over the next little while to put it together into something a bit more cohesive.

  2. Hey Jer,
    i got a basic ruby implementation done. just ported your classes and used a ruby gem (ruby-processing) to do the processing stuff. now i have to dig deeper into processing and get this ruby-processing gem to work with rails.

    i followed your hint and found a ruby gem. check out this full api implementation on github: http://github.com/harrisj/nytimes-articles/tree/master

    i guess it’s pretty easy to port it to java…

    will get back to you as soon as i have a working app. thx for sharing your code and inspiring me.

  3. Yeah, I'm making loads of progress on the nytimes-articles gem for Ruby, so if you're doing some Ruby hacking, please use it and give me feedback. Thanks! And excellent visualizations!

  4. I was getting HTTP response code 400 for all searches where the url contains spaces. After encoding all spaces to %20 all searches worked for me in Java. Thanks!

    String urlEncodedSpaces = url.replaceAll("\\s", "%20");

  5. I may be missing it, but I don’t see any support for the begin_date and end_date parameters. I tried using addFacetQuery("begin_date", "20080101") and addFacetQuery("end_date", "20081230") which didn’t work exactly right (every query returned 0 articles). Is there support for this?

  6. Hi,

    end_date and begin_date are currently implemented like this:

    TimesArticleSearch s = new TimesArticleSearch();
    s.addExtra("begin_date", "20080101");
    s.addExtra("end_date", "20090101");

    -Jer

  7. thanks for the great library. I made a small change that others might find useful. Making addFacet, addFields, addFacetQuery, etc return this, you can chain method calls together. not very java-like, but some may find it easy to read:
    s.addQueries("terrorism").addFacets("page_facet").addFacetQuery("publication_year", "2001");

  8. hi. COULD YOU HELP ON SOMETHING VERY SIMPLE?
    can you tell us something super simple for non-power users;
    in this great example — how do you get it to print out a field such as a byline

    here is your code from classes_export — where I'm trying to get the byline.
    I can see the bylines with a println in process results, but cannot figure out how
    to print it in the main API_Classes_export.

    THANKS AHEAD OF TIME!

    // added
    TimesArticleSearch s = new TimesArticleSearch();
    s.addQueries("margiela");
    s.addFacets("page_facet");
    s.addFacetQuery("publication_year", "2001");
    // added
    s.addFields("byline");
    TimesArticleSearchResult r = s.doSearch();
    TimesFacetObject[] pageFacets = r.getFacetList("page_facet");
    //added
    //HOW TO GET THE BYLINES???? should be something with TimesResultObject
    println ("There were " + r.total + " articles.");
    println ("There were " + pageFacets[0].count + " articles on the front page");

    1. Hi,

      The TimesArticleSearchResult that you get back from any search has an array of TimesResultObject instances representing each story returned. TimesResultObject instances have a field for byline (as well as author, body, date, lead_paragraph, title, etc.)

      So, to get the byline of the first article result in the search above, you could do this:

      String theByline = r.results[0].byline;

      Hope that helps!

      -Jer

  9. hi
    long back i had tried something close to this without the use of technology, since it was rare in india to use tech for social sciences. this was more about locational choices for commercial operations. in fact in 1996 i wrote a project and got 2 lakhs sanctioned for that. i had to let it go to my colleagues as i was posted as the Officer on Special Duty in the University – though foolish, i gave up this project and asked them to do it – they didn't, and it was killed. But now i realise how important this is. my imagination has paid off. I need to learn the techniques of all this sooner rather than later, however.

  10. You sir, are a rockstar. I'm an ecologist/applied mathematician and have been looking into Processing for nice visualization – and I came across your blog. Your work is totally appreciated. Now I just have to get all of my R code to play nicely with Processing…

  11. When using these classes, I am just getting a runtime error: " — unexpected character '' –" and I can't find where that comes from. What am I doing wrong?
