Tag Archives: guardian

Hacking the Newsroom @ Flashbelt

Nye's Polonaise room

** note: Photo above is not the official conference venue. It is, however, the spiritual heart.

Flashbelt is a new media conference that happens every year in Minneapolis – this year it runs from June 7th to 10th. I’m going to skip the paragraphs and paragraphs I could write extolling the virtues of this unique and engaging event, and instead, offer an invitation:

Come to Flashbelt. I’ll give you free stuff.

This year I’ll be talking about my work with The New York Times & The Guardian’s newly-released APIs. I’ll talk about my inspiration, walk through my process, and I’ll show everyone how to easily get into experimenting and working with these fascinating repositories of data. I’ll even show some top-secret new work that I’ve been developing just for the conference. I’m really excited about this presentation.

Every year I leave Flashbelt feeling inspired, having met dozens of interesting people doing amazing work. The event is so much more than just a Flash conference – it’s a meeting-of-the-minds for new media creatives of all stripes.

The conference is a great deal, and only 400 tickets will be sold. Airline fares are cheap these days and you have some vacation time booked off, so head over to the website and sign up.

To sweeten the deal, a guarantee: If you buy a ticket to Flashbelt after reading this post, drop a comment below. I’ll give you a limited edition NYTimes 365/360 6″ x 6″ print. Also, I’ll be sure to say hello.

The Truth is In There: Research & Discovery with The Guardian Content API

Mulder & Scully

An article I wrote for The Guardian’s Open Platform Blog was published earlier this week. It looks at some simple ways to use Processing to access information from the Guardian’s Content API. You can read the whole article and follow along with a short tutorial here.

The Guardian Open Platform

This morning, The Guardian announced the launch of The Guardian Open Platform, a suite of services designed to give developers access to Guardian content. Of course, this follows hot on the heels of The New York Times’ API releases, which I have discussed in detail on this blog and explored in a series of visualizations.

Visualizing the Guardian: Blair & Brown v.2

Comparisons to the NYTimes APIs will be inevitable. Instead of debating the various selling points of both systems, I’ll give a short introduction to what is available from the Guardian and show a few early sketches that I have made with the data.

The most interesting thing about the Guardian Content API is that it offers access to the full text of every article, which gives us a much bigger set of data to work with. Unfortunately, the first version of the API doesn’t let you control the verbosity of the return, so each call can bring back a lot of content to process. This means that even ‘simple’ visualizations of keyword frequency can take a lot longer. Though I’ve started by building some simple graphing tools, I am excited about being able to dig into the full body text of the articles – I think there are a lot of possibilities there for linguistic analysis, etc.
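To make the keyword-frequency idea concrete, here is a minimal sketch in Python (rather than Processing). It assumes the body text of each article has already been pulled out of the API response into a plain string; the function name and the sample text are hypothetical, and the tokenizing is deliberately crude.

```python
import re
from collections import Counter


def keyword_counts(bodies, keywords):
    """Count how often each keyword appears across a list of article bodies.

    `bodies` is assumed to be a list of plain-text strings, one per article,
    already extracted from whatever the API returned.
    """
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for body in bodies:
        # Crude tokenizer: lowercase the text and keep runs of letters/apostrophes.
        for word in re.findall(r"[a-z']+", body.lower()):
            if word in wanted:
                counts[word] += 1
    return counts


# Hypothetical sample text, standing in for real article bodies.
sample = ["Blair met Brown. Brown replied.", "Blair left."]
totals = keyword_counts(sample, ["Blair", "Brown"])
```

Because every call returns full article text, a loop like this has to walk through far more words than a headline-only search would, which is where the extra processing time goes.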

Visualizing the Guardian: Beckham and Rooney

The Content API doesn’t allow for faceted searching in the same way as the NYT Article Search API, but it does give us some fairly easy ways to refine and control our searches. For example, if I wanted to find out how many times the Guardian mentioned David Beckham, I might use a query like this:


I can narrow down on a specific chunk of time using the before and after parameters:


I can further refine the results of this search by using filters, which are at the core of how the API works, and can be very useful in locating specific sets of information. For example, this search would result in stories about Beckham and football:


Whereas this search would result in stories which had a cultural angle:


A full list of filters can be retrieved through the API endpoint – and every content piece returned by a search also includes a list of its related tags & filter codes.

The return from these calls can be retrieved as XML, JSON, or Atom by changing the format parameter. Full documentation is available on the Guardian site.
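As a rough sketch of how the searches described above fit together, here is a small Python helper that assembles a query URL. Only the before, after, and format parameters are named in the text; the endpoint URL, the q, api_key, and filter parameter names, the date format, and the filter codes are all assumptions for illustration, so check them against the Guardian documentation before relying on them.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder endpoint; the real URL is not given in the text.
BASE_URL = "http://api.guardianapis.com/content/search"


def build_search_url(query, api_key, after=None, before=None,
                     filters=None, fmt="json"):
    """Assemble a search URL from a keyword, a date range, and filters.

    ASSUMPTIONS: the 'q', 'api_key', and 'filter' parameter names and the
    date string format are hypothetical. Only 'before', 'after', and
    'format' are mentioned in the text.
    """
    params = [("q", query), ("format", fmt), ("api_key", api_key)]
    if after:
        params.append(("after", after))
    if before:
        params.append(("before", before))
    # One 'filter' entry per filter code, so several can be combined.
    for code in filters or []:
        params.append(("filter", code))
    return BASE_URL + "?" + urlencode(params)


# Hypothetical example: Beckham stories within one year, football only.
url = build_search_url("Beckham", "YOUR_KEY",
                       after="20080101", before="20090101",
                       filters=["/football"], fmt="xml")
```

Swapping the filter code (say, to a hypothetical "/culture") or the fmt value changes the angle of the search or the shape of the return without touching the rest of the query.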

Visualizing the Guardian: Surveillance & Privacy

Along with the Content API, the Guardian has also launched their Data Store – a collection of curated data that has been used in the past by the Guardian’s editorial staff when researching articles and producing projects. I will be writing more about this later this week, but for now you can check out the offerings on the Data Store site, and read a bit about it on the Guardian’s Data Blog.