This morning, The Guardian announced the launch of The Guardian Open Platform, a suite of services designed to give developers access to Guardian content. This follows hot on the heels of The New York Times’ API releases, which I have discussed in detail on this blog and explored in a series of visualizations.
Comparisons to the NYTimes APIs will be inevitable. Instead of debating the various selling points of both systems, I’ll give a short introduction to what is available from the Guardian and show a few early sketches that I have made with the data.
The most interesting thing about the Guardian Content API is that it offers access to the full text of every article. This is good, in that it gives us a much bigger set of data to work with. Unfortunately, the first version of the API doesn’t let you control the verbosity of the return, so each call can hand back a lot of content to process. This means that making ‘simple’ visualizations of keyword frequency can take a lot longer. Though I’ve started by building some simple graphing tools, I am excited about being able to dig into the full body text of the articles – I think there are a lot of possibilities there for linguistic analysis, etc.
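As a rough illustration of the keyword-frequency idea, here is a minimal Python sketch. It assumes the article body has already been pulled out of an API response as a plain string (the response layout is not shown here), and the function name keyword_counts is mine, not part of any Guardian library. It only handles single-word keywords.

```python
import re
from collections import Counter


def keyword_counts(body_text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword.

    body_text is assumed to be plain article text, already extracted
    from whatever the API returned. Only single-word keywords are handled.
    """
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z0-9']+", body_text.lower())
    totals = Counter(words)
    # Counter returns 0 for words it has never seen.
    return {kw: totals[kw.lower()] for kw in keywords}


sample = "Beckham scored. The crowd cheered as Beckham left the pitch."
print(keyword_counts(sample, ["Beckham", "crowd", "referee"]))
```

Since every call returns full body text, counting locally like this is where most of the processing time goes, so it is worth keeping the tokenizing step as cheap as possible.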
The Content API doesn’t allow for faceted searching in the same way as the NYT Article Search API, but it does give us some fairly easy ways to refine and control our searches. For example, if I wanted to find out how many times the Guardian mentioned David Beckham, I might use a query like this, putting my search term in the q parameter:
http://api.guardianapis.com/content/search?q=environment&format=xml&api-key=
I can narrow down on a specific chunk of time using the before and after parameters:
http://api.guardianapis.com/content/search?q=environment&before=20080101&after=20070101&format=xml&api-key=
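If you want to chart mentions over time, one approach is to generate a before/after pair for each year and run the same search once per pair. The sketch below only builds the date strings, in the YYYYMMDD form used in the URL above. The helper name yearly_windows is hypothetical, and I am assuming (not confirming) how the API treats the boundary dates.

```python
from datetime import date


def yearly_windows(first_year, last_year):
    """Yield (after, before) date strings in YYYYMMDD form, one pair per year.

    Whether the API treats these boundaries as inclusive or exclusive
    is an assumption to check against the documentation.
    """
    for year in range(first_year, last_year + 1):
        after = date(year, 1, 1).strftime("%Y%m%d")
        before = date(year + 1, 1, 1).strftime("%Y%m%d")
        yield after, before


for after, before in yearly_windows(2006, 2008):
    print("after=" + after, "before=" + before)
```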
I can further refine the results of this search by using filters, which are at the core of how the API works and can be very useful in locating specific sets of information. For example, with Beckham as the search term, adding the /football filter would return stories about Beckham and football:
http://api.guardianapis.com/content/search?q=environment&filter=/football&before=20080101&after=20070101&format=xml&api-key=
Whereas this search would result in stories which had a cultural angle:
http://api.guardianapis.com/content/search?q=environment&filter=/culture&before=20080101&after=20070101&format=xml&api-key=
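Hand-typed query strings like these are easy to get wrong (a single missing & breaks a parameter), so it can help to assemble them in code. Here is a minimal Python sketch that uses only the endpoint and parameter names shown in the URLs above. The function name build_search_url and the placeholder key are my own, and nothing here actually calls the API.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs in this post.
SEARCH_ENDPOINT = "http://api.guardianapis.com/content/search"


def build_search_url(query, api_key, before=None, after=None,
                     content_filter=None, fmt="xml"):
    """Assemble a Content API search URL from its individual parameters."""
    params = [("q", query)]
    if content_filter:
        params.append(("filter", content_filter))
    if before:
        params.append(("before", before))
    if after:
        params.append(("after", after))
    params.append(("format", fmt))
    params.append(("api-key", api_key))
    # safe="/" keeps the slash in filters such as /football unescaped.
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe="/")


# "YOUR_KEY" is a placeholder, not a real key.
print(build_search_url("David Beckham", "YOUR_KEY",
                       before="20080101", after="20070101",
                       content_filter="/football"))
```

Letting urlencode join the parameters also takes care of escaping, so a search term with a space in it ends up correctly encoded.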
A full list of filters can be retrieved through the API endpoint, and every piece of content returned by a search also includes a list of its related tags and filter codes.
The return from these calls can be retrieved as XML, JSON, or Atom by changing the format parameter. Full documentation is available on the Guardian site.
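Switching formats on an existing query can be done by rewriting just the format parameter. The Python sketch below does that with the standard library. The value xml comes from the URLs in this post, while json is my guess at the matching parameter value, so check the documentation before relying on it. The helper name with_format is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(url, fmt):
    """Return the same URL with its format parameter set to fmt."""
    parts = urlsplit(url)
    # Keep every existing parameter except format (blank values included).
    pairs = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "format"]
    pairs.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(pairs, safe="/")))


xml_url = ("http://api.guardianapis.com/content/search"
           "?q=environment&format=xml&api-key=YOUR_KEY")
# "json" is an assumed parameter value, not one confirmed in this post.
print(with_format(xml_url, "json"))
```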
Along with the Content API, the Guardian has also launched their Data Store – a collection of curated data that has been used in the past by the Guardian’s editorial staff when researching articles and producing projects. I will be writing more about this later this week, but for now you can check out the offerings on the Data Store site, and read a bit about it on the Guardian’s Data Blog.

5 Comments
As Simon Willison notes on his personal blog post on the launch, “you can use ?count=0 in your search API [query] to turn off results entirely and just get back the filters section”. This might help with visualisations (as well as the suggested tag browser).
Hi Paul,
Thanks for the tip – that should be useful when doing preliminary scans for interesting data. In most cases, though, I *do* want to see the content details – just not necessarily the whole body of the article.
This was a pretty quick post – written, appropriately, in the waiting area at Heathrow. I’ll be writing some more detailed posts and creating some more visualizations very soon.
Can Obsessing [ http://obsessing.org/ ] import data from a URI (e.g. one of the Guardian data store spreadsheets, or the Guardian content API)?
If so, then it makes all sorts of “built purely in the browser” demos possible?
Impressive work and a great tool for visualization of complex interlinkages. At Transparency International, an international NGO fighting corruption (www.transparency.org), we are always wondering how best to visualise corruption and, as a positive counter-concept, transparency. At least the latter seems to be something of a buzzword currently, with the new US administration’s promises. I’ll keep following your blog, maybe it’ll give me some ideas.
Again, a nice set of visualizations! Thanks for sharing the experiences behind your work and the comparison between the NY Times API and the Guardian API.