Tag Archives: nytimes

State of the Union(s)

New York Times, 01/27/10 - State of the Union Graphic

I was asked at the end of last week to produce a graphic for the Opinion page today – the idea was to compare the texts of various ‘state of the union’ addresses from around the world. The final result (pictured above) is not extraordinarily data-heavy. It worked quite nicely in the printed layout, where the individual ‘tentacles’ trailed to the text of the speeches that they index.

The process behind this piece was relatively simple. Each speech was indexed using a Processing application that I wrote, which counts the frequency of individual words (the program ignores commonly used or unimportant words). The words for each speech were then ranked by mentions per thousand words (you can see a version of the piece with numbers here).
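For anyone who wants to try the indexing step, here is a minimal sketch in plain Java (it should paste into a Processing tab with little change). It is not the program used for the graphic: the stop-word list is a made-up placeholder, the class and method names are hypothetical, and it assumes that "per thousand words" is measured against the full length of the speech, stop words included.

```java
import java.util.*;

public class WordIndex {
    // Hypothetical stop list; the list used for the published graphic is not given in the post.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "the", "a", "an", "and", "of", "to", "in", "is", "we", "our", "that", "for"));

    /** Returns mentions per thousand words for every word that is not a stop word. */
    public static Map<String, Double> perThousand(String text) {
        // Simple English-only tokenizer: anything that is not a letter or apostrophe splits words.
        String[] tokens = text.toLowerCase().split("[^a-z']+");
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (String t : tokens) {
            if (t.isEmpty()) continue;
            total++;                                  // every word counts toward the speech length
            if (STOP_WORDS.contains(t)) continue;     // but stop words are not indexed
            counts.merge(t, 1, Integer::sum);
        }
        Map<String, Double> rates = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            rates.put(e.getKey(), e.getValue() * 1000.0 / total);
        }
        return rates;
    }

    /** Sorts words from most to least frequent, breaking ties alphabetically. */
    public static List<Map.Entry<String, Double>> ranked(Map<String, Double> rates) {
        List<Map.Entry<String, Double>> list = new ArrayList<>(rates.entrySet());
        list.sort(Map.Entry.<String, Double>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        return list;
    }

    public static void main(String[] args) {
        String speech = "Jobs and growth: jobs for the economy.";
        for (Map.Entry<String, Double> e : ranked(perThousand(speech))) {
            System.out.printf("%s %.1f%n", e.getKey(), e.getValue());
        }
    }
}
```

Normalizing by speech length is what makes a short address comparable with a long one; raw counts would simply reward the longest speech.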

Almost every project I work on involves a period of ‘data exploration’ in which I try to shake as many interesting things out of the information as I can. Even though this piece had a short turn-around, I did a fair amount of poking around, generating some simple bar graphs:

State of the Union Graphs

Another avenue I explored was to use the word weights to determine a ‘score’ for each sentence. By doing this, I can try to find the ‘kernel’ of the speech – the sentence that sums up the entire text in the most succinct way. This, I think, was fairly successful. Here are the ‘power sentences’ for the UK:
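The sentence weighting can be sketched roughly as follows. This is one plausible reading of the idea rather than the code behind the images: it assumes a sentence's score is the plain sum of its words' weights, it splits sentences naively on end punctuation, and the names (PowerSentence, score, topSentence) are made up for illustration.

```java
import java.util.*;

public class PowerSentence {
    /** Sums the weights of the words in a sentence; words without a weight count as zero. */
    public static double score(String sentence, Map<String, Double> weights) {
        double sum = 0;
        for (String t : sentence.toLowerCase().split("[^a-z']+")) {
            sum += weights.getOrDefault(t, 0.0);
        }
        return sum;
    }

    /** Returns the highest-scoring sentence; the earliest one wins a tie. */
    public static String topSentence(String text, Map<String, Double> weights) {
        String top = "";
        double topScore = Double.NEGATIVE_INFINITY;
        // Naive splitter: break after '.', '!' or '?' followed by whitespace.
        for (String s : text.split("(?<=[.!?])\\s+")) {
            double sc = score(s, weights);
            if (sc > topScore) {
                topScore = sc;
                top = s.trim();
            }
        }
        return top;
    }

    public static void main(String[] args) {
        // Illustrative weights only; in practice these would come from the word index.
        Map<String, Double> weights = new HashMap<>();
        weights.put("jobs", 3.0);
        weights.put("economy", 2.0);
        weights.put("growth", 1.0);
        String text = "We will create jobs. The economy needs growth and jobs. Thank you.";
        System.out.println(topSentence(text, weights));
    }
}
```

A plain sum favors long sentences. Dividing by the sentence's word count is an easy variation to try if the results come back too rambling.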

SOTU analysis - Sentence Weighting - UK

The Netherlands:

SOTU analysis - Sentence Weighting - Netherlands

And Botswana:

SOTU analysis - Sentence Weighting - Botswana

Which brings us to tonight’s State of the Union Address by Barack Obama. What was the ‘power sentence’ from this speech? I ran the weighting algorithm on the address and this is what it came up with:

The Most Important Sentence From Obama's State of the Union Address?

7 Days of Source Day #6: NYTimes GraphMaker

NYTimes Drug Diptych

Project: NYTimes GraphMaker
Date: Fall, 2009
Language: Processing
Key Concepts: Data visualization, graphing, NYTimes Article Search API

Overview:

The New York Times Article Search API gives us access to a mountain of data: more than 2.6 million indexed articles. There must be countless discoveries waiting to be made in this vast pile of information – we just need more people with shovels! With that in mind, I wanted to release a really simple example of using Processing to access word trend information from the Article Search API. Since I made this project in February, the clever folks at the NYT research lab have released an online tool to explore word trends, but I think it’s useful to have the Processing code released for those of us who want to poke around the data in a slightly deeper way. Indeed, I hope this sketch can act as a starting point for people to take some more involved forays into the dataset – it is ripe to be customized and changed and improved.
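To give a feel for what a word-trend lookup involves, here is a rough sketch of building one query per year. It is not taken from the GraphMaker sketch. The endpoint and parameter names below are those of the later v2 Article Search API as I understand them, which postdates this project, so treat them as assumptions and check the current API documentation. The class and method names are hypothetical, and YOUR_KEY stands in for a real API key.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TrendQuery {
    // Assumed endpoint (v2 Article Search API); the 2009 sketch would have used an earlier version.
    static final String ENDPOINT = "https://api.nytimes.com/svc/search/v2/articlesearch.json";

    /** Builds a query URL for all articles matching a term within one calendar year. */
    public static String urlForYear(String term, int year, String apiKey) {
        try {
            return ENDPOINT
                + "?q=" + URLEncoder.encode(term, "UTF-8")
                + "&begin_date=" + year + "0101"    // dates are YYYYMMDD
                + "&end_date=" + year + "1231"
                + "&api-key=" + apiKey;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One request per year; the hit count in each response becomes one bar in the graph.
        for (int year = 1981; year <= 1983; year++) {
            System.out.println(urlForYear("organic", year, "YOUR_KEY"));
        }
    }
}
```

Fetching each URL and reading the total hit count from the JSON response gives one number per year, which is all a bar graph of a word's usage needs.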

This is the simplest project I’m sharing in this now multi-week source release. It should be a nice starting point for those of you who have some programming experience but haven’t done too much in the way of data visualization. As always, if you have questions, feel free to send me an e-mail or post in the comments section below.

You can see a whole pile of radial and standard bar graphs that I made with this sketch earlier in the year in this Flickr set.

Getting Started:

You’ll need the toxiclibs core, which you can download here. Put the unzipped library into the ‘libraries’ folder in your sketchbook (if there isn’t one already, create one).

Put the folder ‘NYT_GraphMaker’ into your Processing sketch folder. Open Processing and open the sketch from the File > Sketchbook menu. You’ll find detailed instructions in the header of the main tab (the NYT_GraphMaker.pde file).

Thanks:

It’s starting to get a bit repetitive, but once again this file depends on Karsten Schmidt’s toxiclibs. These libraries are so good they should ship with Processing.

Download: GraphMaker.zip (88k)


This software is licensed under the CC-GNU GPL version 2.0 or later.

7 Days of Source Day #2: NYTimes 365/360

NYTimes: 365/360 - 2009 (in color)

Project: NYTimes 365/360
Date: February, 2009
Language: Processing
Key Concepts: Data Visualization, NYTimes Article Search API, HashMaps & ArrayLists

Overview:

Many of you have already seen the series of visualizations that I created early in the year using the newly-released New York Times APIs. The most complex of these were in the 365/360 series, in which I tried to distill an entire year of news stories into a single graphic. The resulting visualizations (2009 is pictured above) capture the complex relationships – and somewhat tangled mess – that make up a year in the news.

This release is a single sketch. I’ll be releasing the Article Search API Processing code as a library later in the week, but I wanted to show this project as it sits, with all of the code intact. The output from this sketch is a set of .PDFs which are suitable for print. Someday I’d like to show the entire series of these as a set of 6′ x 6′ prints – of course, someday I’d also like a solid-gold skateboard and a castle made of cheese.

That said, really nice, archival quality prints from this project (and the one I’ll be releasing tomorrow) are for sale in my Etsy shop. I realize that you’ll all be able to make your own prints now (and you are certainly welcome to do so) – but if you really enjoy the work and want to have a signed print to hang on your wall, you know who to talk to.

Getting Started:

Put the folder ‘NYT_365_360’ into your Processing sketch folder. Open Processing and open the sketch from the File > Sketchbook menu. You’ll find detailed instructions in the header of the main tab (the NYT_365_360.pde file).

Thanks:

Most of the credit for this sketch goes to the clever kids at the NYT who made the amazing Article Search API. This is the gold standard of APIs, and really is a dream to use. As you’ll see if you dig into the code, each of these complicated graphics is made with just 21 calls to the API. I can’t imagine the amount of blood, sweat, and tears that would go into making a graphic like this the old-fashioned way.

Speaking of gold standards, Robert Hodgin got me pointed to ArrayLists in the first place, and has been helpful many times over the last few years as I’ve tried to solve a series of ridiculously simple problems in Processing. Thanks, Robert!

Download: NYT365.zip (140k)


This software is licensed under the CC-GNU GPL version 2.0 or later.

We Are Beginning to See Positive Signs for our Industry — Bear Stearns, Lehman Brothers, Freddie Mac & Fanny Mae: 1984-2009


For The Data Art Show in June at the Pink Hobo Gallery in Minneapolis, I created a 20′ long print visualizing the major players in the financial crisis, and their in-print relationships.

The print, titled ‘We Are Beginning to See Positive Signs for our Industry — Bear Stearns, Lehman Brothers, Freddie Mac & Fanny Mae: 1984-2009’, was made in Processing, using the NYTimes Article Search API. It was printed on kraft paper, and hung somewhat haphazardly using a handful of pushpins (certainly the easiest install I’ve ever had to do).

I know these images don’t show the whole piece – I am trying to track down a full-frame image of it that isn’t (like the image below) from my iPhone. It turns out it’s difficult to photograph a 20′ print in a room that is 14′ wide.

We Are Beginning to See Positive Signs for our Industry — Bear Stearns, Lehman Brothers, Freddie Mac & Fanny Mae: 1984-2009

The show also featured some excellent work from James Paterson and Mario Klingemann, which you can read about in this article, and see a bit of in this photo set. I’ll also be putting together a more thorough documentation of the show over the next few weeks.


Flashbelt ’09 – Hacking the Newsroom Followup

On Wednesday I had the chance to talk at Flashbelt, a web media conference that I have been presenting at every year since 2004. I talked about data – how to get it, how to use it, and how & why it’s becoming more and more a part of our lives. I walked through some of the process behind my NYTimes API visualizations, my recent Wired UK NDNAD piece, and Just Landed.

I really enjoyed giving the presentation, and it was great to speak to a lot of interesting people at the conference before and after the talk. As promised, I’ve posted a .ZIP file with some simple Processing files to get you started exploring with the NYTimes Article Search API – the link for that, along with some other resources that I mentioned during the talk, is listed below.

Some of you may be aware that this year’s Flashbelt conference ‘featured’ a controversial talk by Hoss Gifford. I’m not going to talk about my reactions in detail in this post as my intention here is to simply share some information related to my presentation. However, I will say that I believe that there is no room at all for content that is in any way demeaning to women at Flashbelt or at any other event. It’s inexcusable. I’m saddened that this happened – but was heartened this morning to read this very thoughtful response and call for discussion from conference organizer Dave Schroeder, along with some of the people who very rightly brought this issue to a public stage earlier in the week. It’s well worth a read.

Back to the resources. Here are a couple of images that I wanted to show in my presentation, but somehow forgot to include. The first is an abstract visualization of the word ‘organic’ in the NYTimes between 1981 and 2009. The second is a radial visualization of mentions of the Yankees & Mets in the same paper over the same period of time.

NYTimes: Going Organic 1981-2009

NYTimes Threads - Yankees vs. Mets

Finally, a list of links:

Please let me know if there’s anything I’ve missed. As always, I’d love to hear any feedback and suggestions from those who were in the audience. I’m already looking forward to next year!