Tag Archives: dataviz

Random Number Multiples

Random Number Multiples - RGB

About seven years ago, I had a bit of a career crisis. I was freelancing – working for clients I didn’t care much about on projects that I didn’t care much about, and feeling that there was a huge distance between the work that I was creating and my physical self. I was sick of computers, and was considering a range of (in hindsight) ridiculous vocational changes.

My rescue didn’t come from a new programming language, or a faster computer, or even better clients. It came, instead, from a return to the physical. I learned how to screenprint, and made rock posters for local bands, out of my living room. Every weekend, a friend and I would rack paper, pull squeegees, make an enormous mess – and escape from all of our pixel-based problems. We kept it up for a few years; after I moved into a larger, cleaner, less ink-friendly place I put my screens into storage. Even though I stopped printing, that time I spent screenprinting turned the rest of my career in a more creative direction.

Imagine how happy I was, then, to be asked by curator Christina Vassallo to be part of the inaugural edition of her Random Number Multiples series – a project that would produce screenprints from the work of computational artists and designers. Even better, this first edition would pair me with Marius Watz, an artist who has been a huge inspiration to me over the years, and whose work is exceptional in every way.

Marius, Christina and I spent three days at Bushwick Print Lab printing each of the 200 prints by hand. It was a fantastic experience, and the results, I think, speak for themselves. Marius’ prints are explosions of colour, vivid and dramatic pseudo-random compositions that really capture the eye:

RN Multiples 5146 Marius Watz - Arcs04-01

I made two prints. Both are abstractions of the word frequency visualizations that I created using Processing and the NYTimes Article Search API. The first, titled ‘RGB – NYT Word Frequency’, shows usage of the words ‘red’, ‘green’ and ‘blue’ in the Times between 1981 and 2011 (you can see a series of details from the print here):

Jer Thorp, "RGB - NYTimes Word Frequencies"

Random Number Multiples - RGB

This print turned out even better than I could have expected. The fine detail is amazing, the colours are rich and vivid, and the half-toning on the individual bars creates a jewel-like halo in the center that is fascinating to look at up close.

My second print visualizes the terms ‘hope’ and ‘crisis’ over the same time period (again, more detailed views can be found here). This print was made with a semi-reflective ink, so it has a unique shimmer to it when viewed in the light:

RN Multiples 5235 Jer Thorp, Hope-Crisis

Overall, I was surprised and delighted by how well this computer-generated work translated to the traditional medium of screenprint. I will definitely be looking to make more prints in the future.

In the meantime, a limited number of both of these prints are available for sale on the Random Number Multiples site. Prints are $100, made with entirely acid-free media, and ship with a signed certificate of authenticity.


Two Sides of the Same Story: Laskas & Gladwell on CTE & the NFL

Laskas / Gladwell

In October, I read a fascinating article on GQ.com about head injuries among former NFL players. Written by Jeanne Marie Laskas, the article was a forensic detective story, documenting a little-known doctor’s efforts to bring the brain trauma issue to the attention of the medical community, the NFL, and the general public. It is a great read – an in-depth investigative piece with engaging personalities and plenty of intrigue.

A few weeks later, I picked up a copy of The New Yorker on my way home from Pittsburgh. I was surprised to see, on the cover, a promo for an article by Malcolm Gladwell about – you guessed it – brain trauma and the NFL. After reading both articles, I was struck by how much these two investigative pieces differed. At the time I thought about doing a visualization to investigate, but somehow the idea slipped out of my head.

Until this weekend. I spent a few (okay, more like eight) hours putting together a tool with Processing that would examine some of the similarities and differences between the two articles. The most interesting data ended up coming from word usage analysis (I looked at sentences and phrases as well, but without much luck). The base interface for the tool is an XY chart of the words – they are positioned vertically by their average position in the articles, and horizontally by which article they occur in more. The words in the centre are shared by both articles. Total usage affects the scale of the words, so we can see quite quickly which words are used most, and in which articles.
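The layout described above can be sketched in a few lines. This is not the tool's actual source, just a minimal illustration in plain Java under some assumptions: every class and method name here is made up, position is taken to mean how far through an article a word occurs (0 at the start, 1 at the end), and the horizontal value is assumed to be the difference in counts divided by total usage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the original tool.
class WordChartSketch {

    // Running tallies for one word across the two articles.
    static class WordStat {
        int countA;          // occurrences in article A
        int countB;          // occurrences in article B
        double positionSum;  // sum of normalized positions (0 = start, 1 = end)

        // Total usage; assumed to drive the size of the word on the chart.
        int total() { return countA + countB; }

        // Horizontal axis: -1 = only in A, +1 = only in B, 0 = evenly shared.
        double x() { return (double) (countB - countA) / total(); }

        // Vertical axis: average normalized position over all occurrences.
        double y() { return positionSum / total(); }
    }

    // Lowercase the text and split on anything that is not a letter or apostrophe.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z']+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // Add one article's words to the shared tallies.
    static void tally(String text, boolean isArticleA, Map<String, WordStat> stats) {
        List<String> words = tokenize(text);
        for (int i = 0; i < words.size(); i++) {
            WordStat s = stats.computeIfAbsent(words.get(i), k -> new WordStat());
            if (isArticleA) s.countA++; else s.countB++;
            // Normalize so articles of different lengths share one vertical scale.
            s.positionSum += words.size() > 1 ? (double) i / (words.size() - 1) : 0.0;
        }
    }

    static Map<String, WordStat> analyze(String articleA, String articleB) {
        Map<String, WordStat> stats = new HashMap<>();
        tally(articleA, true, stats);
        tally(articleB, false, stats);
        return stats;
    }

    public static void main(String[] args) {
        // Toy stand-ins for the two articles.
        Map<String, WordStat> stats =
            analyze("brain football brain", "football dogfighting");
        for (Map.Entry<String, WordStat> e : stats.entrySet()) {
            WordStat s = e.getValue();
            System.out.printf("%-12s x=%.2f y=%.2f size=%d%n",
                e.getKey(), s.x(), s.y(), s.total());
        }
    }
}
```

With this scheme, a word used equally by both authors lands in the centre column, while a word used by only one author is pushed to that author's edge. Raw counts are used here, so a longer article pulls shared words slightly toward its own side.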

By focusing our attention on the big words which lie more or less in the center, we can see what the two articles have in common: brains, football, dementia, and a disease called CTE. What is perhaps more interesting is what lies on the outer edges; the subjects and topics that were covered by one author and not by the other.

Laskas’ article is about Dr. Bennet Omalu, dead NFL players (Mike Webster), Omalu’s colleagues (Dr. Julian Bailes & Bob Fitzsimmons) and the NFL (click on the images to see bigger versions):

Laskas / Gladwell

Gladwell’s article, on the other hand, focuses partly on another scientist, Dr. Ann McKee, the sport of football in general, as well as a central metaphor in his piece – a comparison between football and dogfighting (the bridge between the two is Michael Vick):

Laskas / Gladwell

The gulf between the two main scientific personalities profiled in the articles is interesting. Omalu and McKee are both experts in chronic traumatic encephalopathy (CTE), so it makes sense that they each appear in both articles (Omalu was the first to describe the condition). However, when we isolate these names we see that Laskas references Dr. Omalu almost exclusively (Omalu is mentioned 96 times by Laskas and only 6 times by Gladwell; it’s worth noting here that the Laskas article is 11.4% longer than the Gladwell piece):

Laskas / Gladwell

In contrast, Laskas only refers to McKee once (Dr. McKee is mentioned by Gladwell 21 times):

Laskas / Gladwell
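The counts above are raw, and the Laskas article is 11.4% longer, so a fairer comparison scales her counts down to the length of the Gladwell piece. Here is a rough sketch of that adjustment in plain Java; the class and method names are made up for illustration, and it assumes the 11.4% figure applies uniformly to word counts.

```java
// Hypothetical helper: scales Laskas' counts to the length of the Gladwell piece.
class LengthAdjustSketch {

    // Laskas article length relative to Gladwell's (11.4% longer, per the text).
    static final double LASKAS_LENGTH_RATIO = 1.114;

    // How many mentions Laskas would have if her article were Gladwell's length.
    static double adjustLaskas(int laskasCount) {
        return laskasCount / LASKAS_LENGTH_RATIO;
    }

    public static void main(String[] args) {
        // Mention counts quoted in the post.
        System.out.printf("Omalu: Laskas %.1f (adjusted) vs Gladwell %d%n",
            adjustLaskas(96), 6);
        System.out.printf("McKee: Laskas %.1f (adjusted) vs Gladwell %d%n",
            adjustLaskas(1), 21);
    }
}
```

Even after the adjustment, Laskas' 96 mentions of Omalu come out at roughly 86, still far ahead of Gladwell's 6, so the difference in length does not explain the gap.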

What is the relationship between Dr. McKee and Dr. Omalu? McKee is on the advisory board for the Sports Legacy Institute, a group which studies the results of brain trauma on athletes. SLI was founded by four individuals, including Bennet Omalu and the group’s current head, Chris Nowinski, a former professional wrestler. Omalu and the other three founders of SLI have now left the group, but it apparently continues to be a high-profile presence in the CTE field. Laskas writes:

“Indeed, the casual observer who wants to learn more about CTE will be easily led to SLI and the Boston group. There’s an SLI Twitter link, an SLI awards banquet, an SLI Web site with photos of Nowinski and links to videos of him on TV and in the newspapers. Gradually, Omalu’s name slips out of the stories, and Bailes slips out, and Fitzsimmons, and their good fight. As it happens in stories, the telling and retelling simplify and reduce.”

I wonder how much the path of a journalistic piece is affected by who you talk to first? If I had to guess, I’d say Gladwell started with the SLI, whereas Laskas seems to have begun with Dr. Omalu. This single decision could account for many of the differences between the two articles.

Other word-use choices might also give insight into editorial positions. Laskas, for example, uses the term NFL (below, at left) a lot – 57 times to Gladwell’s 11. Gladwell, on the other hand, talks more about the sport in general, using the word ‘football’ (below, at right) 40 times to Laskas’ 23:

Laskas / Gladwell Laskas / Gladwell

According to Laskas, Dr. Omalu has been roundly shunned by the NFL – they have attempted to discredit his research on many occasions (attention that has not been so pointedly focused on Dr. McKee and the SLI). Though both articles are critical of the League, it seems clear both from the article and the data that Laskas and GQ have taken a more severe stance – the GQ piece addresses the NFL much more often, and with more disdain.

This exercise of quantitatively analyzing a pair of articles may seem like a strange way to spend a weekend, but it helped me to more clearly understand the differences between the two stories and to consider my reactions to each. I uncovered a few things that I hadn’t picked up at first, and at the same time was able to reinforce some of the feelings that I had after reading the two articles.

It was also another opportunity to build a quick, lightweight visualization tool dedicated to a fairly specific topic (though in this case the tool could be used to compare any two bodies of text). This strategy holds a lot of appeal to me and I think deserves attention alongside the generalist approach that we tend to see a lot of on the web and in data visualization. It seems to me that this type of investigative technique could be useful for researchers of various stripes.

I will be releasing source code for this project as well as compiled applications for Mac, Linux & Windows. In the meantime, here’s a short video of how the interface behaves:

Two Sides of the Same Story: Laskas & Gladwell on CTE & the NFL from blprnt on Vimeo.

7 Days of Source Day #2: NYTimes 365/360

NYTimes: 365/360 - 2009 (in color)

Project: NYTimes 365/360
Date: February, 2009
Language: Processing
Key Concepts: Data Visualization, NYTimes Article Search API, HashMaps & ArrayLists


Many of you have already seen the series of visualizations that I created early in the year using the newly-released New York Times APIs. The most complex of these were in the 365/360 series, in which I tried to distill an entire year of news stories into a single graphic. The resulting visualizations (2009 is pictured above) capture the complex relationships – and somewhat tangled mess – that is a year in the news.

This release is a single sketch. I’ll be releasing the Article Search API Processing code as a library later in the week, but I wanted to show this project as it sits, with all of the code intact. The output from this sketch is a set of .PDFs which are suitable for print. Someday I’d like to show the entire series of these as a set of 6′ x 6′ prints – of course, someday I’d also like a solid-gold skateboard and a castle made of cheese.

That said, really nice, archival quality prints from this project (and the one I’ll be releasing tomorrow) are for sale in my Etsy shop. I realize that you’ll all be able to make your own prints now (and you are certainly welcome to do so) – but if you really enjoy the work and want to have a signed print to hang on your wall, you know who to talk to.

Getting Started:

Put the folder ‘NYT_365_360’ into your Processing sketch folder. Open Processing and open the sketch from the File > Sketchbook menu. You’ll find detailed instructions in the header of the main tab (the NYT_365_360.pde file).


Most of the credit for this sketch goes to the clever kids at the NYT who made the amazing Article Search API. This is the gold standard of APIs, and really is a dream to use. As you’ll see if you dig into the code, each of these complicated graphics is made with just 21 calls to the API. I can’t imagine the amount of blood, sweat, and tears that would go into making a graphic like this the old-fashioned way.

Speaking of gold standards, Robert Hodgin got me pointed to ArrayLists in the first place, and has been helpful many times over the last few years as I’ve tried to solve a series of ridiculously simple problems in Processing. Thanks, Robert!
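For readers who haven't combined the two structures named under Key Concepts, here is a generic sketch of the pattern: a HashMap whose values are ArrayLists, used to record which names are linked to which. It is plain modern Java rather than code from the sketch itself, and the class name, method names and sample data are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;

// Hypothetical illustration of a HashMap of ArrayLists; not taken from NYT_365_360.
class LinkIndexSketch {

    // Record a two-way link between two names, creating each list on first use.
    static void addLink(HashMap<String, ArrayList<String>> index, String a, String b) {
        index.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        index.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    public static void main(String[] args) {
        HashMap<String, ArrayList<String>> index = new HashMap<>();

        // Made-up sample data standing in for names that appear together.
        addLink(index, "Person A", "Organization X");
        addLink(index, "Person A", "Organization Y");
        addLink(index, "Person B", "Organization X");

        // Fast lookup by name (the HashMap), ordered results per name (the ArrayList).
        for (String name : index.keySet()) {
            System.out.println(name + " -> " + index.get(name));
        }
    }
}
```

The HashMap gives quick lookup by name, while each ArrayList grows as new links turn up, which is handy when you don't know in advance how many items a request will return.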

Download: NYT365.zip (140k)


This software is licensed under the CC-GNU GPL version 2.0 or later.