Data Activism and Climate Reality

Today, the UK’s Met Office released a subset of a large record of global temperature readings. This data set has been at the core of a lot of scientific research supporting the idea that the planet is getting warmer, including the controversial IPCC Assessment Reports.

Here is the data currently available, representing decades of data from over 1,500 land stations. As you can read on the linked page, the Met is at work to get more of this data released as soon as possible. There is some urgency here – the hope is that hard, undeniable numbers might finally put some of the ‘debate’ surrounding the issue to rest.

Manuel Lima from VisualComplexity wrote a convincing blog post today, suggesting that the data community (how’s that for a general grouping?) can offer a lot to this cause. I couldn’t agree more. The general public certainly won’t gain much from this pile of strangely formatted text files – but they might be swayed by some well-built, innovative visualizations that communicate and convince. Certainly, we can do better than the current graphics:

Met Office Visualization

In order for this to be effective, I’d suggest three things are necessary:

1) Easy Access. I would love to see the data set placed into some format which is easily accessible, to save everyone the work of having to parse the data individually. Google Spreadsheets? MySQL tables? JSON? All of the above? Edit – MySQL tables are now available, along with a Perl parsing script, in the climate data forum (http://climatedata.blprnt.com).

2) Coordination. It would be useful to have a central place for people working with the data to ask questions and to share results. Ideally, a repository of graphics and interactive tools could be made available to the public and to the press.

3) Dialogue with Climate Scientists. The IPCC has more than 2500 expert reviewers, 800 contributing authors, and 450 lead authors. These people know what information needs to be shown, and what stories need to be told. Any effective effort to produce visualizations from this data would benefit from their input.

How does this start? As a quick measure to help with suggestion #2, I’ve created an open forum where we can start a dialogue, discuss some of these questions, and hopefully come up with some answers. For now, you can access the forum here:

http://climatedata.blprnt.com

Please pass on this invitation to any data-folks you might know – and of course any climate scientists, journalists, or other curious types who might want to get involved.


Tokyo | Cairo: Comparing Obama’s Foreign Policy Speeches

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

I spent a little bit of time today working on my text comparison tool, which I built last weekend to satisfy my curiosity about the similarities and differences between two very similar articles published on head injuries in the NFL (you can read the post here). I wanted to test out the tool with a different kind of content, and settled on something more political: two high-profile foreign policy speeches by US President Barack Obama.

The first speech is Obama’s famous open address to the Muslim world, given in June at Cairo University. The second is much more recent – yesterday’s speech delivered at Suntory Hall in Tokyo. As you might expect, the two speeches share a lot of common language. Here is the big picture, showing the top 100 words:

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

The shared words – ‘america’, ‘world’, ‘common’, ‘human’, ‘responsibility’, etc. – don’t offer much in the way of analysis. Things start to get interesting, though, when we look towards the edges (click on the images to see larger versions):

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

At the far extremes, the speech in Cairo was about Islam, about Palestinians, about peace, faith, and communities. The Tokyo address was about China, North Korea, security, agreement and growth. If we look at some of the common words that were used in both speeches, we can see some more interesting patterns emerge.

It seems, for instance, that the Egyptian address was more about people, whereas the speech in Tokyo was directed towards nations:

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

Obama makes many more mentions of peace in Cairo (in Japan, this word seems to have been replaced by ‘security’), and far more mentions of prosperity in Tokyo:

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

There was a lot of speculation prior to Obama’s speech in Asia about how much focus the President would put on human rights. In the speech, Obama mentions ‘rights’ only five times – once at the beginning of the speech and four times near the end. This weighting is interesting when we compare it to Obama’s references to China during the speech, which are heavily concentrated at the beginning (China is not mentioned at all past the half-way point of the speech):

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

Though one of the five occurrences of ‘rights’ is in reference to China, it appears from this analysis that there may have been a deliberate plan to keep the ‘human rights part’ of the speech separated from the ‘China part’.

There are likely many interesting things in this data set, a lot of which are open to interpretation. While it’s doubtful that one can steer entirely clear of political biases during this kind of comparison, the quantitative nature of the data makes it a little bit easier to make an attempt at nonpartisan analysis. I will be including these speeches as sample texts when I release the tool to the public (hopefully next week).


Two Sides of the Same Story: Laskas & Gladwell on CTE & the NFL

Laskas / Gladwell

In October, I read a fascinating article on GQ.com about head injuries among former NFL players. Written by Jeanne Marie Laskas, the article was a forensic detective story, documenting a little known doctor’s efforts to bring the brain trauma issue to the attention of the medical community, the NFL, and the general public. It is a great read – an in-depth investigative piece with engaging personalities and plenty of intrigue.

A few weeks later, I picked up a copy of The New Yorker on my way home from Pittsburgh. I was surprised to see, on the cover, a promo for an article by Malcolm Gladwell about – you guessed it – brain trauma and the NFL. After having read both articles, I was surprised by how much these two investigative pieces differed. At the time I thought about doing a visualization to investigate, but somehow the idea slipped out of my head.

Until this weekend. I spent a few (okay, more like eight) hours putting together a tool with Processing that would examine some of the similarities and differences between the two articles. The most interesting data ended up coming from word usage analysis (I looked at sentences and phrases as well, but without much luck). The base interface for the tool is an XY chart of the words – they are positioned vertically by their average position in the articles, and horizontally by which article they occur in more. The words in the centre are shared by both articles. Total usage affects the scale of the words, so we can see quite quickly which words are used most, and in which articles.
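For anyone curious how a layout like this might be computed, here is a minimal Python sketch. It is not the Processing source for the tool; the function names, the simple regex tokenizer, and the choice to scale the horizontal axis from -1 (only in the first text) to +1 (only in the second) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def word_layout(text_a, text_b):
    """Return {word: (x, y, total)} for a two-text comparison chart.

    x: -1.0 means the word occurs only in text A, +1.0 only in text B,
       and 0.0 means it is used equally often in both.
    y: average relative position of the word (0.0 = start, 1.0 = end).
    total: combined number of occurrences, usable as a size scale.
    """
    counts = defaultdict(lambda: [0, 0])
    positions = defaultdict(list)
    for index, text in enumerate((text_a, text_b)):
        tokens = tokenize(text)
        last = max(len(tokens) - 1, 1)  # avoid dividing by zero for one-word texts
        for i, word in enumerate(tokens):
            counts[word][index] += 1
            positions[word].append(i / last)
    layout = {}
    for word, (a, b) in counts.items():
        total = a + b
        x = (b - a) / total
        y = sum(positions[word]) / len(positions[word])
        layout[word] = (x, y, total)
    return layout
```

Calling word_layout(laskas_text, gladwell_text) on two article strings (hypothetical variable names) would give each word a horizontal lean, a vertical position, and a size. A real version would probably also filter out common stop words before drawing anything.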

By focusing our attention on the big words which lie more or less in the center, we can see what the two articles have in common: brains, football, dementia, and a disease called CTE. What is perhaps more interesting is what lies on the outer edges; the subjects and topics that were covered by one author and not by the other.

Laskas’ article is about Dr. Bennet Omalu, dead NFL players (Mike Webster), Omalu’s colleagues (Dr. Julian Bailes & Bob Fitzsimmons) and the NFL (click on the images to see bigger versions):

Laskas / Gladwell

Gladwell’s article, on the other hand, focuses partly on another scientist, Dr. Ann McKee, the sport of football in general, as well as a central metaphor in his piece – a comparison between football and dogfighting (the bridge between the two is Michael Vick):

Laskas / Gladwell

The gulf between the two main scientific personalities profiled in the articles is interesting. Omalu and McKee are both experts in chronic traumatic encephalopathy (CTE), so it makes sense that they each appear in both articles (Omalu was the first to describe the condition; McKee ...). However, when we isolate these names, we see that Laskas references Dr. Omalu almost exclusively. Omalu is mentioned 96 times by Laskas and only 6 times by Gladwell (it’s worth noting here that the Laskas article is 11.4% longer than the Gladwell piece – JT):

Laskas / Gladwell

In contrast, Laskas only refers to McKee once (Dr. McKee is mentioned by Gladwell 21 times):

Laskas / Gladwell
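Since the Laskas article is 11.4% longer, the raw counts slightly overstate her emphasis on Omalu and slightly understate Gladwell's emphasis on McKee. A quick sketch of the adjustment, in Python, using only the counts quoted above (the function name is made up for this example, and it assumes that mentions scale evenly with article length):

```python
def length_adjusted_ratio(count_a, count_b, a_longer_by):
    """Compare mention rates when text A is longer than text B.

    a_longer_by is a fraction: 0.114 means A is 11.4% longer than B.
    Returns how many times more often, per unit of length, A uses the term.
    """
    rate_a = count_a / (1.0 + a_longer_by)  # A's count scaled down to B's length
    return rate_a / count_b

# Omalu: 96 mentions by Laskas, 6 by Gladwell (Laskas relative to Gladwell)
omalu = length_adjusted_ratio(96, 6, 0.114)

# McKee: 1 mention by Laskas, 21 by Gladwell (inverted to read Gladwell relative to Laskas)
mckee = 1 / length_adjusted_ratio(1, 21, 0.114)

print(round(omalu, 1), round(mckee, 1))
```

With the adjustment, the Omalu ratio drops from 16 to 1 to a bit over 14 to 1, and the McKee ratio rises from 21 to 1 to a bit over 23 to 1. The imbalance is large either way.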

What is the relationship between Dr. McKee and Dr. Omalu? McKee is on the advisory board for the Sports Legacy Institute, a group which studies the results of brain trauma on athletes. SLI was founded by four individuals, including Bennet Omalu and the group’s current head, Chris Nowinski, a former professional wrestler. Omalu and the other three founders of SLI have now left the group, but it apparently continues to be a high-profile presence in the CTE field. Laskas writes:

“Indeed, the casual observer who wants to learn more about CTE will be easily led to SLI and the Boston group. There’s an SLI Twitter link, an SLI awards banquet, an SLI Web site with photos of Nowinski and links to videos of him on TV and in the newspapers. Gradually, Omalu’s name slips out of the stories, and Bailes slips out, and Fitzsimmons, and their good fight. As it happens in stories, the telling and retelling simplify and reduce.”

I wonder how much the path of a journalistic piece is affected by who you talk to first? If I had to guess, I’d say Gladwell started with the SLI, whereas Laskas seems to have begun with Dr. Omalu. This single decision could account for many of the differences between the two articles.

Other word-use choices might also give insight into editorial positions. Laskas, for example, uses the term NFL (below, at left) a lot – 57 times to Gladwell’s 11. Gladwell, on the other hand, talks more about the sport in general, using the word ‘football’ (below, at right) 40 times to Laskas’ 23:

Laskas / Gladwell Laskas / Gladwell

According to Laskas, Dr. Omalu has been roundly shunned by the NFL – they have attempted to discredit his research on many occasions (attention that has not been so pointedly focused on Dr. McKee and the SLI). Though both articles are critical of the League, it seems clear both from the article and the data that Laskas and GQ have taken a more severe stance – she addresses the NFL much more often, and with more disdain.

This exercise of quantitatively analyzing a pair of articles may seem like a strange way to spend a weekend, but it helped me to more clearly understand the differences between the two stories and to consider my reactions to each. I uncovered a few things that I hadn’t picked up at first, and at the same time was able to reinforce some of the feelings that I had after reading the two articles.

It was also another opportunity to build a quick, lightweight visualization tool dedicated to a fairly specific topic (though in this case the tool could be used to compare any two bodies of text). This strategy holds a lot of appeal to me and I think deserves attention alongside the generalist approach that we tend to see a lot of on the web and in data visualization. It seems to me that this type of investigative technique could be useful for researchers of various stripes.

I will be releasing source code for this project as well as compiled applications for Mac, Linux & Windows. In the meantime, here’s a short video of how the interface behaves:

Two Sides of the Same Story: Laskas & Gladwell on CTE & the NFL from blprnt on Vimeo.


Data in Contemporary Art: Brian Jungen

BrianJungen

Photo: Trevor Mills, Vancouver Art Gallery

Last month, Manuel Lima caused a bit of a stir in the data visualization world when he proposed a divide between Information Visualization and Information Art. This kind of phylogenic approach aside, how does data fit into a contemporary art context? To answer this question, it might be helpful to start with an investigation into contemporary artists who use or have used data within their practice. A few months ago, I featured an article on Mark Lombardi, whose data-centric illustrations have found their way into the collections of many major art galleries & museums. This time I’ll move away from 2-d form and look at a piece by Canadian artist Brian Jungen.

Brian Jungen’s 2001 piece Isolated Depiction of the Passage of Time consists of hundreds of plastic cafeteria trays, stacked neatly on a wooden shipping pallet. As is often the case with Jungen’s work, this piece is more complex than it may at first appear. Start with the wooden pallet, which has been hand-carved from red cedar. Above it, the trays themselves are also carefully considered: the number and colour of the trays correspond to the population of Aboriginal males incarcerated in Canada’s prisons. This block of trays looks solid, but it is in fact a hollow container; in the center a TV plays, showing a looping series of daytime television programs. Finally, the entire work references a historical event, in which an inmate at Ontario’s Millhaven Institution escaped inside a makeshift structure of stacked trays.

I think it’s possible to understand Isolated Depiction of the Passage of Time, at least in part, as a data visualization. In general, Jungen’s practice revolves around disassembly and re-assembly – both key parts of any visualization project – so it is not too much of a surprise to see him treading into this territory. Though some of his later work involves data in more abstract ways (his famous Prototypes for New Understanding were made in a strictly numbered edition corresponding to Michael Jordan’s number; other works involve the construction of new maps from boundary data), Passage of Time remains his most data-centric work.

A major exhibition of Brian Jungen’s work is currently on show at the National Museum of the American Indian in Washington. An excellent introduction to Jungen’s work can be found on the NMAI’s website, including a brilliant essay by museum curator Paul Chaat Smith.


7 Days of Source Day #7: Variance – Better Design Through Genetics?

Project: Variance
Date: January, 2007
Language: ActionScript 2
Key Concepts: Genetic Algorithms

Overview:

I’ll admit – this is some fairly dusty code. But I wanted to end this 7 (or 25) days of source release with this project for a reason: I think it’s a pretty good idea. I took it to a prototype stage but not much further – maybe someone will find the time to move it forward into something more functional.

Variance is a tool that lets you evolve graphics, from pre-built ‘sets’ of elements and typefaces. It combines some simple editing tools (you can manually change colour, position, rotation, etc.) with some evolutionary magic – you can select two different compositions and hybridize them to get a new set of graphics that share some of the properties of their parents. It works fairly well; usually within about 6 generations you can get a graphic that starts to look pretty good (starting from some jumbled random compositions).

When I was building this tool, I imagined how nice it would be if this kind of functionality was built into a ‘traditional’ tool like Illustrator. Working on a logotype and feeling a bit stuck? Have the application generate 9 mutations of your current design, then evolve those for a few generations until you get a result you like. Just getting started on a poster design? Pick some assets and have the Variance plug-in generate some possibilities. In both of these cases, the designer is still being a designer – picking the best results, tweaking layouts – but they are letting the genetic algorithms do the grunt work of coming up with possible variations. At the same time, they are allowing for mutations: often modifications that they may not have tried themselves. This is a very direct application of an Evolutionary Creative Process (ECP), which I’ve talked about before on this blog.
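To make the hybridize-and-mutate idea concrete, here is a small Python sketch of one generation step. It is not a port of the ActionScript source. It assumes a composition is a list of elements, each a dictionary of numeric properties normalized to the range 0 to 1 (position, rotation, and so on), and the function names, mutation rate, and mutation amount are all invented for the example.

```python
import random

def hybridize(parent_a, parent_b, rng):
    """Uniform crossover: each property of each element comes from one parent."""
    child = []
    for elem_a, elem_b in zip(parent_a, parent_b):
        child.append({key: rng.choice((elem_a[key], elem_b[key])) for key in elem_a})
    return child

def mutate(composition, rng, rate=0.2, amount=0.1):
    """Return a copy in which each property is nudged with probability `rate`."""
    mutant = []
    for elem in composition:
        new_elem = {}
        for key, value in elem.items():
            if rng.random() < rate:
                # Nudge the value, then clamp it back into the assumed 0..1 range
                value = min(1.0, max(0.0, value + rng.uniform(-amount, amount)))
            new_elem[key] = value
        mutant.append(new_elem)
    return mutant

def next_generation(parent_a, parent_b, rng, size=9):
    """Breed `size` mutated hybrids from the two compositions the designer picked."""
    return [mutate(hybridize(parent_a, parent_b, rng), rng) for _ in range(size)]
```

The selection step is deliberately missing: in this scheme the designer is the fitness function, choosing two of the nine candidates by eye and calling next_generation again with those as the new parents.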

Utility aside, it’s fun to play with.

Getting Started:

You can see Variance in action here: http://www.blprnt.com/variance

To read more about the project and to get a walk-through, you can read this post (you may want to do this before you try the application).

You’ll need Flash to get into the source and to get working with it.

Download: VarianceSource (4.12 MB)



This software is licensed under the CC-GNU GPL version 2.0 or later.
