We’ve heard a lot this week about earthquake aid for Haiti. As is always the case when large numbers are bandied about in the news media, it’s hard to get a feeling of scale. For example, Canada has, at the time of writing, pledged to donate nearly 5.5M dollars to the aid effort. What does this number really mean? Well, considering Canada’s population of 33.3M, the aid works out to about 16 cents per Canadian citizen. 16 cents doesn’t buy you much these days. A sip of coffee, or – say – 3.14 minutes of Avatar; barely enough to get through the credits.
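For anyone who wants to check the arithmetic, here is a minimal Python sketch of the conversion. It is not the Processing tool used for the images. The pledge and population figures are the ones quoted above, and the cost of one Avatar minute is an assumption backed out of the pairing of 16 cents with 3.14 minutes, so the result lands near, not exactly on, the rounded numbers in the text.

```python
# Figures quoted in the post (approximate, at the time of writing).
CANADA_PLEDGE_DOLLARS = 5.5e6
CANADA_POPULATION = 33.3e6

# Assumption: the cost of one minute of Avatar, implied by the post's own
# pairing of 16 cents with 3.14 minutes. Not an official ticket price.
CENTS_PER_AVATAR_MINUTE = 16 / 3.14


def cents_per_citizen(pledge_dollars, population):
    """Pledged aid per citizen, in cents."""
    return 100 * pledge_dollars / population


def avatar_minutes(pledge_dollars, population):
    """Pledged aid per citizen, expressed as minutes of Avatar."""
    return cents_per_citizen(pledge_dollars, population) / CENTS_PER_AVATAR_MINUTE


canada_cents = cents_per_citizen(CANADA_PLEDGE_DOLLARS, CANADA_POPULATION)
canada_minutes = avatar_minutes(CANADA_PLEDGE_DOLLARS, CANADA_POPULATION)
print(f"{canada_cents:.1f} cents per citizen, about {canada_minutes:.1f} Avatar minutes")
```

The same two functions can be applied to any other country's pledge and population to rebuild the list of times.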
How many Avatar minutes are governments around the world pledging? Sweden leads the way, with almost 38 minutes per citizen – almost a quarter of the movie. Other Scandinavian countries round out the top 6, along with Luxembourg, Guyana, and Estonia.
Here are the times for some other countries:
Sweden: 38 minutes
Luxembourg: 28 minutes
Denmark: 26 minutes
Guyana: 25 minutes
Norway: 20 minutes
Estonia: 14 minutes
Australia: 8 minutes
Finland: 6 minutes
United States: 6 minutes
Switzerland: 5 minutes
New Zealand: 4 minutes
Netherlands: 3 minutes
United Kingdom: 3 minutes
Canada: 3 minutes
Spain: 2 minutes
Brazil: 2 minutes
Germany: 1 minute
Japan: 1 minute
Morocco: 1 minute
Poland: 1 minute
Italy: 1 minute
The images in this post are exports from a Processing tool that I built to manage the data and to render the film strips. The application reads data from a Google spreadsheet – the original data was published by the always excellent Guardian Data Blog. If there’s enough interest, I will post the tool and the source later this week.
I was very much moved by Maggie Steber’s photo essay in The New York Times, titled ‘No End of Trouble. Ever.’
The essay talks about Haiti’s violent history, and of the country’s incredible tendency towards misfortune:
“How can nature or God or the fates or the universe do this to a country that has borne far too much sadness? An earthquake has now devastated the capital; claiming lives, hopes and the pitifully small dreams that people have held on to, despite political violence, unimaginable poverty, disease, corruption, dictators and nature’s full force of four hurricanes in a row.”
I built this very quick visualization to explore this topic a little further. Specifically, I wanted to compare Haiti to its Caribbean neighbours to see if the country is indeed as unlucky as it seems.
This visualization compares Haiti to 12 other Caribbean nations. It looks at articles published in the New York Times mentioning those countries between 1981 and 2010, and measures the occurrence of specific words in those articles.
The pie charts in each row show the percentage of total articles on each country which contain the words in question. For example, we can see that about 25% of articles published about Haiti mention the word ‘violence’ – twice the frequency of any other country on the list.
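The number behind each pie chart is simply the share of a country's articles that contain a given word. A rough Python sketch of that measure is below. It is an illustration only, not the code behind the visualization, and the sample texts are invented placeholders, not real New York Times articles.

```python
import re


def share_containing(articles, word):
    """Fraction of articles whose text contains `word` as a whole word."""
    if not articles:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = sum(1 for text in articles if pattern.search(text))
    return hits / len(articles)


# Invented placeholder texts, for illustration only.
sample_articles = {
    "Haiti": [
        "Violence flared in the capital overnight.",
        "Aid shipments arrived after the storm.",
        "Reports of a coup spread quickly.",
        "Markets reopened on Monday.",
    ],
    "Jamaica": [
        "Tourism numbers rose this year.",
        "A new port is planned.",
    ],
}

for country, articles in sample_articles.items():
    share = share_containing(articles, "violence")
    print(f"{country}: {share:.0%} of articles mention 'violence'")
```

Matching on whole words keeps 'coup' from also counting 'coupon', which matters once the word list grows.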
Haiti has the highest frequency of the words ‘coup’, ‘violence’, ‘disease’, and ‘strife’. It is second or third in mentions of ‘death’, ‘unrest’ and ‘famine’.
Likely this week’s events will lead to many more mentions of these words. As you’re likely aware, many NGOs small and large are organizing to help Haitians – both through emergency assistance and through long-term rebuilding. If you want to donate, I’d highly recommend considering Architecture for Humanity (for long-term projects) or Partners in Health (for emergency assistance). Both organizations are accepting donations through their websites.
Today, the UK’s Met Office released a subset of a large record of global temperature readings. This data set has been at the core of a lot of scientific research supporting the idea that the planet is getting warmer, including the controversial IPCC Assessment Reports.
Here is the data currently available, representing decades of data from over 1,500 land stations. As you can read on the linked page, the Met is at work to get more of this data released as soon as possible. There is some urgency here – the hope is that hard, undeniable numbers might finally put some of the ‘debate’ surrounding the issue to rest.
Manuel Lima from VisualComplexity wrote a convincing blog post today, suggesting that the data community (how’s that for a general grouping?) can offer a lot to this cause. I couldn’t agree more. The general public certainly won’t gain much from this pile of strangely formatted text files – but they might be swayed by some well-built, innovative visualizations that communicate and convince. Certainly, we can do better than the current graphics:
In order for this to be effective, I’d suggest three things are necessary:
1) Easy Access. I would love to see the data set placed into some format which is easily accessible, to save the work of everyone having to parse the data individually. Google Spreadsheets? MySQL tables? JSON? All of the above? Edit – MySQL tables are now available, along with a Perl parsing script, in the climate data forum (http://climatedata.blprnt.com).
2) Coordination. It would be useful to have a central place for people working with the data to ask questions and to share results. Ideally, a repository of graphics and interactive tools could be made available to the public and to the press.
3) Dialogue with Climate Scientists. The IPCC has more than 2500 expert reviewers, 800 contributing authors, and 450 lead authors. These people know what information needs to be shown, and what stories need to be told. Any effective effort to produce visualizations from this data would benefit from their input.
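To make the first suggestion a little more concrete, here is a minimal Python sketch of turning rows of plain text into JSON. The row layout (a year followed by twelve monthly values) and the -99.0 marker for a missing reading are assumptions made for illustration. They are not taken from the Met Office's documentation, so check the real files before relying on either.

```python
import json

# Hypothetical sentinel for a missing reading (an assumption, see above).
MISSING = -99.0


def parse_station_rows(lines):
    """Turn 'year v1 ... v12' rows into one record per month."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 13:
            continue  # not a year plus twelve values
        try:
            year = int(parts[0])
            values = [float(p) for p in parts[1:]]
        except ValueError:
            continue  # header or otherwise non-numeric row
        for month, value in enumerate(values, start=1):
            records.append({
                "year": year,
                "month": month,
                "temp_c": None if value == MISSING else value,
            })
    return records


# Invented sample rows, not real station data.
sample_lines = [
    "year jan feb mar apr may jun jul aug sep oct nov dec",
    "1981 1.2 2.0 4.5 8.1 12.3 15.0 17.2 16.8 13.9 9.4 5.1 2.2",
    "1982 -99.0 1.8 4.9 7.7 11.9 15.4 17.9 17.1 14.2 9.9 5.6 2.8",
]

records = parse_station_rows(sample_lines)
as_json = json.dumps(records)
print(len(records), "monthly records")
```

One record per month is a deliberately flat shape: it loads directly into a spreadsheet or a database table without further reshaping.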
How does this start? As a quick measure to help with suggestion #2, I’ve created an open forum where we can start a dialogue, discuss some of these questions, and hopefully come up with some answers. For now, you can access the forum here:
Please pass on this invitation to any data-folks you might know – and of course any climate scientists, journalists, or other curious types who might want to get involved.
I spent a little bit of time today working on my text comparison tool, which I built last weekend to satisfy my curiosity about the similarities and differences between two very similar articles published on head injuries in the NFL (you can read the post here). I wanted to test out the tool with a different kind of content, and settled on something more political: two high profile foreign policy speeches by US President Barack Obama.
The shared words – ‘america’, ‘world’, ‘common’, ‘human’, ‘responsibility’, etc. – don’t offer much in the way of analysis. Things start to get interesting, though, when we look towards the edges (click on the images to see larger versions):
At the far extremes, the speech in Cairo was about Islam, about Palestinians, about peace, faith, and communities. The Tokyo address was about China, North Korea, security, agreement and growth. If we look at some of the common words that were used in both speeches, we can see some more interesting patterns emerge.
It seems, for instance that the Egyptian address was more about people, whereas the speech in Tokyo was directed towards nations:
Obama makes many more mentions about peace in Cairo (in Japan, this word seems to have been replaced by ‘security’), and far more mentions of prosperity in Tokyo:
There was a lot of speculation prior to Obama’s speech in Asia about how much focus the President would put on human rights. In the speech, Obama mentions ‘rights’ only five times – once at the beginning of the speech and four times near the end. This weighting is interesting when we compare it to Obama’s reference to China during the speech, which is heavily concentrated at the beginning (China is not mentioned at all past the half-way point of the speech):
Though one of the five occurrences of ‘rights’ is in reference to China, it appears from this analysis that there may have been a deliberate plan to keep the ‘human rights part’ of the speech separated from the ‘China part’.
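The 'where in the speech' question comes down to recording each occurrence of a word as a fraction of the way through the text. Here is a small Python sketch of that idea. It is an assumption about one reasonable way to do it, not the tool's actual code, and the sample sentence is invented, not taken from either speech.

```python
import re


def relative_positions(text, word):
    """Positions of `word` in `text`, from 0.0 (start) to 1.0 (end)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    last = max(len(tokens) - 1, 1)  # avoid dividing by zero on tiny texts
    return [i / last for i, token in enumerate(tokens) if token == word.lower()]


# Invented stand-in text, for illustration only.
sample_text = "China trade China growth security rights rights"

china_positions = relative_positions(sample_text, "china")
rights_positions = relative_positions(sample_text, "rights")
print("china:", [round(p, 2) for p in china_positions])
print("rights:", [round(p, 2) for p in rights_positions])
```

With positions in hand, checking whether a word disappears past the half-way point is a one-line comparison against 0.5.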
There are likely many interesting things in this data set, a lot of which are open to interpretation. While it’s doubtful that one can steer entirely clear of political biases during this kind of comparison, the quantitative nature of the data makes it a little bit easier to make an attempt at nonpartisan analysis. I will be including these speeches as sample texts when I release the tool to the public (hopefully next week).
In October, I read a fascinating article on GQ.com about head injuries among former NFL players. Written by Jeanne Marie Laskas, the article was a forensic detective story, documenting a little known doctor’s efforts to bring the brain trauma issue to the attention of the medical community, the NFL, and the general public. It is a great read – an in-depth investigative piece with engaging personalities and plenty of intrigue.
A few weeks later, I picked up a copy of The New Yorker on my way home from Pittsburgh. I was surprised to see, on the cover, a promo for an article by Malcolm Gladwell about – you guessed it – brain trauma and the NFL. After having read both articles, I was surprised by how much these two investigative pieces differed. At the time I thought about doing a visualization to investigate, but somehow the idea slipped out of my head.
Until this weekend. I spent a few (okay, more like eight) hours putting together a tool with Processing that would examine some of the similarities and differences between the two articles. The most interesting data ended up coming from word usage analysis (I looked at sentences and phrases as well, but with not much luck). The base interface for the tool is an XY chart of the words – they are positioned vertically by their average position in the articles, and horizontally by which article they occur in more. The words in the centre are shared by both articles. Total usage affects the scale of the words, so we can see quite quickly which words are used most, and in which articles.
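The layout described above can be sketched in a few lines of Python. This is not the Processing source. The tokenizer and the exact formulas, in particular the horizontal score of (count in B minus count in A) divided by total count, are assumptions chosen to match the description: shared words land near the centre, one-sided words at the edges, and total usage drives the size.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_layout(text_a, text_b):
    """Map each word to an x (which article), y (average position) and size."""
    stats = defaultdict(lambda: {"a": 0, "b": 0, "positions": []})
    for key, text in (("a", text_a), ("b", text_b)):
        words = tokenize(text)
        last = max(len(words) - 1, 1)
        for i, word in enumerate(words):
            stats[word][key] += 1
            stats[word]["positions"].append(i / last)

    layout = {}
    for word, s in stats.items():
        total = s["a"] + s["b"]
        layout[word] = {
            "x": (s["b"] - s["a"]) / total,  # -1 = only A, 0 = shared, +1 = only B
            "y": sum(s["positions"]) / len(s["positions"]),  # 0 = start, 1 = end
            "size": total,  # total usage across both texts
        }
    return layout


# Tiny invented texts standing in for the two articles.
layout = word_layout("brain football brain", "football dog")
for word, point in sorted(layout.items()):
    print(word, point)
```

Dividing by the total keeps the horizontal score between -1 and 1 regardless of how often a word appears, so size alone carries the frequency information.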
By focusing our attention on the big words which lie more or less in the center, we can see what the two articles have in common: brains, football, dementia, and a disease called CTE. What is perhaps more interesting is what lies on the outer edges; the subjects and topics that were covered by one author and not by the other.
Laskas’ article is about Dr. Bennet Omalu, dead NFL players (Mike Webster), Omalu’s colleagues (Dr. Julian Bailes & Bob Fitzsimmons) and the NFL (click on the images to see bigger versions):
Gladwell’s article, on the other hand, focuses partly on another scientist, Dr. Ann McKee, the sport of football in general, as well as a central metaphor in his piece – a comparison between football and dogfighting (the bridge between the two is Michael Vick):
The gulf between the two main scientific personalities profiled in the articles is interesting. Omalu and McKee are both experts in chronic traumatic encephalopathy (CTE), so it makes sense that they each appear in both articles (Omalu was the first to describe the condition). However, we see when we isolate these names that Laskas references Dr. Omalu almost exclusively (Omalu is mentioned 96 times by Laskas and only 6 times by Gladwell)* – it’s worth noting here that the Laskas article is 11.4% longer than the Gladwell piece – JT:
In contrast, Laskas only refers to McKee once (Dr. McKee is mentioned by Gladwell 21 times):
What is the relationship between Dr. McKee and Dr. Omalu? McKee is on the advisory board for the Sports Legacy Institute, a group which studies the results of brain trauma on athletes. SLI was founded by four individuals, including Bennet Omalu and the group’s current head, Chris Nowinski, a former professional wrestler. Omalu and the other three founders of SLI have now left the group, but it apparently continues to be a high-profile presence in the CTE field. Laskas writes:
“Indeed, the casual observer who wants to learn more about CTE will be easily led to SLI and the Boston group. There’s an SLI Twitter link, an SLI awards banquet, an SLI Web site with photos of Nowinski and links to videos of him on TV and in the newspapers. Gradually, Omalu’s name slips out of the stories, and Bailes slips out, and Fitzsimmons, and their good fight. As it happens in stories, the telling and retelling simplify and reduce.”
I wonder how much the path of a journalistic piece is affected by who you talk to first? If I had to guess, I’d say Gladwell started with the SLI, whereas Laskas seems to have begun with Dr. Omalu. This single decision could account for many of the differences between the two articles.
Other word-use choices might also give insight into editorial positions. Laskas, for example, uses the term NFL (below, at left) a lot – 57 times to Gladwell’s 11. Gladwell, on the other hand, talks more about the sport in general, using the word ‘football’ (below, at right) 40 times to Laskas’ 23:
According to Laskas, Dr. Omalu has been roundly shunned by the NFL – they have attempted to discredit his research on many occasions (attention that has not been so pointedly focused on Dr. McKee and the SLI). Though both articles are critical of the League, it seems clear both from the article and the data that Laskas and GQ have taken a more severe stance – the article addresses the NFL much more often, and with more disdain.
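Since the Laskas article is 11.4% longer than the Gladwell piece, it is fair to ask whether the raw counts above survive a length adjustment. The Python sketch below scales the Laskas counts down by that ratio. The counts are the ones quoted in this post, and dividing by 1.114 is just one simple way to normalize, not something the tool itself does.

```python
# Laskas's article is 11.4% longer than Gladwell's (figure from the post).
LASKAS_LENGTH_RATIO = 1.114

# Word counts quoted in the post: (Laskas, Gladwell).
counts = {
    "Omalu": (96, 6),
    "McKee": (1, 21),
    "NFL": (57, 11),
    "football": (23, 40),
}


def length_adjusted(laskas_count, ratio=LASKAS_LENGTH_RATIO):
    """Scale a Laskas count to what it would be at Gladwell's article length."""
    return laskas_count / ratio


adjusted = {word: length_adjusted(laskas) for word, (laskas, _) in counts.items()}

for word, (laskas, gladwell) in counts.items():
    print(f"{word}: Laskas {laskas} (about {adjusted[word]:.1f} adjusted) vs Gladwell {gladwell}")
```

Every gap keeps its direction after the adjustment, so the difference in length does not explain the difference in emphasis.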
This exercise of quantitatively analyzing a pair of articles may seem like a strange way to spend a weekend, but it helped me to more clearly understand the differences between the two stories and to consider my reactions to each. I uncovered a few things that I hadn’t picked up at first, and at the same time was able to reinforce some of the feelings that I had after reading the two articles.
It was also another opportunity to build a quick, lightweight visualization tool dedicated to a fairly specific topic (though in this case the tool could be used to compare any two bodies of text). This strategy holds a lot of appeal to me and I think deserves attention alongside the generalist approach that we tend to see a lot of on the web and in data visualization. It seems to me that this type of investigative technique could be useful for researchers of various stripes.
I will be releasing source code for this project as well as compiled applications for Mac, Linux & Windows. In the meantime, here’s a short video of how the interface behaves: