Tag Archives: data visualization

State of the Union(s)

New York Times, 01/27/10 - State of the Union Graphic

I was asked at the end of last week to produce a graphic for the Opinion page today – the idea was to compare the texts of various ‘state of the union’ addresses from around the world. The final result (pictured above) is not extraordinarily data-heavy. It worked quite nicely in the printed layout, where the individual ‘tentacles’ trailed to the text of the speeches that they index.

The process behind this piece was relatively simple. Each speech was indexed using a Processing application that I wrote, which counts the frequency of individual words (the program ignores commonly used or unimportant words). The words for each speech were then ranked by mentions per thousand words (you can see a version of the piece with numbers here).
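The indexing step described above can be sketched in a few lines. This is a minimal Python illustration, not the original Processing application: the stop-word list, the tokenizer, and the function names are all assumptions, and so is the choice to divide by the total word count (stop words included).

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used by the original program is not given.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "for", "we", "our", "it"}

def tokenize(text):
    """Lowercase the text and pull out alphabetic words (keeping inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def mentions_per_thousand(text, stop_words=STOP_WORDS):
    """Map each non-stop word to its mentions per thousand words of text."""
    words = tokenize(text)
    total = len(words)  # assumption: the denominator counts every word, stop words too
    if total == 0:
        return {}
    counts = Counter(w for w in words if w not in stop_words)
    return {w: 1000.0 * c / total for w, c in counts.items()}

def top_words(text, n=10):
    """Return the n highest-ranked words as (word, rate) pairs, ties broken alphabetically."""
    rates = mentions_per_thousand(text)
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Normalizing by speech length is what makes a short address comparable with a long one: a raw count would simply favour the longer text.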

Almost every project I work on involves a period of ‘data exploration’ in which I try to shake as many interesting things out of the information as I can. Even though this piece had a short turn-around, I did a fair amount of poking around, generating some simple bar graphs:

State of the Union Graphs

Another avenue I explored was to use the word weights to determine a ‘score’ for each sentence. By doing this, I can try to find the ‘kernel’ of the speech – the sentence that sums up the entire text in the most succinct way. This, I think, was fairly successful. Here are the ‘power sentences’ for the UK:

SOTU analysis - Sentence Weighting - UK

The Netherlands:

SOTU analysis - Sentence Weighting - Netherlands

And Botswana:

SOTU analysis - Sentence Weighting - Botswana
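The sentence-weighting idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code behind the images above: the stop-word list is hypothetical, word weights are taken to be raw frequencies within the speech, and each sentence is scored by its average word weight (the post does not say whether the scores were summed or averaged).

```python
import re
from collections import Counter

# Hypothetical stop-word list; these words carry zero weight.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "for", "we", "our", "it"}

def words_of(text):
    """Lowercase the text and pull out alphabetic words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def power_sentence(text, stop_words=STOP_WORDS):
    """Return the sentence with the highest average word weight, or None for empty text."""
    weights = Counter(w for w in words_of(text) if w not in stop_words)
    best, best_score = None, -1.0
    for sentence in split_sentences(text):
        words = words_of(sentence)
        if not words:
            continue
        # Assumption: average rather than sum, so long sentences do not win by length alone.
        score = sum(weights[w] for w in words) / len(words)
        if score > best_score:
            best, best_score = sentence, score
    return best
```

Averaging favours short, dense sentences; summing the weights instead would favour long ones, so the choice changes which sentence comes out on top.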

Which brings us to tonight’s State of the Union Address by Barack Obama. What was the ‘power sentence’ from this speech? I ran the weighting algorithm on the address and this is what it came up with:

The Most Important Sentence From Obama's State of the Union Address?

Haiti & Avatar – updates.

This post is a bit of a Swiss Army knife. Without being too long-winded, I’m going to clarify some misunderstandings, update some figures, talk about Canadian foreign policy, respond to some criticism and remove a rock from a horse’s hoof. To start, then, let’s

Clarify some misunderstandings

I published a post last week comparing Haiti aid per capita to Avatar ticket prices. The post got a lot of attention, and the figures and general concept were cross-posted and re-hashed in many places. Some people seemed to have misunderstood the post, though, and thought that I was comparing the contributions of individual governments to the production costs of Avatar. This is not what I did.

To get my figures for ‘Avatar minutes’ I started with the total aid contribution for a country, and divided it by that country’s population to get a per-capita aid figure. I then calculated how many minutes of Avatar that per-person contribution would pay for, using a ticket price of $8.50 (with a running time of 162 minutes, an ‘Avatar minute’ is about 5.25 cents). So, with Canada’s aid contribution of $5.5M, and a population of 33.3M, the per-person donation is about 3 Avatar minutes. Now, before any of you angry Canadians start frothing at the mouth, let me

Update some figures

Haiti/Avatar Updates

When I published my post last week, I used the data that was then available. Many people commented about my use of the figure $5.5M for Canada, since very shortly after the post it was announced that the Canadian government was drastically increasing its Haiti aid contributions, and at the same time stated that it would match Canadian citizens’ contributions dollar-for-dollar, with no capping amount. I highlighted Canada in my post not to shame the government, but because I live in Canada. Again, I used the data available. I promised at the time to update the figures as more information became available, so, without further ado:

  • Canada: 74 minutes
  • Sweden: 47 minutes
  • Norway: 41 minutes
  • Denmark: 39 minutes
  • Luxembourg: 28 minutes
  • Finland: 27 minutes
  • Guyana: 25 minutes
  • Spain: 19 minutes
  • Estonia: 14 minutes
  • Australia: 12 minutes
  • Ireland: 12 minutes
  • Switzerland: 11 minutes
  • USA: 10 minutes
  • France: 9.5 minutes
  • Germany: 5 minutes
  • Netherlands: 5 minutes
  • Italy: 3 minutes
  • Japan: 1 minute

The contributions pledged by the Canadian government are impressive. But the point of the original post was not to single out any individual country for either congratulation or condemnation. Instead, it was to take the figures and put them into some kind of context.

$130,733,775 is a lot of money. Really. But our measure of amounts always depends on what context we put the numbers in. $130 million is a lot of money when compared to my yearly income. But it’s not that much money compared to the 2010 Olympic budget – $1,700 million for a two-week sporting event. It’s just under half of the estimated production costs of Avatar ($280M). It’s less than 4% of Canada’s foreign aid budget.

Comparing Millions

If we add up ALL of the contributions to Haiti Aid, we get an even bigger amount of money – $1.75 billion. A huge amount, to be sure, but again, a number that needs to be looked at in context. $1.75B is just a little bit less than Avatar has made in global ticket sales. It’s about 50% of Canada’s foreign aid budget, and 0.25% of last year’s monstrous US financial bailout. It is, repeating myself from the last paragraph, pretty much exactly what Vancouver is spending on next month’s winter games.

Comparing Billions
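For anyone who wants to check the proportions quoted above, here is a small Python sketch. It uses only dollar amounts that appear in this post ($3.45B is the foreign aid budget figure given in the next section); the constant and function names are made up for illustration, and treating $130,733,775 as Canada's updated contribution is an assumption based on how the post presents it.

```python
def share(amount, reference):
    """Return amount as a percentage of reference."""
    return 100.0 * amount / reference

# Dollar figures quoted in the post; names are hypothetical.
CANADA_AID = 130_733_775        # assumed to be Canada's updated contribution
ALL_HAITI_AID = 1.75e9          # total of all contributions
OLYMPIC_BUDGET = 1.7e9          # 2010 Olympic budget
AVATAR_PRODUCTION = 280e6       # estimated production costs of Avatar
CANADA_AID_BUDGET = 3.45e9      # Canada's foreign aid budget

comparisons = [
    ("Canada aid vs. Avatar production costs", share(CANADA_AID, AVATAR_PRODUCTION)),
    ("Canada aid vs. Olympic budget", share(CANADA_AID, OLYMPIC_BUDGET)),
    ("Canada aid vs. foreign aid budget", share(CANADA_AID, CANADA_AID_BUDGET)),
    ("All Haiti aid vs. foreign aid budget", share(ALL_HAITI_AID, CANADA_AID_BUDGET)),
    ("All Haiti aid vs. Olympic budget", share(ALL_HAITI_AID, OLYMPIC_BUDGET)),
]

for label, pct in comparisons:
    print(f"{label}: {pct:.1f}%")
```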

All of this mention of Canada and foreign aid may already have tipped you off that I’d like to

Talk about Canadian Foreign Policy

Canada’s foreign aid budget is $3.45B, or about 0.25% of Canada’s GDP. Compare that to the Danes, who spend 0.83% of their GDP on aid (up this year from 0.82%, despite a record forecast deficit), or to the Swedes who spend about 0.92%. Canadians like to believe that we are a shining example of global citizenry, but largely this is an artifact of the pre-Mulroney governments of the 1970s and 1980s. The Center for Global Development ranked Canada 11th in their Commitment to Development Index from 2009, behind countries like Sweden, Denmark, Netherlands, Ireland, Spain, and Australia.

This index includes factors like aid, trade, investment, and migration. As the report notes, our level of unskilled immigration from developing countries has changed very little since the 1980s (we rank 11th on the list for migration).

Like many other Canadians I grew up feeling proud about my country and about our role in the world. Unfortunately, the more I look into the actual figures, the more I realize that we have in many ways failed to maintain these ideals in the last 30 years.

I hope that Canada’s actions on Haiti mark a change for our government (and not, say, a convenient way to buy some much-needed PR). I would like nothing more than to see Canada return to the role of the good global citizen. In the meantime, I will continue watching the government’s record with a deserved amount of criticality.

Speaking of criticality, let me finish this post by taking a moment to

Respond to some criticism

Jen Stirrup wrote a nicely detailed blog post in response to my Avatar/Haiti piece, in which she argues that the visualization puts beauty in advance of clarity. If we take the images that I used in the post as examples of data visualizations, I can’t help but agree. However, these images weren’t intended to be stand-alone graphics. Instead, they are screenshots of an animated, interactive visualization tool that I built to explore the data. As is very often the case when I work with data, I wrote a little program using Processing which was constructed specifically to deal with this data. I use the term ‘little’ here to emphasize the fact that it was a quick project – from the time that I had the idea to the time when I pressed ‘publish’ last Sunday was about 4 hours.

I would love to develop a workflow to take these interactive visualization tools to a stage where they can be shared more easily – at this point they tend to sit around while I harbour the best intentions to clean up the code enough for a proper release. In the meantime I can say that if you ask nicely, I’m usually willing to share my messy pre-release code. I will also be posting a brief video which might give you a better feel for how the project behaves – which, for the sake of continuity, I’ll title ‘Remove a rock from a horse’s hoof’.

Finding Perspective: Haiti Earthquake Aid in Avatar Minutes

Haiti Earthquake Aid by Nation - In Avatar Minutes

We’ve heard a lot this week about earthquake aid for Haiti. As is always the case when large numbers are bandied about in the news media, it’s hard to get a feeling of scale. For example, Canada has, at the time of writing, pledged to donate nearly 5.5M dollars to the aid effort. What does this number really mean? Well, considering Canada’s population of 33.3M, the aid works out to about 16 cents per Canadian citizen. 16 cents doesn’t buy you much these days. A sip of coffee, or – say – 3.14 minutes of Avatar; barely enough to get through the credits.
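The arithmetic behind an ‘Avatar minute’ can be written out in a few lines of Python. This is a sketch rather than the original tool: it assumes the $8.50 ticket price and 162-minute running time cited in the follow-up post, and the function and constant names are made up for illustration.

```python
TICKET_PRICE_USD = 8.50   # ticket price cited in the follow-up post
RUNNING_TIME_MIN = 162    # Avatar's running time in minutes

def avatar_minutes(total_aid_usd, population):
    """Convert a country's total aid into minutes of Avatar per citizen."""
    per_capita_usd = total_aid_usd / population
    cost_per_minute_usd = TICKET_PRICE_USD / RUNNING_TIME_MIN  # about 5.25 cents
    return per_capita_usd / cost_per_minute_usd

# Canada at the time of writing: $5.5M pledged, population 33.3M.
canada = avatar_minutes(5.5e6, 33.3e6)
print(f"Canada: {canada:.2f} Avatar minutes per citizen")
```

Dividing by population first is the whole point of the exercise: it turns a headline total into something one person could hold in their hand.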

Haiti Earthquake Aid by Nation - In Avatar Minutes

How many Avatar minutes are governments around the world pledging? Sweden leads the way, with almost 38 minutes per citizen – almost a quarter of the movie. Other Scandinavian countries round out the top 6, along with Luxembourg, Guyana, and Estonia.

Haiti Earthquake Aid by Nation - In Avatar Minutes

Here are the times for some other countries:

  • Sweden: 38 minutes
  • Luxembourg: 28 minutes
  • Denmark: 26 minutes
  • Guyana: 25 minutes
  • Norway: 20 minutes
  • Estonia: 14 minutes
  • Australia: 8 minutes
  • Finland: 6 minutes
  • United States: 6 minutes
  • Switzerland: 5 minutes
  • New Zealand: 4 minutes
  • Netherlands: 3 minutes
  • United Kingdom: 3 minutes
  • Canada: 3 minutes
  • Spain: 2 minutes
  • Brazil: 2 minutes
  • Germany: 1 minute
  • Japan: 1 minute
  • Morocco: 1 minute
  • Poland: 1 minute
  • Italy: 1 minute


The images in this post are exports from a Processing tool that I built to manage the data and to render the film strips. The application reads data from a Google spreadsheet – the original data was published by the always excellent Guardian Data Blog. If there’s enough interest, I will post the tool and the source later this week.

Haiti Earthquake Aid by Nation - In Avatar Minutes

Data Activism and Climate Reality

Today, the UK’s Met Office released a subset of a large record of global temperature readings. This data set has been at the core of a lot of scientific research supporting the idea that the planet is getting warmer, including the controversial IPCC Assessment Reports.

Here is the data currently available, representing decades of data from over 1,500 land stations. As you can read on the linked page, the Met is at work to get more of this data released as soon as possible. There is some urgency here – the hope is that hard, undeniable numbers might finally put some of the ‘debate’ surrounding the issue to rest.

Manuel Lima from VisualComplexity wrote a convincing blog post today, suggesting that the data community (how’s that for a general grouping?) can offer a lot to this cause. I couldn’t agree more. The general public certainly won’t gain much from this pile of strangely formatted text files – but they might be swayed by some well-built, innovative visualizations that communicate and convince. Certainly, we can do better than the current graphics:

Met Office Visualization

In order for this to be effective, I’d suggest three things are necessary:

1) Easy Access. I would love to see the data set placed into some format which is easily accessible, to save the work of everyone having to parse the data individually. Google Spreadsheets? MySQL tables? JSON? All of the above? Edit – MySQL tables are now available, along with a Perl parsing script, in the climate data forum (http://climatedata.blprnt.com).

2) Coordination. It would be useful to have a central place for people working with the data to ask questions and to share results. Ideally, a repository of graphics and interactive tools could be made available to the public and to the press.

3) Dialogue with Climate Scientists. The IPCC has more than 2500 expert reviewers, 800 contributing authors, and 450 lead authors. These people know what information needs to be shown, and what stories need to be told. Any effective effort to produce visualizations from this data would benefit from their input.

How does this start? As a quick measure to help with suggestion #2, I’ve created an open forum where we can start a dialogue, discuss some of these questions, and hopefully come up with some answers. For now, you can access the forum here:

http://climatedata.blprnt.com

Please pass on this invitation to any data-folks you might know – and of course any climate scientists, journalists, or other curious types who might want to get involved.

Tokyo | Cairo: Comparing Obama’s Foreign Policy Speeches

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

I spent a little bit of time today working on my text comparison tool, which I built last weekend to satisfy my curiosity about the similarities and differences between two very similar articles published on head injuries in the NFL (you can read the post here). I wanted to test out the tool with a different kind of content, and settled on something more political: two high profile foreign policy speeches by US President Barack Obama.

The first speech is Obama’s famous open address to the Muslim world, given in June at Cairo University. The second is much more recent – yesterday’s speech delivered at the Suntory Hall in Tokyo. As you might expect, the two speeches share a lot of common language. Here is the big picture, showing the top 100 words:

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

The shared words – ‘america’, ‘world’, ‘common’, ‘human’, ‘responsibility’, etc. – don’t offer much in the way of analysis. Things start to get interesting, though, when we look towards the edges (click on the images to see larger versions):

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

At the far extremes, the speech in Cairo was about Islam, about Palestinians, about peace, faith, and communities. The Tokyo address was about China, North Korea, security, agreement and growth. If we look at some of the common words that were used in both speeches, we can see some more interesting patterns emerge.
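One way to place words between the shared middle and the ‘far extremes’ is to compare how often each word turns up in each speech. The Python sketch below is only an illustration of that idea, not the comparison tool itself: the scoring formula, the function names, and the tokenizer are all assumptions.

```python
import re
from collections import Counter

def rates(text):
    """Map each word to its mentions per thousand words of the text."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(words) or 1  # guard against empty input
    return {w: 1000.0 * c / total for w, c in Counter(words).items()}

def lean(text_a, text_b):
    """Score each word from -1.0 (only in text_a) to +1.0 (only in text_b).

    Hypothetical measure: the difference of the two per-thousand rates
    divided by their sum, so 0.0 means the word is used equally often.
    """
    rates_a, rates_b = rates(text_a), rates(text_b)
    scores = {}
    for word in set(rates_a) | set(rates_b):
        a = rates_a.get(word, 0.0)
        b = rates_b.get(word, 0.0)
        scores[word] = (b - a) / (a + b)  # a + b > 0: the word occurs in at least one text
    return scores
```

Sorting the result by score puts one speech's characteristic words at one end and the other's at the opposite end, with the shared vocabulary clustered around zero.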

It seems, for instance, that the Egyptian address was more about people, whereas the speech in Tokyo was directed towards nations:

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

Obama makes many more mentions of peace in Cairo (in Japan, this word seems to have been replaced by ‘security’), and far more mentions of prosperity in Tokyo:

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

There was a lot of speculation prior to Obama’s speech in Asia about how much focus the President would put on human rights. In the speech, Obama mentions ‘rights’ only five times – once at the beginning of the speech and four times near the end. This weighting is interesting when we compare it to Obama’s references to China during the speech, which are heavily concentrated at the beginning (China is not mentioned at all past the half-way point of the speech):

Tokyo | Cairo: Comparing Obama's Foreign Policy Speeches

Though one of the five occurrences of ‘rights’ is in reference to China, it appears from this analysis that there may have been a deliberate plan to keep the ‘human rights part’ of the speech separated from the ‘China part’.
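Finding where in a speech a word occurs only requires the position of each mention. Here is a minimal Python sketch of that idea, with a made-up function name and a simple word-index measure of position (0.0 for the first word of the text, 1.0 for the last); the tool behind the images may measure position differently.

```python
import re

def relative_positions(text, target):
    """Return the position of each mention of target, scaled from 0.0 to 1.0."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        return []
    last_index = max(len(words) - 1, 1)  # avoid dividing by zero for one-word texts
    target = target.lower()
    return [i / last_index for i, w in enumerate(words) if w == target]

def mentioned_after(text, target, cutoff=0.5):
    """True if target is mentioned anywhere past the given fraction of the text."""
    return any(p > cutoff for p in relative_positions(text, target))
```

With the full text of the Tokyo address loaded into a string, `mentioned_after(speech, "china")` would be one way to check the half-way claim above.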

There are likely many interesting things in this data set, a lot of which are open to interpretation. While it’s doubtful that one can steer entirely clear of political biases during this kind of comparison, the quantitative nature of the data makes it a little bit easier to make an attempt at nonpartisan analysis. I will be including these speeches as sample texts when I release the tool to the public (hopefully next week).