
The Missing Piece of the OpenData / OpenGov Puzzle: Education

Yesterday, I tweeted a quick thought that I had, while walking the dog:

Picture 5

A few people asked me to expand on this, so let’s give it a try:

We are facing a very different data-related problem today than we were facing only a few years ago. Back then, the call was solely for more information. Since then, corporations and governments have started to answer this call and the result has been a flood of data of all shapes and sizes. While it’s important to remain on track with the goal of making data available, we are now faced with a parallel and perhaps more perplexing problem: What do we do with it all?

Of course, an industry has developed around all of this data; start-ups around the world are coming up with new ideas and data-related products every day. At the same time, open-sourcers are releasing helpful tools and clever apps by the dozen. Still, in large part, these groups are looking at the data with fiscal utility in mind. It seems to me that if we are going to make the most of this information resource, it’s important to bring more people in on the game – and to do that requires education.

At the post-secondary level, efforts should be made to educate academics for whom this new pile of data could be useful: journalists, social scientists, historians, contemporary artists, archivists, etc. I could imagine cross-disciplinary workshops teaching the basics:

  1. A survey of what kind of data is available, and how to find it.
  2. A brief overview of common data formats (CSV, JSON, XML, etc.).
  3. An introduction to user-friendly exploration tools like ManyEyes & Tableau.
  4. A primer in Processing and how it can be used to quickly prototype and build specialized visualization tools.

The last step seems particularly important to me, as it encourages people to think about new ways to engage with information. In many cases, datasets that are becoming available are novel in their content, structure, and complexity – encouraging innovation in an academic framework is essential. Yes, we do need to teach people how to make bar graphs and scatter charts; but let’s also facilitate exploration and experimentation.

Why workshops? While this type of teaching could certainly be done through tutorials, or with a well-written textbook, it’s my experience that teaching these subjects is much more effective one-on-one. This is particularly true for students who come at data from a non-scientific perspective (and these people are the ones that we need the most).

The long-term goal of such an initiative would be to increase data literacy. In a perfect world, this would occur even earlier – at the high school level. Here’s where I put on my utopian hat: teaching data literacy to young people would mean that they could find answers to their own questions, rather than waiting for the media to answer those questions for them. It also teaches them, in a practical way, about transparency and accountability in government. The education system is already producing a generation of bloggers and citizen journalists – let’s make sure they have the skills they need to be dangerous. Veering a bit to the right, these are hugely valuable skills for workers in an ‘idea economy’ – a nation with a data-literate workforce is a force to be reckoned with.

Ideally, this educational component would be built into government projects like data.gov or data.hmg.gov.uk (are you listening, Canada?). More than that, it would be woven into the education mandate of governments at federal and local levels. Of course, I’m not holding my breath.

Instead, I’ve started to plan a bit of a project for the summer. Last year, I taught a series of workshops at my studio in Vancouver, which were open to people of all skill levels. This year, I’m going to extend my reach a bit and offer a couple of free, online presentations covering some of the things that I’ve talked about in this post. One of these workshops will be specifically targeted to youth. At the same time, I’ll be publishing course outlines and sample materials for my sessions so that others can host similar events.

Stay tuned for details – and if you have any questions or would like to lend a hand, feel free to leave a comment or get in touch.

State of the Union(s)

New York Times, 01/27/10 - State of the Union Graphic

I was asked at the end of last week to produce a graphic for the Opinion page today – the idea was to compare the texts of various ‘state of the union’ addresses from around the world. The final result (pictured above) is not extraordinarily data-heavy. It worked quite nicely in the printed layout, where the individual ‘tentacles’ trailed to the text of the speeches that they index.

The process behind this piece was relatively simple. Each speech was indexed using a Processing application I wrote, which counts the frequency of individual names (the program ignores commonly used or unimportant words). The words for each speech were then ranked by mentions per thousand words (you can see a version of the piece with numbers here).
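As a rough illustration, the counting and ranking step could be sketched as follows. This is Python rather than the Processing application described above, and it is not the original code: the stop list, the tokenizing rule, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop list; the words ignored by the original program are not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "that", "we", "our", "for", "will", "be"}


def tokenize(text):
    """Lowercase the text and pull out alphabetic words (keeping simple contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def mentions_per_thousand(text, stop_words=STOP_WORDS):
    """Return (word, mentions per 1,000 words) pairs, most frequent first.

    The rate is computed against the full word count of the speech,
    stop words included, so speeches of different lengths are comparable.
    """
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return []
    counts = Counter(w for w in words if w not in stop_words)
    return [(w, 1000.0 * c / total) for w, c in counts.most_common()]
```

Calling `mentions_per_thousand(speech_text)[:10]` would give the ten most heavily used words in a speech, with rates that can be compared across speeches of different lengths.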

Almost every project I work on involves a period of ‘data exploration’ in which I try to shake as many interesting things out of the information as I can. Even though this piece had a short turn-around, I did a fair amount of poking around, generating some simple bar graphs:

State of the Union Graphs

Another avenue I explored was to use the word weights to determine a ‘score’ for each sentence. By doing this, I can try to find the ‘kernel’ of the speech – the sentence that sums up the entire text in the most succinct way. This, I think, was fairly successful. Here are the ‘power sentences’ for the UK:

SOTU analysis - Sentence Weighting- UK

The Netherlands:

SOTU analysis - Sentence Weighting - Netherlands

And Botswana:

SOTU analysis - Sentence Weighting - Botswana
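The sentence weighting described above might look something like the sketch below, again in Python rather than Processing. The scoring rule is an assumption: each word is weighted by how often it appears in the whole speech, and a sentence's score is the average weight of its words, so that long sentences are not favoured automatically. The stop list and the sentence-splitting rule are likewise hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop list; stop words get a weight of zero.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "that", "we", "our", "for", "will", "be"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def power_sentence(text, stop_words=STOP_WORDS):
    """Return the sentence with the highest average word weight, or None."""
    sentences = split_sentences(text)
    if not sentences:
        return None
    all_words = re.findall(r"[a-z]+", text.lower())
    # Word weight = number of times the word occurs in the whole text.
    weights = Counter(w for w in all_words if w not in stop_words)

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        if not words:
            return 0.0
        # Counter returns 0 for stop words and anything unseen.
        return sum(weights[w] for w in words) / len(words)

    return max(sentences, key=score)
```

Summing the weights instead of averaging them would be an equally plausible choice; it would tend to pick longer sentences.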

Which brings us to tonight’s State of the Union Address by Barack Obama. What was the ‘power sentence’ from this speech? I ran the weighting algorithm on the address and this is what it came up with:

The Most Important Sentence From Obama's State of the Union Address?

Two Sides of the Same Story: Laskas & Gladwell on CTE & the NFL

Laskas / Gladwell

In October, I read a fascinating article on GQ.com about head injuries among former NFL players. Written by Jeanne Marie Laskas, the article was a forensic detective story, documenting a little known doctor’s efforts to bring the brain trauma issue to the attention of the medical community, the NFL, and the general public. It is a great read – an in-depth investigative piece with engaging personalities and plenty of intrigue.

A few weeks later, I picked up a copy of The New Yorker on my way home from Pittsburgh. I was surprised to see, on the cover, a promo for an article by Malcolm Gladwell about – you guessed it – brain trauma and the NFL. After having read both articles, I was surprised by how much these two investigative pieces differed. At the time I thought about doing a visualization to investigate, but somehow the idea slipped out of my head.

Until this weekend. I spent a few (okay, more like eight) hours putting together a tool with Processing that would examine some of the similarities and differences between the two articles. The most interesting data ended up coming from word usage analysis (I looked at sentences and phrases as well, but without much luck). The base interface for the tool is an XY chart of the words – they are positioned vertically by their average position in the articles, and horizontally by which article they occur in more. The words in the centre are shared by both articles. Total usage affects the scale of the words, so we can see quite quickly which words are used most, and in which articles.
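One way to compute such a layout is sketched below in Python (the tool itself was built with Processing, and its exact formulas are not given here, so everything in this block is an assumption). Each word gets a horizontal value between -1 and 1 depending on which article uses it more, a vertical value between 0 and 1 from the average relative position of its occurrences, and a size equal to its total count.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and pull out alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def layout_words(text_a, text_b):
    """Map each word to (x, y, size).

    x:    -1.0 if the word appears only in article A, 1.0 if only in B,
          0.0 if both use it equally often.
    y:    mean relative position of all occurrences (0 = start, near 1 = end).
    size: total number of occurrences across both articles.
    """
    stats = defaultdict(lambda: {"a": 0, "b": 0, "positions": []})
    for key, text in (("a", text_a), ("b", text_b)):
        words = tokenize(text)
        for i, word in enumerate(words):
            stats[word][key] += 1
            stats[word]["positions"].append(i / len(words))

    layout = {}
    for word, s in stats.items():
        total = s["a"] + s["b"]
        x = (s["b"] - s["a"]) / total
        y = sum(s["positions"]) / len(s["positions"])
        layout[word] = (x, y, total)
    return layout
```

With the two article texts loaded as strings, `layout_words(laskas_text, gladwell_text)` would return the coordinates to scale onto the screen; a stop list could be applied first to keep common words out of the centre.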

By focusing our attention on the big words which lie more or less in the center, we can see what the two articles have in common: brains, football, dementia, and a disease called CTE. What is perhaps more interesting is what lies on the outer edges; the subjects and topics that were covered by one author and not by the other.

Laskas’ article is about Dr. Bennet Omalu, dead NFL players (Mike Webster), Omalu’s colleagues (Dr. Julian Bailes & Bob Fitzsimmons) and the NFL (click on the images to see bigger versions):

Laskas / Gladwell

Gladwell’s article, on the other hand, focuses partly on another scientist, Dr. Ann McKee, the sport of football in general, as well as a central metaphor in his piece – a comparison between football and dogfighting (the bridge between the two is Michael Vick):

Laskas / Gladwell

The gulf between the two main scientific personalities profiled in the articles is interesting. Omalu and McKee are both experts in chronic traumatic encephalopathy (CTE), so it makes sense that they each appear in both articles (Omalu was the first to describe the condition; McKee […]). However, we see when we isolate these names that Laskas references Dr. Omalu almost exclusively (Omalu is mentioned 96 times by Laskas and only 6 times by Gladwell)* – it’s worth noting here that the Laskas article is 11.4% longer than the Gladwell piece – JT:

Laskas / Gladwell

In contrast, Laskas only refers to McKee once (Dr. McKee is mentioned by Gladwell 21 times):

Laskas / Gladwell

What is the relationship between Dr. McKee and Dr. Omalu? McKee is on the advisory board for the Sports Legacy Institute, a group which studies the results of brain trauma on athletes. SLI was founded by four individuals, including Bennet Omalu and the group’s current head, Chris Nowinski, a former professional wrestler. Omalu and the other three founders of SLI have now left the group, but it apparently continues to be a high-profile presence in the CTE field. Laskas writes:

“Indeed, the casual observer who wants to learn more about CTE will be easily led to SLI and the Boston group. There’s an SLI Twitter link, an SLI awards banquet, an SLI Web site with photos of Nowinski and links to videos of him on TV and in the newspapers. Gradually, Omalu’s name slips out of the stories, and Bailes slips out, and Fitzsimmons, and their good fight. As it happens in stories, the telling and retelling simplify and reduce.”

I wonder how much the path of a journalistic piece is affected by who you talk to first? If I had to guess, I’d say Gladwell started with the SLI, whereas Laskas seems to have begun with Dr. Omalu. This single decision could account for many of the differences between the two articles.

Other word-use choices might also give insight into editorial positions. Laskas, for example, uses the term NFL (below, at left) a lot – 57 times to Gladwell’s 11. Gladwell, on the other hand, talks more about the sport in general, using the word ‘football’ (below, at right) 40 times to Laskas’ 23:

Laskas / Gladwell Laskas / Gladwell
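Since the Laskas article is 11.4% longer, raw counts tilt slightly in its favour. A quick check, assuming the 11.4% figure refers to word count, is to scale the Laskas counts down to the length of the Gladwell piece. The counts below are the ones quoted in this post; the function name is made up for the example.

```python
# Laskas length divided by Gladwell length, from the 11.4% figure above
# (assumed here to be a word-count ratio).
LENGTH_RATIO = 1.114

# (mentions by Laskas, mentions by Gladwell), as quoted in the post.
COUNTS = {
    "Omalu": (96, 6),
    "McKee": (1, 21),
    "NFL": (57, 11),
    "football": (23, 40),
}


def length_adjusted(laskas_count, ratio=LENGTH_RATIO):
    """Scale a Laskas count to what it would be at Gladwell's article length."""
    return laskas_count / ratio


for word, (laskas, gladwell) in COUNTS.items():
    print(f"{word}: Laskas {length_adjusted(laskas):.1f} (adjusted) vs Gladwell {gladwell}")
```

The adjustment shaves roughly a tenth off each Laskas figure, which is far too little to change any of the comparisons made here.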

According to Laskas, Dr. Omalu has been roundly shunned by the NFL – they have attempted to discredit his research on many occasions (attention that has not been so pointedly focused on Dr. McKee and the SLI). Though both articles are critical of the League, it seems clear both from the article and the data that Laskas and GQ have taken a more severe stance – her article addresses the NFL much more often, and with more disdain.

This exercise of quantitatively analyzing a pair of articles may seem like a strange way to spend a weekend, but it helped me to more clearly understand the differences between the two stories and to consider my reactions to each. I uncovered a few things that I hadn’t picked up at first, and at the same time was able to reinforce some of the feelings that I had after reading the two articles.

It was also another opportunity to build a quick, lightweight visualization tool dedicated to a fairly specific topic (though in this case the tool could be used to compare any two bodies of text). This strategy holds a lot of appeal to me and I think deserves attention alongside the generalist approach that we tend to see a lot of on the web and in data visualization. It seems to me that this type of investigative technique could be useful for researchers of various stripes.

I will be releasing source code for this project as well as compiled applications for Mac, Linux & Windows. In the meantime, here’s a short video of how the interface behaves:

Two Sides of the Same Story: Laskas & Gladwell on CTE & the NFL from blprnt on Vimeo.