Wired UK, July ’09 – Visualizing a Nation’s DNA

Wired UK - NDNAD Spread (July, 2009)

In the spring, I was asked by Wired UK if I would be interested in producing something for the two-page ‘infoporn’ spread that runs in every issue. They had seen my experiments with the NYTimes APIs, and were interested in the idea of non-conventional data visualizations. After a bit of research, I proposed a piece about the UK’s National DNA Database (NDNAD). It was a subject that interested me, and I felt that there would be some interesting political territory to cover. Luckily, Wired agreed.

By searching through Parliamentary minutes and sifting through annual reports, I was able to put together a fair amount of information about the NDNAD, and I settled on a few key points that I wanted to convey with the piece. First, I wanted to somehow demonstrate how large the database is – with over 4.5M individuals profiled, it’s the largest DNA database in the world. It holds profiles for more than 7% of the UK’s population. As well as the size of the database, I wanted to show how it broke down – by racial group, by age group, and in terms of those who have been charged versus those who are ‘innocent’. Finally, I wanted to talk about the difference between the UK’s population demographics and the demographics represented by the profiles in the NDNAD.

The central graphic, then, is a DNA strand with one dot for each of the profiles in the database – more than 5M! Of course, I didn’t do this by hand. I wrote a program in Processing that would generate a single, continuous strand that filled up an area of a given size. I was inspired by electron microscope images of real DNA, in which the molecule looks like a loop of thread:

The nice looping threads were rendered using Perlin noise – I had a few parameters inside the program which allowed me to control how ‘messy’ the tangle became, and how much variation in thickness each strand had. While I was at it, I colour-coded each DNA dot to indicate the database’s ethnic breakdown. The result was a giant tangle, which was pretty much exactly what I wanted:
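The Perlin-noise approach can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the original Processing program: `Noise1D`, `strand`, `messiness`, and `thickness_var` are hypothetical names, and the noise is a simple 1D gradient noise using Perlin's fade curve (in Processing, the built-in `noise()`, which returns values between 0 and 1, would play this role after subtracting 0.5). A slowly varying noise value steers the heading of a walker, which produces loops; a second noise channel varies dot size; and the walker is turned back toward the centre whenever it leaves a circular area.

```python
import math
import random


class Noise1D:
    """Minimal 1D gradient noise with Perlin's fade curve; output is in [-0.5, 0.5]."""

    def __init__(self, seed=0, size=256):
        rng = random.Random(seed)
        self.size = size
        # One random gradient (slope) per integer lattice point
        self.grad = [rng.uniform(-1.0, 1.0) for _ in range(size)]

    def __call__(self, x):
        i0 = math.floor(x)
        t = x - i0
        g0 = self.grad[i0 % self.size]
        g1 = self.grad[(i0 + 1) % self.size]
        n0 = g0 * t            # contribution of the left lattice point
        n1 = g1 * (t - 1.0)    # contribution of the right lattice point
        fade = t * t * t * (t * (t * 6 - 15) + 10)  # 6t^5 - 15t^4 + 10t^3
        return n0 + fade * (n1 - n0)


def strand(num_dots, cx, cy, radius, messiness=0.5, thickness_var=0.3, seed=1):
    """Return (x, y, size) for each dot of one continuous, looping strand.

    Hypothetical parameters: `messiness` scales how hard the noise bends the
    heading per step; `thickness_var` scales dot-size variation along the strand.
    """
    bend = Noise1D(seed)
    width = Noise1D(seed + 1)
    x, y, heading = cx, cy, 0.0
    dots = []
    for i in range(num_dots):
        t = i * 0.01                       # small increments give smooth curves
        heading += messiness * bend(t)     # noise steers the thread into loops
        x += math.cos(heading)             # advance one unit step
        y += math.sin(heading)
        if math.hypot(x - cx, y - cy) > radius:
            heading = math.atan2(cy - y, cx - x)  # turn back toward the centre
        size = 2.0 * (1.0 + 2.0 * thickness_var * width(t * 3.0))
        dots.append((x, y, size))
    return dots
```

Raising `messiness` tightens and tangles the loops. Colour coding could then be layered on by assigning consecutive index ranges of the returned dots to each group.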

Wired UK - NDNAD Infographic

Here, you can see the individual dots, and the colour breakdown:

Wired NDNAD Graphic - detail

The next step was to break the big tangle into three parts – one representing the bulk of the database, one representing the 948,535 profiles that were taken from people under the age of 18, and one representing the ~500,000 profiles from people who had never been charged, convicted, or warned by police. The original image had a static centre-point for the DNA loop; to break the tangle apart, I modified the program so that the centre-point could move to predetermined points once certain counts had been reached. The final graphic changes centre-points three times. What was nice about this set-up was that it was easy to move and adjust the positioning of the graphic to fit the page layout. Rendering out a new version of the main image took just a few minutes.
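Moving the centre-point once certain counts are reached amounts to a lookup from dot index to centre. A minimal sketch, assuming the groups are drawn as consecutive runs of dots; `make_centre_lookup` is a hypothetical name, and the total, ordering, and coordinates are made up for illustration (the real under-18 and never-charged groups may overlap, which this ignores).

```python
import bisect


def make_centre_lookup(stops):
    """Build a function mapping a dot index to the centre-point it should orbit.

    `stops` is a list of (start_count, (x, y)) pairs sorted by start_count and
    beginning at 0; the centre jumps to the next point once the dot index
    reaches that pair's start_count.
    """
    starts = [start for start, _ in stops]
    points = [point for _, point in stops]

    def centre_for(index):
        # bisect_right finds the last stop whose start_count <= index
        return points[bisect.bisect_right(starts, index) - 1]

    return centre_for


# Hypothetical layout: bulk of the database, then under-18s, then never charged.
TOTAL = 5_000_000          # illustrative round number
UNDER_18 = 948_535
NEVER_CHARGED = 500_000    # "~500,000" in the text

centre_for = make_centre_lookup([
    (0, (300, 400)),
    (TOTAL - UNDER_18 - NEVER_CHARGED, (700, 250)),
    (TOTAL - NEVER_CHARGED, (850, 600)),
])
```

Because the whole layout lives in one short list of stops, refitting the graphic to a new page layout means editing a few coordinates and re-rendering.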

Wired UK - NDNAD Infographic

Working with these kinds of generative strategies meant that I could explore many variations. As you can see from the graphics posted here, I went through a variety of compositional and colour changes, all of which were relatively painless. Using Processing, I built a mini-application whose entire purpose was to create these DNA systems. I also built a second mini-application, which rendered out a set of pie charts that were used to display related information alongside the main graphic in the spread. I wanted these pie charts to fit in visually with the main graphic, so I created a very simple sketch to output charts from any set of data:
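A chart generator that works from any set of data mostly comes down to converting values into wedge angles. A minimal sketch with a hypothetical `pie_wedges` function; angles are in radians and start at 12 o'clock (an arbitrary choice), so each pair could be passed as the start and stop arguments of Processing's `arc()`.

```python
import math


def pie_wedges(values, start_angle=-math.pi / 2):
    """Convert non-negative values into (start, stop) angles in radians.

    Wedges run consecutively from start_angle; -pi/2 is 12 o'clock in a
    y-down coordinate system such as Processing's.
    """
    total = float(sum(values))
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    wedges = []
    angle = start_angle
    for v in values:
        sweep = 2.0 * math.pi * v / total   # this value's share of the circle
        wedges.append((angle, angle + sweep))
        angle += sweep
    return wedges
```

Since the function only sees proportions, the same code serves every chart; styling (colours, stroke, labels) can stay fixed so all the charts match.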

Wired NDNAD Pie Chart

There ended up being 11 of these little pie charts accompanying the main graphic. Again, by building tools, I was able to do some interesting things while avoiding large amounts of manual labour. Just how I like it! You can see the final result in the image at the top of this post, and of course, in Wired UK – the July issue hit newsstands a couple of weeks ago. If you are in the UK, go out and buy a copy!

Perhaps the most exciting thing to come out of this process is that I have been asked to be a contributing editor for Wired UK. I’ll be creating some more pieces centred around data & information over the coming months (look for a Just Landed spread next month), and will also be getting the chance to showcase work by various brilliant designers & artists in the UK and around the world.

So, stay tuned…

Flashbelt ’09 – Hacking the Newsroom Followup

On Wednesday I had the chance to talk at Flashbelt, a web media conference that I have been presenting at every year since 2004. I talked about data – how to get it, how to use it, and how & why it’s becoming more and more a part of our lives. I walked through some of the process behind my NYTimes API visualizations, my recent Wired UK NDNAD piece, and Just Landed.

I really enjoyed giving the presentation, and it was great to speak to a lot of interesting people at the conference before and after the talk. As promised, I’ve posted a .ZIP file with some simple Processing files to get you started exploring the NYTimes ArticleSearch API – the link for that, along with some other resources that I mentioned during the talk, is listed below.

Some of you may be aware that this year’s Flashbelt conference ‘featured’ a controversial talk by Hoss Gifford. I’m not going to talk about my reactions in detail in this post as my intention here is to simply share some information related to my presentation. However, I will say that I believe that there is no room at all for content that is in any way demeaning to women at Flashbelt or at any other event. It’s inexcusable. I’m saddened that this happened – but was heartened this morning to read this very thoughtful response and call for discussion from conference organizer Dave Schroeder, along with some of the people who very rightly brought this issue to a public stage earlier in the week. It’s well worth a read.

Back to the resources. Here are a couple of images that I wanted to show in my presentation, but somehow forgot to include. The first is an abstract visualization of the word ‘organic’ in the NYTimes between 1981 and 2009. The second is a radial visualization of mentions of the Yankees & Mets in the same paper over the same period of time.

NYTimes: Going Organic 1981-2009

NYTimes Threads - Yankees vs. Mets

Finally, a list of links:

Please let me know if there’s anything I’ve missed. As always, I’d love to hear any feedback and suggestions from those who were in the audience. I’m already looking forward to next year!

Arduino, XBee and The NYTimes: NewsAlarm goes wireless

NewsAlarm + Xbee

Last month, I built NewsAlarm – a modified smoke alarm wired into the NYTimes NewsWire API. It can be configured to sound in response to any keyword or keywords coming over the wire at a specific frequency; for example, you might set it to alarm when 50% of the headlines coming in contain the words ‘space aliens’, or when 10% of the headlines include the words ‘evil robots’. It’s a pretty ridiculous device, meant to embody the equally ridiculous alarmism (pun intended) that permeates mainstream media.
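The trigger rule (sound the alarm when some percentage of incoming headlines contain a keyword) can be sketched as follows. This assumes a simple case-insensitive substring match and that the caller decides which window of recent headlines to check; `keyword_fraction` and `should_alarm` are hypothetical names, not part of the NewsWire API, and the headlines are invented.

```python
def keyword_fraction(headlines, keywords):
    """Fraction of headlines containing at least one keyword (case-insensitive)."""
    if not headlines:
        return 0.0
    words = [k.lower() for k in keywords]
    hits = sum(1 for h in headlines if any(w in h.lower() for w in words))
    return hits / len(headlines)


def should_alarm(headlines, keywords, threshold):
    """True when the keyword fraction reaches the threshold (e.g. 0.5 for 50%)."""
    return keyword_fraction(headlines, keywords) >= threshold


# Invented headlines standing in for a window of recent NewsWire items
recent = [
    "Space aliens spotted over Ohio",
    "Markets close higher",
    "Evil robots join union",
    "Rain expected this weekend",
]
print(should_alarm(recent, ["space aliens", "evil robots"], 0.5))  # True
```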

The original NewsAlarm was hard-wired to the computer via an Arduino. It worked quite well, but it wasn’t very convenient – it could only get as far away from the computer as the wires allowed, which was only about 5 feet. I wanted the device to be able to sit a long way from the computer processing the NewsWire data, and I also wanted one computer to be able to trigger multiple NewsAlarms. So, I looked into ways that I could connect the devices and the computers wirelessly.

The solution turned out to be the XBee – a cute little device that allows signals to be sent via IEEE 802.15.4 wireless. XBees are small, cheap, and can be combined to create simple mesh networks. Perfect! For the wireless NewsAlarm, two XBees act as a transmitter and a receiver. A very simple serial signal is transferred from one XBee to the other when the alarm is triggered. Our system uses two Arduinos right now, though it could be reconfigured to use one Arduino and an FTDI cable.
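Because an XBee pair in transparent mode behaves much like a serial cable, the host side can stay very simple. The sketch below assumes a one-byte protocol (the byte `A` meaning "sound the alarm"), which is a hypothetical choice; `send_trigger` and `count_triggers` are hypothetical names. Any object with `write()` and `read()` works, so `io.BytesIO` stands in for a real serial port here (pySerial's `Serial` exposes the same two methods). On the real device, the receiving side would run on the Arduino.

```python
import io

TRIGGER = b"A"  # hypothetical one-byte "sound the alarm" message


def send_trigger(port):
    """Host side: tell the remote NewsAlarm to sound by writing the trigger byte."""
    port.write(TRIGGER)


def count_triggers(port):
    """Receiver side (for testing): count trigger bytes, ignoring stray bytes."""
    count = 0
    while True:
        b = port.read(1)
        if not b:            # empty read: no more data
            return count
        if b == TRIGGER:
            count += 1


# io.BytesIO stands in for the serial link in this example.
link = io.BytesIO()
send_trigger(link)
link.write(b"\x00?")         # stray bytes the receiver should ignore
send_trigger(link)
link.seek(0)
print(count_triggers(link))  # 2
```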

Once we have things cleaned up a bit, we’ll post some schematics and some more detailed instructions on how to get a system like this working. In the meantime, here are some tutorials that we found useful:

  • http://arduinofun.com/blog/2009/03/21/arduino-wireless-xbee-test/
  • http://www.ladyada.net/make/xbee/arduino.html

YVR -> BOS -> NYC -> MSP

Next week I’m heading out for a multi-stop trip to the USA.

I’ll be in Boston for Flash on Tap from the 28th to the 31st of May. Flash on Tap looks to be a great event – not only is there an interesting speaker lineup, there are also 13 microbreweries pouring in the evenings. I’m speaking at 10am on a Friday, which might be a bit early for a pint. I’ll be talking about a raft of projects centred around a theme of emergence, including some recent and brand new work.

After Boston, I head to New York to visit museums, eat as much as I can, and try not to look too much like a first-time New Yorker. Coincidentally, I’ll be there the same week as CAT – though I’m not attending, perhaps I’ll run into some creative technology types while I’m wandering the streets. If anyone knows of other events happening in NYC in the first week of June, please let me know.

Finally, I’ll fly into Minneapolis for Flashbelt. I’ve already told you how much I like this event – if you haven’t already bought a ticket, there’s still time. I’ll be speaking at Flashbelt about my work with the NYTimes APIs as well as a broad range of topics surrounding open data and visualization. I’ll also be showing some work as part of the Data Art Show at the Pink Hobo Gallery in Minneapolis – along with James Paterson and Mario Klingemann.

Hopefully I’ll get a chance to talk to some of you along the way. If you are going to be attending either event, or are in Boston, New York, or Minneapolis and would like to say hello, feel free to fire me an e-mail or send a tweet.

Just Landed: Processing, Twitter, MetaCarta & Hidden Data

Just Landed - Screenshot

I have a friend who has a Ph.D. in bioinformatics. Over a beer last week, we ended up discussing the H1N1 flu virus, epidemic modeling, and countless other fascinating and somewhat scary things. She told me that epidemiologists have been experimenting with alternative methods of creating transmission models – specifically, she talked about a group that was using data from the Where’s George? project to build a computer model for tracking and predicting the spread of contagions (which I read about again in this NYTimes article two days later).

Just Landed - Screenshot

This got me thinking about the data that is hidden in various social network information streams – Facebook & Twitter updates in particular. People share a lot of information in their tweets – some of it intentionally, and some of it in ways that could be uncovered with some rudimentary searching. I wondered if it would be possible to extract travel information from people’s public Twitter streams by searching for the term ‘Just landed in…’.

Just Landed - Screenshot

The idea is simple: find tweets that contain this phrase, parse out the location they’d just landed in, along with the home location they list on their Twitter profile, and use this to map out travel in the Twittersphere (yes, I just used the phrase ‘Twittersphere’). Twitter’s search API gives us an easy way to get a list of tweets containing the phrase – I am working in Processing, so I used Twitter4J to acquire the data from Twitter. The next question was a bit trickier – how would I extract location data from a list of tweets like these?

Queen_Btch: just landed in London heading to the pub for a drink then im of to bed…so tired who knew hooking up on an airplane would be so tiring =S
jjvirgin: Just landed in Maui and I feel better already … Four days here then off to vegas
checrothers: Just landed in Dakar, Senegal… Another 9 hours n I’ll be in South Africa two entire days after I left … Doodles

It turned out to be a lot easier than I thought. MetaCarta offers two different APIs that can extract latitude and longitude information from a query. They can take the tweets above and extract locations:

London, London, United Kingdom – “Latitude” : 51.52, “Longitude” : -0.1
Maui, Hawaii, United States – “Latitude” : 20.5819, “Longitude” : -156.375
Dakar, Dakar, Senegal – “Latitude” : 14.72, “Longitude” : -17.48
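MetaCarta does the heavy lifting here, but a crude pre-filter could trim each tweet down to the words right after the phrase before anything is sent off for geocoding. A minimal sketch, assuming that a run of capitalised words following ‘just landed in’ is the place name; `LANDED` and `landed_place` are hypothetical names, this is not what the original project did, and it will misfire on tweets like ‘just landed in NYC I think’.

```python
import re

# "just landed in" (either case on each word's first letter), then a run of
# capitalised words, optionally comma-separated: "London", "Dakar, Senegal".
LANDED = re.compile(r"[Jj]ust\s+[Ll]anded\s+[Ii]n\s+((?:[A-Z][\w'-]*(?:,\s*|\s+)?)+)")


def landed_place(tweet):
    """Return the candidate place name after 'just landed in', or None."""
    m = LANDED.search(tweet)
    if m is None:
        return None
    return m.group(1).rstrip(", \t\r\n")
```

On the three example tweets above, this yields ‘London’, ‘Maui’, and ‘Dakar, Senegal’, which are shorter, less ambiguous strings to pass to a geocoder.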

This seemed perfect, so I signed up for an API key and set to work hooking the APIs up to Processing. This was a little bit tricky, since the APIs require authentication. After a bit of back and forth, I managed to track down the right libraries to implement HTTP Basic authentication in Processing. I ended up writing a set of classes to talk to MetaCarta – I’ll share these in a follow-up post later this week.
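HTTP Basic authentication itself is simple: the client sends an `Authorization` header containing `Basic` followed by the Base64 encoding of `username:password` (the scheme is specified in RFC 7617). A minimal Python sketch of building that header, with a hypothetical `basic_auth_header` function and placeholder credentials; no MetaCarta endpoint is shown, since the original classes aren't reproduced here.

```python
import base64


def basic_auth_header(username, password):
    """Value for an HTTP 'Authorization' header using the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

With the standard library, the result would be attached as `urllib.request.Request(url, headers={"Authorization": basic_auth_header(user, password)})`. Base64 is an encoding, not encryption, so the credentials are only protected if the request goes over HTTPS.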

Now I had a way to take a series of tweets and extract location data from them. I did the same thing with the location information from each Twitter user’s profile page – I could have gotten this via the Twitter API, but it would cost one query per user, and Twitter limits requests to 100/hour, so I went the quick-and-dirty way and scraped this information from the HTML. This gave me a pair of location points that could be placed on a map, which was reasonably easy with some assistance from the very informative map projection pages on Wolfram MathWorld.
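The post doesn’t say which projection was used. As an illustration, one of the simplest, the equirectangular (cylindrical equidistant) projection, maps longitude and latitude linearly onto x and y. A minimal sketch with a hypothetical `to_map_xy` function, using a hypothetical trip between two of the MetaCarta results above:

```python
def to_map_xy(lat, lon, width, height):
    """Equirectangular projection: degrees of latitude/longitude -> pixel x, y.

    x grows eastward from the left edge (longitude -180); y grows southward
    from the top edge (latitude +90), matching screen coordinates.
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


# A hypothetical trip: home in London, just landed in Dakar, on a 720x360 map.
home = to_map_xy(51.52, -0.1, 720, 360)
landed = to_map_xy(14.72, -17.48, 720, 360)
```

The two points then become the ends of a line or curve on the map. Other projections (Mercator, for example) only change the formulas inside `to_map_xy`.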

I’ll admit it took some time to get the whole thing working the way that I wanted it to, but Processing is a perfect environment for this kind of project – bringing in data, implementing 3D, exporting to video – it’s all relatively easy. Here’s a render from the system, showing about 36 hours of Twitter-harvested travel:

Just Landed – 36 Hours from blprnt on Vimeo.

And another, earlier render showing just 4 hours but running a bit slower (I like this pace a lot better – but not the file size of the 36 hour video rendered at this speed!!)

Just Landed – Test Render (4 hrs) from blprnt on Vimeo.
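The renders draw each trip as an arc in 3D. The post doesn’t describe how the arcs were built; one common approach, sketched here as an assumption, is to place both endpoints on a unit sphere and interpolate along the great circle between them (spherical linear interpolation), optionally lifting the middle of the path off the surface. `to_sphere`, `great_circle_arc`, and `lift` are hypothetical names; the coordinates use a z-up convention, so they would need reorienting for Processing’s y-down 3D space.

```python
import math


def to_sphere(lat, lon, r=1.0):
    """Latitude/longitude in degrees -> (x, y, z) on a sphere (z axis = north pole)."""
    la, lo = math.radians(lat), math.radians(lon)
    return (r * math.cos(la) * math.cos(lo),
            r * math.cos(la) * math.sin(lo),
            r * math.sin(la))


def great_circle_arc(a, b, steps=32, lift=0.0):
    """Points from unit vector a to unit vector b along their great circle (slerp).

    `lift` raises the middle of the path above the surface by that fraction of
    the radius, so a trip reads as an arc rather than hugging the globe.
    """
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)            # angle between the two points
    s = math.sin(omega)
    if s < 1e-9 and omega > 1.0:
        raise ValueError("antipodal points: the great circle is not unique")
    points = []
    for i in range(steps + 1):
        t = i / steps
        if s < 1e-9:                  # (nearly) identical points: plain lerp
            w0, w1 = 1.0 - t, t
        else:
            w0 = math.sin((1.0 - t) * omega) / s
            w1 = math.sin(t * omega) / s
        h = 1.0 + lift * math.sin(math.pi * t)  # 0 at the ends, max mid-way
        points.append(tuple(h * (w0 * p + w1 * q) for p, q in zip(a, b)))
    return points
```

Drawing each arc progressively over time, one segment per frame, would give the animated look of trips appearing as tweets arrive.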

Now, I realize this is a long way from a working model to predict epidemics. But it sure does look cool. I also think it will be a good base for some more interesting work. Of course, as always, I’d love to hear your feedback and suggestions.
