Wired UK, Barabási Lab and BIG data

Over the last year, I’ve produced five data-driven pieces for Wired UK. Four of them have been for the two-page infoporn spread that can be found in every issue. I’ve looked at the UK’s National DNA Database, used mined Twitter data to find people’s travel paths, and mapped traffic in some of the world’s busiest sea ports.

In the August issue, out on newsstands right now, I had a chance to work with some spectacular data and extremely talented people. The piece looks at a very, very big data set – cellular phone records from a pool of 10 million users in an anonymous European country. This data came (under a very strict layer of confidentiality) from Barabási Lab in Boston, where they have been using this information to find out some fascinating things about human mobility patterns.

In this post, I’ll walk through the process of creating this piece. Along the way, I’ll show some draft images and unused experiments that eventually evolved into the final project.

Working With Big Data

I can’t get into a lot of detail about the specifics of the data set, but needless to say, phone records for 10 million individuals take up a lot of space. All told, the data for this project consisted of more than 5.5 GB of flattened text files. I should say, at this point, that I don’t work on a supercomputer – I churn out all of my work from an often overheated 2.33 GHz MacBook Pro. Since the deadline was reasonably tight on this project, I decided to rule out a distributed computing approach to get at all of this data, and instead chose to work with a subset of the full list of records. Working in Processing, I built a simple script that could filter out a smaller dataset from the complete files. I built several of these subsets at varying file sizes, giving me a nice set of data to work with both in prototyping and in production stages. This is a strategy that I often employ, even with more minimal datasets – save the heavy lifting until the final render.
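As a rough illustration of this kind of subset filter, here is a minimal Python sketch. It is not the original Processing script. The tab-separated record layout (caller ID in the first column), the function names, and the idea of sampling by hashed user ID are all assumptions made for the example. Sampling by user rather than by line keeps each retained user's full record list intact, which matters if you want to build per-user histories later.

```python
import hashlib
from typing import Iterable, Iterator


def keep_user(user_id: str, keep_fraction: float) -> bool:
    """Deterministically decide whether a user belongs to the subset."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < keep_fraction


def filter_records(lines: Iterable[str], keep_fraction: float) -> Iterator[str]:
    """Stream records, keeping every record whose caller is in the subset.

    Assumes (hypothetically) tab-separated lines with the caller ID first.
    """
    for line in lines:
        caller_id = line.split("\t", 1)[0]
        if keep_user(caller_id, keep_fraction):
            yield line


def write_subset(src_path: str, dst_path: str, keep_fraction: float) -> int:
    """Filter src_path into dst_path one line at a time; return records kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in filter_records(src, keep_fraction):
            dst.write(line)
            kept += 1
    return kept
```

Because the decision depends only on the user ID, a user kept at a small fraction is also kept at any larger one, so subsets of different sizes nest inside each other. The file is read line by line, so memory use stays flat no matter how large the source is.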

The first thing I did with the trimmed-down data was to construct ‘call histories’ for each user in the set. I rendered out these histories as stacked bars of individual calls, which could then be placed into a histogram. Here’s a graph of about 10,000 users, sorted by their total time spent on the phone:

Wired UK & Barabási Lab: Process

Here we see a very obvious power law distribution, with a few people talking a lot (really, a lot – 28.3 hours a week), and most callers talking relatively little (there is also a tail of text-only users at the very end). The problem here, of course, is that on a computer screen – or even in print – it’s hard to get into the data to learn anything useful. When I zoom into the graph, we can start to see the individual call histories (I’ve enlarged a few columns for detail). Here, long calls are rendered yellow, short calls are rendered red, and text messages are flat blue rectangles:

Wired UK & Barabási Lab: Process

I took the same graph as above, and added another set of columns extending below – here the white bars show us how many ‘friends’ the individual callers had – i.e. how many people they are regularly talking to over the week:

Wired UK & Barabási Lab: Process
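The data behind graphs like these can be sketched in a few lines of Python. This is an illustration only: the `Record` layout, the convention that a duration of 0 stands for a text message, and the rule that a 'friend' is anyone contacted at least `min_contacts` times are assumptions, not details of the real data set.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    caller: str
    callee: str
    seconds: int  # 0 stands for a text message in this sketch


def build_histories(records):
    """Group records by caller, preserving the order they arrive in."""
    histories = defaultdict(list)
    for rec in records:
        histories[rec.caller].append(rec)
    return dict(histories)


def summarise(histories, min_contacts=2):
    """Return (user, total_seconds, friend_count) rows, most talkative first.

    A 'friend' here is anyone the user contacted at least min_contacts times.
    """
    rows = []
    for user, recs in histories.items():
        total = sum(r.seconds for r in recs)
        contacts = defaultdict(int)
        for r in recs:
            contacts[r.callee] += 1
        friends = sum(1 for n in contacts.values() if n >= min_contacts)
        rows.append((user, total, friends))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Each user's list of records is one stacked bar; the sorted rows give the left-to-right order of the histogram, and the friend count is the height of the bar drawn below the axis.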

If I sort this graph by number of friends (rather than total call time), we can see that the two measures (talkativeness, and number of friends) don’t seem to be strongly correlated:

Wired UK & Barabási Lab: Process
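One way to put a number on "don't seem to be strongly correlated" is a rank correlation between total call time and friend count. The sketch below computes Spearman's coefficient using only the standard library. It is an added illustration and was not part of the original analysis, which relied on the sorted graphs.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

A rank-based measure suits heavily skewed data like call times, because a handful of extremely talkative users cannot dominate the result. Values near 0 would support the impression given by the graph.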

It’s interesting to note here as well, that the data set includes linkage information – so I can also visualize who is calling who within our group of individuals:

Wired UK & Barabási Lab: Process

There is some interesting information to be dug up in here, but the long aspect of the graph and the general over-detail involved make it not very usable – particularly for a magazine piece.

Ooh, and then Aaah.

The Infoporn section in Wired is a two page spread; I always think of it as needing to serve two separate purposes for two different kinds of readers. First, it needs to be visually pleasing. I want people to say ‘Oooh…!’ when they turn the page to it. Once they’re hooked, though, I want them to learn something – the ‘Aaah!’ moment.

The data used in the graphs above seemed too complex to do anything truly revealing with – so perhaps it could be built into something sexy enough to draw an ‘Oooh!’ or two? In order to fit the long tails of these graphs onto the page, I wondered if I could add a bit of a curl to them. To make this structural change evident, I turned the graphs on a slight angle and rendered them in 3D. Here, we see five of these graphs, totaling about a million individual users, arranged into a single, tower-like shape:

Wired UK & Barabási Lab: Process
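The exact geometry of the curl isn't described here, so the following is only one plausible way to do it, sketched in 2D Python: the head of the graph stays on a straight baseline, and the long tail wraps onto an inward spiral that leaves the baseline tangentially. Every parameter name and default value is an assumption chosen for illustration.

```python
from math import cos, sin, pi


def curl_positions(n_bars, straight_bars, spacing=1.0,
                   radius=40.0, shrink=2.0, min_radius=5.0):
    """Return an (x, y) base position for each bar of a long histogram.

    The first straight_bars bars (at least 1) sit on the x axis. The rest
    follow a spiral whose radius shrinks by `shrink` units per radian, down
    to min_radius, with bars spaced roughly `spacing` apart along the curve.
    """
    if straight_bars < 1:
        raise ValueError("straight_bars must be at least 1")
    points = [(i * spacing, 0.0) for i in range(min(n_bars, straight_bars))]
    if n_bars <= straight_bars:
        return points
    # Centre of the curl sits directly above the last straight bar, so the
    # spiral starts at the bottom of the circle, heading in the +x direction.
    cx = (straight_bars - 1) * spacing
    cy = radius
    theta = -pi / 2
    r = radius
    for _ in range(n_bars - straight_bars):
        step = spacing / r          # angle that covers one bar width
        theta += step
        r = max(min_radius, r - shrink * step)
        points.append((cx + r * cos(theta), cy + r * sin(theta)))
    return points
```

With the defaults, 200 bars that would otherwise need 200 units of width fit in well under 150, and the tail stays readable because neighbouring bars remain about one unit apart.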

While these structures took a little while to render, I could quite easily generate a unique set of them, which I assembled as a line trailing off to the page edge on the left:

Wired UK & Barabási Lab: Process

Getting Personal

So far, the visuals for this project only tell a part of the story: that our individual calling habits fall into predictable patterns when placed with the larger whole (some excellent text from Michael Dumiak helps clarify this in the final piece). There’s another crucial piece, though. Cell phone usage data is inherently locative, since our provider always knows from which of their cell towers we are placing the call.

This is where the fun starts – we can use this locative data to track the mobility patterns of individual people (it’s worth saying here that all of the data that I worked with was anonymized). To do this, I created a tool (again, in Processing) to make ‘mobility cubes’ – which show a history of an individual’s movements over time:

Wired UK & Barabási Lab: Process

The individual above, for example, travels around an area less than a square kilometer over a period of just under three days. If I flatten this graph, we can see that this person travels mostly between two locations:

Wired UK & Barabási Lab: Process
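The data structure behind a mobility cube can be sketched simply: two axes for position and a third for time. The Python below is an illustration under assumptions, not the original Processing tool. It assumes each ping is a `(timestamp_seconds, tower_id)` pair and that tower positions are available as `(x_km, y_km)` coordinates in a lookup table.

```python
from collections import Counter


def mobility_path(pings, towers):
    """Turn tower pings into time-ordered (x_km, y_km, hours) points.

    pings:  iterable of (timestamp_seconds, tower_id)
    towers: dict mapping tower_id to an (x_km, y_km) position
    The third coordinate is hours elapsed since the first ping.
    """
    ordered = sorted(pings)
    if not ordered:
        return []
    t0 = ordered[0][0]
    return [(*towers[tower], (ts - t0) / 3600.0) for ts, tower in ordered]


def flatten(path):
    """Drop the time axis and count how often each location appears."""
    return Counter((x, y) for x, y, _ in path)
```

Plotting the path in 3D gives the cube; `flatten(path).most_common(2)` gives the two most visited locations, which is what the flattened view makes visible at a glance.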

From the data, we can identify a lot of individuals like this – commuters – who travel short distances between two places (home, and work). We can also find travelers (people who cover a long distance in a short period of time):

Wired UK & Barabási Lab: Process
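A crude rule-based version of this sorting into commuters and travelers might look like the Python sketch below. The thresholds (`fast_kmh`, `long_km`, `two_place_share`) are invented for illustration and are not taken from the data set or from the lab's research.

```python
from collections import Counter
from math import hypot


def classify(path, fast_kmh=80.0, long_km=50.0, two_place_share=0.8):
    """Label a time-ordered path of (x_km, y_km, hours) points.

    'traveler': some single hop covers at least long_km at an average
                speed of at least fast_kmh.
    'commuter': at least two places, and the two most visited places
                account for at least two_place_share of all points.
    'other':    anything else, including an empty path.
    """
    for (x0, y0, t0), (x1, y1, t1) in zip(path, path[1:]):
        dist = hypot(x1 - x0, y1 - y0)
        hours = t1 - t0
        if dist >= long_km and hours > 0 and dist / hours >= fast_kmh:
            return "traveler"
    places = Counter((x, y) for x, y, _ in path)
    top_two = sum(n for _, n in places.most_common(2))
    if len(places) >= 2 and top_two / len(path) >= two_place_share:
        return "commuter"
    return "other"
```

The 'other' bucket is where the more elaborate but still regular patterns would land; telling those apart would need something richer than two thresholds.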

And others who seem to follow more elaborate (but often still regular) mobility patterns:

Wired UK & Barabási Lab: Process

We can assemble a ‘mobility cube’ for each individual in the database – and very quickly gain a mechanism for recognizing patterns amongst these people:

Wired UK & Barabási Lab: Process

Which brings us to the underlying point of the piece – we are all leaving digital trails behind us, as we make our way around our individual lives. These trails are largely considered individual – even ethereal – yet technology is making these trails more visible and more readable every day.

Of course, to see the final piece – the polished assembly of some of the drafts and artifacts you’ve seen in this post – you’ll have to buy the magazine. Wired UK is available on newsstands in the UK, and to all of our clever subscribers.

If you want to read more about this – and you should – I’d highly recommend Albert-László Barabási’s Bursts, which goes into much more detail about human mobility & predictability.

Finally, huge thanks have to go out to László and his team at the lab, without whom this piece would have never made it to print!

Wired UK, July ’09 – Visualizing a Nation’s DNA

Wired UK - NDNAD Spread (July, 2009)

In the spring, I was asked by Wired UK if I would be interested in producing something for the two-page ‘infoporn’ spread that runs in every issue. They had seen my experimentations with the NYTimes APIs, and were interested in the idea of non-conventional data visualizations. After a bit of research, I proposed a piece about the UK’s National DNA Database. It was a subject that interested me and I felt that there would be some interesting political territory to cover. Luckily, Wired agreed.

By searching through Parliamentary minutes, and sifting over annual reports, I was able to put together a fair amount of information about the NDNAD and I settled on a few key points that I wanted to convey with the piece. First, I wanted to somehow demonstrate how large the database is – with over 4.5M individuals profiled, it’s the largest DNA database in the world. It holds profiles for more than 7% of the UK’s population. As well as the size of the database, I wanted to show how it broke down – in racial groups, in age groups, and in terms of those who have been charged versus those who are ‘innocent’. Finally, I wanted to talk about the difference between the UK’s population demographics and the demographics represented by the profiles in the NDNAD.

The central graphic, then, is a DNA strand with one dot for each of the profiles in the database – more than 5M! Of course, I didn’t do this by hand. I wrote a program in Processing that would generate a single, continuous strand that filled up a certain size area. I was inspired by electron microscope images that I had seen of real DNA in which it looks like a loop of thread:

The nice looping threads were rendered using Perlin noise – I had a few parameters inside the program which allowed me to control how ‘messy’ the tangle became, and how much variation in thickness each strand had. While I was at it, I colour-coded each DNA dot to indicate the database’s ethnic breakdown. The result was a giant tangle, which was pretty much exactly what I wanted:

Wired UK - NDNAD Infographic

Here, you can see the individual dots, and the colour breakdown:

Wired NDNAD Graphic - detail
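To give a feel for how a noise-driven tangle works, here is a small Python sketch. It is not the Processing program described above: Python has no built-in Perlin noise, so the sketch swaps in a simple smoothed value noise, and the `messiness` parameter and all other names are assumptions. Thickness variation and colour coding are left out.

```python
import math
import random


def make_noise(seed):
    """Return a smooth 1D noise function with values in [0, 1).

    This is value noise with smoothstep interpolation, standing in for
    Perlin noise; it repeats every 256 units.
    """
    rng = random.Random(seed)
    lattice = [rng.random() for _ in range(256)]

    def noise(t):
        i = math.floor(t)
        f = t - i
        u = f * f * (3 - 2 * f)  # smoothstep easing between lattice values
        a = lattice[i % 256]
        b = lattice[(i + 1) % 256]
        return a + (b - a) * u

    return noise


def tangle(n_dots, centre=(0.0, 0.0), radius=100.0, messiness=0.02, seed=1):
    """Place n_dots along one continuous, wandering strand around centre.

    Larger messiness values advance through the noise faster, so the
    strand changes direction more often between neighbouring dots.
    """
    angle_noise = make_noise(seed)
    radius_noise = make_noise(seed + 1)
    cx, cy = centre
    dots = []
    for i in range(n_dots):
        t = i * messiness
        angle = angle_noise(t) * 4 * math.pi
        r = radius * radius_noise(t * 0.7 + 100)
        dots.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return dots
```

Because both the angle and the distance from the centre vary smoothly, consecutive dots stay close together and read as a thread, while the strand as a whole never leaves a disc of the given radius.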

The next step was to break down the big tangle into three parts – one representing the bulk of the database, one representing the 948,535 profiles that were taken from people under the age of 18, and one representing the ~500,000 profiles from people who had never been charged, convicted, or warned by police. The original image had a static centre-point for the DNA loop; to break the tangle apart, I modified the program so that the centrepoint could move to pre-determined points once certain counts had been reached. The final graphic changes centre-points three times. What was nice about this set-up was that it was easy to move and adjust the positioning of the graphic to fit the page layout. Rendering out a new version of the main image took just a few minutes.

Wired UK - NDNAD Infographic
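The moving centre-point can be sketched as a lookup from dot index to centre. In this Python illustration, the segment counts and coordinates are hypothetical placeholders, and `make_centre_lookup` is a name invented for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def make_centre_lookup(segments):
    """Build a function mapping a dot index to the centre it is drawn around.

    segments: list of (dot_count, (x, y)) pairs, in drawing order.
    Indices past the final segment fall back to the last centre.
    """
    boundaries = list(accumulate(count for count, _ in segments))
    centres = [centre for _, centre in segments]

    def centre_for(index):
        position = bisect_right(boundaries, index)
        return centres[min(position, len(centres) - 1)]

    return centre_for


# Hypothetical layout: three clumps of 3, 2 and 4 dots.
centre_for = make_centre_lookup([
    (3, (0.0, 0.0)),
    (2, (10.0, 0.0)),
    (4, (20.0, 5.0)),
])
```

Keeping the layout in one small table is what makes repositioning cheap: moving a clump on the page means changing one coordinate pair and rendering again.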

Working with these kinds of generative strategies meant that I could explore many variations. As you can see from the graphics posted here, I went through a variety of compositional and colour changes, all of which were relatively painless. Using Processing, I built a mini-application whose entire purpose was to create these DNA systems. I also built a second mini-app, which rendered out a set of pie-charts that were used to display related information along with the main graphic in the spread. I wanted these pie charts to fit in visually with the main graphic, so I created a very simple sketch to output charts from any set of data:

Wired NDNAD Pie Chart
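The core of a sketch like that is turning a list of values into wedge angles. Here is a minimal Python version of that step, with the drawing itself left out; the function name and the choice to start at the top of the circle are assumptions.

```python
from math import tau


def pie_wedges(values, start_angle=-tau / 4):
    """Return (start, end) angles in radians for each value's wedge.

    Wedges are laid out in order from start_angle (by default a quarter
    turn before angle 0, which is 12 o'clock in screen coordinates where
    y points down) and together sweep one full turn.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    wedges = []
    angle = start_angle
    for value in values:
        sweep = tau * value / total
        wedges.append((angle, angle + sweep))
        angle += sweep
    return wedges
```

Each pair of angles can be handed directly to an arc-drawing call, so the same function serves every chart regardless of how many categories the data has.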

There ended up being 11 of these little pie-charts that accompanied the main graphic. Again, by building tools, I was able to do some interesting things, while at the same time avoiding large amounts of manual labour. Just how I like it! You can see the final result in the image at the top of this post, and of course, in Wired UK – the July issue hit newsstands a couple of weeks ago. If you are in the UK, go out and buy a copy!

Perhaps the most exciting thing that has come out of this process is that I have been asked to be a contributing editor for Wired UK. I’ll be creating some more pieces centred around data & information over the coming months (look for a Just Landed spread next month), and will also be getting the chance to showcase some work by various brilliant designers & artists in the UK and around the world.

So, stay tuned…