Just Landed: Processing, Twitter, MetaCarta & Hidden Data

Just Landed - Screenshot

I have a friend who has a Ph.D. in bioinformatics. Over a beer last week, we ended up discussing the H1N1 flu virus, epidemic modeling, and countless other fascinating and somewhat scary things. She told me that epidemiologists have been experimenting with alternative methods of creating transmission models – specifically, she talked about a group that was using data from the Where’s George? project to build a computer model for tracking and predicting the spread of contagions (which I read about again in a New York Times article two days later).

This got me thinking about the data that is hidden in various social network information streams – Facebook & Twitter updates in particular. People share a lot of information in their tweets: some of it intentionally, and some of it waiting to be uncovered with a little rudimentary searching. I wondered if it would be possible to extract travel information from people’s public Twitter streams by searching for the phrase ‘Just landed in…’.

The idea is simple: find tweets that contain this phrase, parse out the location where the user had just landed, along with the home location listed on their Twitter profile, and use this to map out travel in the Twittersphere (yes, I just used the phrase ‘Twittersphere’). Twitter’s search API gives us an easy way to get a list of tweets containing the phrase; I work in Processing, so I used Twitter4J to acquire the data from Twitter. The next question was a bit trickier: how would I extract location data from a list of tweets like these?

Queen_Btch: just landed in London heading to the pub for a drink then im of to bed…so tired who knew hooking up on an airplane would be so tiring =S
jjvirgin: Just landed in Maui and I feel better already … Four days here then off to vegas
checrothers: Just landed in Dakar, Senegal… Another 9 hours n I’ll be in South Africa two entire days after I left … Doodles
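To see why a dedicated place-name extractor helps here, consider the most naive approach: grab whatever text follows the phrase. The Java sketch below (Java because Processing sketches are Java; the class and method names are hypothetical, and this is not how Just Landed extracted locations) captures the text after “just landed in” up to the first period, exclamation mark, question mark, or ellipsis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a naive, regex-only attempt at pulling a place out of a tweet.
class LandedPhrase {
    // Case-insensitive "just landed in", then capture everything up to the first
    // '.', '!', '?', or ellipsis character (or the end of the text).
    private static final Pattern LANDED =
        Pattern.compile("just landed in\\s+([^.!?\u2026]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the trimmed text following the phrase, or null if the phrase is absent. */
    static String candidatePlace(String tweet) {
        Matcher m = LANDED.matcher(tweet);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String[] tweets = {
            "jjvirgin: Just landed in Maui and I feel better already \u2026 Four days here then off to vegas",
            "checrothers: Just landed in Dakar, Senegal\u2026 Another 9 hours n I'll be in South Africa",
            "a tweet that never mentions the phrase"
        };
        for (String t : tweets) {
            System.out.println(candidatePlace(t));
        }
    }
}
```

On the Dakar tweet this returns “Dakar, Senegal” cleanly, but on the Maui tweet it drags in the rest of the sentence (“Maui and I feel better already”), because a regex has no idea where a place name ends. That boundary problem is what a geoparser is for.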

It turned out to be a lot easier than I thought. MetaCarta offers two different APIs that can extract latitude and longitude information from a query. Given the tweets above, MetaCarta extracts these locations:

London, London, United Kingdom – “Latitude” : 51.52, “Longitude” : -0.1
Maui, Hawaii, United States – “Latitude” : 20.5819, “Longitude” : -156.375
Dakar, Dakar, Senegal – “Latitude” : 14.72, “Longitude” : -17.48

This seemed perfect, so I signed up for an API key and set to work hooking the APIs up to Processing. This was a little bit tricky, since the APIs require authentication. After a bit of back and forth, I managed to track down the right libraries to implement HTTP Basic Authentication in Processing. I ended up writing a set of classes to talk to MetaCarta – I’ll share these in a follow-up post later this week.
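HTTP Basic Authentication itself is simple: the client sends an Authorization header containing “Basic ” followed by the base64 encoding of “user:password” (RFC 7617). A minimal Java sketch of building that header value follows. It assumes Java 8 or later, where java.util.Base64 is available; the class name is hypothetical, the credentials are placeholders, and since the post doesn’t give MetaCarta’s endpoints, nothing here should be read as MetaCarta’s API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class; only the header format itself comes from the HTTP spec.
class BasicAuth {
    /** Returns "Basic " + base64("user:password"), the value of an Authorization header. */
    static String headerValue(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    public static void main(String[] args) {
        // Placeholder credentials (the classic RFC example), not a real API key.
        System.out.println(headerValue("Aladdin", "open sesame"));
        // prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

The resulting string could then be attached to a request on a java.net.HttpURLConnection with connection.setRequestProperty("Authorization", BasicAuth.headerValue(user, password)).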

Now I had a way to take a series of tweets and extract location data from them. I did the same thing with the location listed on each Twitter user’s profile page. I could have gotten this via the Twitter API, but it would cost one query per user, and Twitter limits requests to 100 per hour, so I went the quick-and-dirty way and scraped this information from the HTML. This gave me a pair of location points for each trip that could be placed on a map, which was reasonably easy with some assistance from the very informative map projection pages on Wolfram MathWorld.
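The projection step can be sketched in a few lines. The post doesn’t say which projection the map uses, so this Java sketch assumes a Mercator projection (one of the projections covered on MathWorld), with latitudes clamped to about ±85.05° so the projected range stays finite. The class name and the 800 × 800 map size are hypothetical.

```java
// Hypothetical helper: Mercator projection of latitude/longitude onto a flat map image.
class MercatorMap {
    // At this latitude ln(tan(pi/4 + phi/2)) equals pi, so clamping here keeps the
    // projected y range at [-pi, pi], the same span as longitude in radians.
    static final double MAX_LAT = 85.0511287798;

    /** Projects (lat, lon) in degrees to pixel {x, y} on a width-by-height map; y grows downward. */
    static double[] project(double lat, double lon, double width, double height) {
        double phi = Math.toRadians(Math.max(-MAX_LAT, Math.min(MAX_LAT, lat)));
        double x = (lon + 180.0) / 360.0 * width;
        double mercY = Math.log(Math.tan(Math.PI / 4.0 + phi / 2.0)); // in [-pi, pi]
        double y = height / 2.0 - (mercY / Math.PI) * (height / 2.0);
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        // Latitude/longitude pairs from the MetaCarta examples above (London, Maui, Dakar).
        double[][] places = { {51.52, -0.1}, {20.5819, -156.375}, {14.72, -17.48} };
        for (double[] p : places) {
            double[] xy = project(p[0], p[1], 800, 800);
            System.out.printf("(%.1f, %.1f)%n", xy[0], xy[1]);
        }
    }
}
```

A square map keeps Mercator’s proportions with this scaling. Swapping in an equirectangular projection would only change the y line, to y = (90 - lat) / 180 * height.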

I’ll admit it took some time to get the whole thing working the way that I wanted it to, but Processing is a perfect environment for this kind of project – bringing in data, implementing 3D, exporting to video – it’s all relatively easy. Here’s a render from the system, showing about 36 hours of Twitter-harvested travel:

Just Landed – 36 Hours from blprnt on Vimeo.

And here’s another, earlier render showing just 4 hours but running a bit slower (I like this pace a lot better, but not the file size of the 36-hour video rendered at this speed!):

Just Landed – Test Render (4 hrs) from blprnt on Vimeo.
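For the 3D side, one common approach (an assumption here; the post doesn’t describe how these renders were built) is to place each point on a sphere and draw each trip along the great circle between origin and destination. This Java sketch converts latitude/longitude to Cartesian coordinates and uses spherical linear interpolation (slerp) to sample points along the path. The axis convention, class name, and globe radius are hypothetical.

```java
// Hypothetical helper: great-circle paths between two lat/lon points on a sphere.
class GlobePath {
    /** Lat/lon in degrees to {x, y, z} on a sphere of radius r.
        Assumed convention: z toward the north pole, x toward (0, 0). */
    static double[] toCartesian(double latDeg, double lonDeg, double r) {
        double phi = Math.toRadians(latDeg);
        double lambda = Math.toRadians(lonDeg);
        return new double[] {
            r * Math.cos(phi) * Math.cos(lambda),
            r * Math.cos(phi) * Math.sin(lambda),
            r * Math.sin(phi)
        };
    }

    /** Slerp between unit vectors a and b; t = 0 gives a, t = 1 gives b. */
    static double[] greatCirclePoint(double[] a, double[] b, double t) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        double omega = Math.acos(Math.max(-1.0, Math.min(1.0, dot))); // angle between a and b
        double sinOmega = Math.sin(omega);
        if (sinOmega < 1e-9) {
            // Coincident or antipodal points: the path is degenerate or not unique,
            // so this sketch simply returns a copy of a.
            return a.clone();
        }
        double wa = Math.sin((1 - t) * omega) / sinOmega;
        double wb = Math.sin(t * omega) / sinOmega;
        return new double[] {
            wa * a[0] + wb * b[0],
            wa * a[1] + wb * b[1],
            wa * a[2] + wb * b[2]
        };
    }

    public static void main(String[] args) {
        double[] london = toCartesian(51.52, -0.1, 1.0);
        double[] maui = toCartesian(20.5819, -156.375, 1.0);
        double radius = 200.0; // hypothetical globe radius in pixels
        for (int i = 0; i <= 4; i++) {
            double[] p = greatCirclePoint(london, maui, i / 4.0);
            System.out.printf("%.1f %.1f %.1f%n", p[0] * radius, p[1] * radius, p[2] * radius);
        }
    }
}
```

In a Processing sketch using the P3D renderer, points sampled this way could be passed to vertex(x, y, z) between beginShape() and endShape(). Processing’s y axis points down, so the axes may need to be swapped or negated to match.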

Now, I realize this is a long way from a working model to predict epidemics. But it sure does look cool. I also think it will be a good base for some more interesting work. Of course, as always, I’d love to hear your feedback and suggestions.

136 thoughts on “Just Landed: Processing, Twitter, MetaCarta & Hidden Data”

  1. As an epidemiologist at the CDC (who has colleagues currently dispatched to work on the swine flu), I think this is really valuable information. What an incredible dramatization too; I’ll pass it on.

  2. Nice job. I hope that in the future, geolocation services such as Latitude will make anonymized data public. All this information will increase humanity’s self-knowledge.

  3. What a great idea!

    I just posted on your vimeo page http://vimeo.com/4587178 about doing this kind of visualization for “http” searches in the twitter API to get recently mentioned Web pages and using the MetaCarta GeoTagger to plot the locations mentioned in those Web pages.

    MetaCarta also has a search API that you can use to find recent news articles that mention places near any place — perhaps you could use that to add a “word cloud” to each landing site. “Just landed in… Wellington” would then generate “05/02/2009 06:58:00 A Wellington woman who tested positive for Influenza A (H1N1) after arriving in Auckland from Los Angeles on NZ1 on …”

    Let me know if you want any help filtering with the APIs.

    We love what you’re doing.

    jrf

  4. Amazing work, Jer! Which were the two MetaCarta APIs you used? And how were you able to get the black-and-white map? Keep up the great work!

  5. Hello Jer, this is beautiful!

    If you want, we could do a soundtrack for the movies, for free. They are lacking one.

    You have my e-mail; just get in touch if you are interested.

    Cheers,
    Fernando

  6. All I can say is awesome! I am thrilled with the direction I see discussed here. Maybe there is hope when people start thinking. Many great uses for this, keep up the excellent creative genius. This is brilliant!

  7. Why assume that they are travelling from their Twitter home location? I realize that you lack a choice, but that's gonna skew the results for multi-city travellers, and the “just landed in” crowd are likely pre-selected for that bunch of people.

  8. Hi Bruce,

    First – this very quick project was meant as a proof of concept and was never supposed to be an accurate model of any kind. There has been some interest from the epidemiology community, though, so perhaps it might become something more accurate.

    Second, you're right. I think an ideal solution would track back into a user's tweets to try to guess what their itinerary may have been. I don't think that this would be extraordinarily difficult, but would certainly involve a fair number of hits to the Twitter API.

    In any case, thanks for your comments.

    -Jer
