Just Landed: Processing, Twitter, MetaCarta & Hidden Data

Just Landed - Screenshot

I have a friend who has a Ph.D. in bioinformatics. Over a beer last week, we ended up discussing the H1N1 flu virus, epidemic modeling, and countless other fascinating and somewhat scary things. She told me that epidemiologists have been experimenting with alternate methods of creating transmission models – specifically, she talked about a group that was using data from the Where’s George? project to build a computer model for tracking and predicting the spread of contagions (which I read about again in this NYTimes article two days later).

This got me thinking about the data that is hidden in various social network information streams – Facebook & Twitter updates in particular. People share a lot of information in their tweets – some of it intentionally, and some that can be uncovered with rudimentary searching. I wondered if it would be possible to extract travel information from people’s public Twitter streams by searching for the term ‘Just landed in…’.

The idea is simple: Find tweets that contain this phrase, parse out the location they’d just landed in, along with the home location they list on their Twitter profile, and use this to map out travel in the Twittersphere (yes, I just used the phrase ‘Twittersphere’). Twitter’s search API gives us an easy way to get a list of tweets containing the phrase – I am working in Processing so I used Twitter4J to acquire the data from Twitter. The next question was a bit trickier – how would I extract location data from a list of tweets like this?:

Queen_Btch: just landed in London heading to the pub for a drink then im of to bed…so tired who knew hooking up on an airplane would be so tiring =S
jjvirgin: Just landed in Maui and I feel better already … Four days here then off to vegas
checrothers: Just landed in Dakar, Senegal… Another 9 hours n I’ll be in South Africa two entire days after I left … Doodles

It turned out to be a lot easier than I thought. MetaCarta offers two different APIs that can extract longitude & latitude information from a query. They can take the tweets above and extract locations:

London, London, United Kingdom – “Latitude” : 51.52, “Longitude” : -0.1
Maui, Hawaii, United States – “Latitude” : 20.5819, “Longitude” : -156.375
Dakar, Dakar, Senegal – “Latitude” : 14.72, “Longitude” : -17.48
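
Before a tweet goes to a geoparser, the text after the search phrase can be isolated with a regular expression. The following is a minimal sketch, not MetaCarta's API or the code used for this project: the helper name `candidate_place` and the list of stop words are assumptions chosen to fit the three sample tweets above, and real tweets will need a smarter cutoff.

```python
import re

# Hypothetical helper: isolates the text that follows "just landed in".
# The stop words (and/heading/then) are a heuristic tuned to the sample
# tweets, not a general rule for where a place name ends.
LANDED = re.compile(
    r"just landed in\s+(.+?)(?=\s+(?:and|heading|then)\b|[.!?…]|$)",
    re.IGNORECASE,
)

def candidate_place(tweet):
    """Return the likely place text from a tweet, or None if the phrase is absent."""
    match = LANDED.search(tweet)
    return match.group(1).strip() if match else None

tweets = [
    "just landed in London heading to the pub for a drink",
    "Just landed in Maui and I feel better already … Four days here",
    "Just landed in Dakar, Senegal… Another 9 hours n I'll be in South Africa",
]
places = [candidate_place(t) for t in tweets]
```

The resulting strings would still need to be resolved to coordinates by a geocoding service; the regular expression only narrows down what gets sent.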

This seemed perfect, so I signed up for an API key and set to work hooking the APIs up to Processing. This was a little bit tricky, since the APIs require authentication. After a bit of back and forth, I managed to track down the right libraries to implement Basic Authorization in Processing. I ended up writing a set of Classes to talk to MetaCarta – I’ll share these in a follow-up post later this week.
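
For readers unfamiliar with Basic Authorization, the idea is to send the username and password, joined by a colon and Base64-encoded, in an `Authorization` header. Here is a rough sketch in Python rather than Processing. The endpoint URL and credentials are placeholders, not MetaCarta's real API, and the function names are made up for illustration.

```python
import base64
import urllib.request

def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def build_request(url, username, password):
    """Attach the header to a request object.

    Nothing is sent over the network until the request is passed
    to urllib.request.urlopen().
    """
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request

# Placeholder endpoint and credentials, for illustration only.
req = build_request(
    "https://api.example.com/geoparse?q=London",
    "user@example.com",
    "secret",
)
```

Because Base64 is an encoding and not encryption, this scheme is only reasonable over HTTPS.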

Now I had a way to take a series of tweets, and extract location data from them. I did the same thing with the location information from the Twitter user’s profile page – I could have gotten this via the Twitter API but it would cost one query per user, and Twitter limits requests to 100/hour, so I went the quick and dirty way and scraped this information from HTML. This gave me a pair of location points that could be placed on a map. This was reasonably easy with some assistance from the very informative map projection pages on Wolfram MathWorld.
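
As a rough illustration of the projection step: the simplest way to place a latitude/longitude pair on a flat map is an equirectangular projection, which scales longitude linearly to x and latitude linearly to y. This is an assumption for the sake of example. It is not necessarily the projection used in the renders, and the function name and canvas size are hypothetical.

```python
def equirectangular(lat, lon, width, height):
    """Map latitude/longitude in degrees to pixel coordinates.

    x grows eastward from the left edge (lon = -180),
    y grows downward from the top edge (lat = +90).
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y

# The London coordinates returned above, on a hypothetical 3600 x 1800 canvas.
london_x, london_y = equirectangular(51.52, -0.1, 3600, 1800)
```

One consequence of any flat projection like this is that a straight line between two points on opposite sides of the map's edge crosses the whole canvas instead of wrapping around it.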

I’ll admit it took some time to get the whole thing working the way that I wanted it to, but Processing is a perfect environment for this kind of project – bringing in data, implementing 3D, exporting to video – it’s all relatively easy. Here’s a render from the system, showing about 36 hours of Twitter-harvested travel:

Just Landed – 36 Hours from blprnt on Vimeo.

And another, earlier render showing just 4 hours but running a bit slower (I like this pace a lot better – but not the file size of the 36 hour video rendered at this speed!!):

Just Landed – Test Render (4 hrs) from blprnt on Vimeo.

Now, I realize this is a far stretch from a working model to predict epidemics. But it sure does look cool. I also think it will be a good base for some more interesting work. Of course, as always, I’d love to hear your feedback and suggestions.

136 thoughts on “Just Landed: Processing, Twitter, MetaCarta & Hidden Data”

  1. Jer, this is gorgeous!

    I'd like to see it built into an App that allows live UI tag filtering to be entered, then dynamically updated. Could be great as an installation or performance tool.

  2. Thanks, Jesse.

    This actually renders totally fine in real time – but the data harvesting takes a long time, since it has to make a query to MetaCarta twice for every tweet. Ideally I'd like to move this process server-side, and have a PHP script that takes care of harvesting the data, etc. That way you'd be able to use the interface to explore historical data, etc.

    -Jer

  3. This is great. I love the way the arcs have those leading pulses. I could see this being useful for visualising not just transmission of flu but also other data that moves over time, for example weather.

  4. Thanks for all of the comments. I'll respond in more detail when I get a chance – but suffice to say I appreciate all of your suggestions & questions.

    Right now what you are seeing is 'away' pairs – where they are arriving somewhere that isn't their home location. I'd like to add the 'home' pairs – people who are landing at home, but I'm not sure how I want to represent these graphically, yet.

    A lot of people were quick to find the 2D weakness – cross-pacific flights are not shown properly. I like the suggestion that this might be due to a no-fly zone or a sea monster – but really it's because I opted for the 2D map to avoid the typical spinning globe visualization cliché. I think the 2D map works well for a lot of reasons – I might eventually try a globe version but I don't think so.

    Two things to keep in mind here:

    1. This is a sketch of a project that I don't consider final in any real way

    2. I am not intending to produce a real simulation of air travel – there are lots of better ways to do that. As I said in the post, I'm interested in exposing hidden information in Twitter feeds. I'm also interested in the possibilities (when implemented by much smarter people with a lot more time) of applying this kind of concept to real modeling.

    In any case, I'm glad that people seem to be enjoying these early steps – and I welcome any and all comments and feedback.

    -Jer

    1. To the modeler:
      You did this little experiment a long time ago….but I wonder what would happen if you try to search the equivalent of "Just landed…" in other languages (French, Mandarin, Japanese, German, etc…) Maybe you will be able to represent a more global sense of travel. Or maybe there are just more American users posting about their travel?

      To get "return home" flights, maybe you could write some code that would review previous posts of twitterers who return a result where the "Just landed…" destination = twitter hometown. There is a good chance that they reference places they were visiting. For example…someone from NYC returns to NYC. If they were in London….maybe 8 days prior they said "Just landed in…London" or they posted "@Big Ben in London" during their trip. Might be an interesting way to try to harvest the data and infer where the trip originated.

      To readers:
      As the modeler wrote, this is a very cool visualization, but the orange arcs are not true flight paths. Travelers from SF to Australia generally do not cross all of Africa and Asia (they cross the Pacific).

      The data collection excludes flights that represent people returning home. The data modeler assumed the start location was the twitterer's listed home town. For example, if someone said they landed in New York City (perhaps from London) AND have New York City listed as their hometown, we will not see their London –> NYC trip because his model will only read "NYC (twitter hometown) –> NYC ('landed in…' destination)".

  5. Interesting.

    It looks like most of the arcs in the US go west to east. I’m wondering why this is. It’s possible you collected data during times when flights are mostly west to east, such as coast to coast redeyes? Although it’s 36 hours of data. Or maybe there’s a bug in your renderer? Also, it’s possible that a vast majority of twitter users are west coasters.

  6. Tres cool. What a way to visualize data.

    How about sound effects? I hear some sort of up-sliding tone or whoosh for a takeoff, and a downward one for landing (nothing for mid-flight, though); have the pitch / frequency be higher for shorter trips.

  7. This is a really, really cool visualization. My mind is swirling with the possibilities.

    Really, really great stuff. I shared this with my entire staff. Thank you for taking the time to post. I for one, would love more insight into how specifically you generated this from a code perspective if you're willing to share. 🙂

    All the best,

    Jonathan

  8. I notice that all the arcs start in the west and move to the east. This would seem to imply an assumption in the programming that the target left its home location to arrive at ‘just landed in’. Also the direction of movement may be biased by Twitters user base being mostly North American. Too much inferred by this exercise to really tell much about the actual movement of the twitter targets.

  9. As someone just noted on Slashdot, you’d better re-implement it as a globe with better distance calculation. Great though it is, as it stands the Pacific Ocean is either a no-fly zone, or the world really *is* flat!

    Is great though. If only people would twitter more accurate information so (for instance) if someone’s flying from one location to another and neither is their home, it’ll still “work”.
    Awesome work, nice to see Processing getting some press. Be sure to let the guys at Processing know about this, they’ll find it just as cool as I do.
    Nice one!

  10. At first, i thought it was too much "traffic" (in both senses) coming from the US, but then i realized, it had to do with the language being parsed. The US would still lead in Twitter-using, but i think if you tried parsing different languages, the whole thing would spread a little more..

    To get more data, one could analyse the usage of google, with complete statistics such as ip-address and possibly set cookies, you could easily find out, where people are going if they were working off of their laptops..

  11. Very clever! I’m a Vancouverite myself. Question – why is it that we see so many outgoing flights but I don’t think I recall seeing even a single flight heading IN to the U.S?

  12. It’s possible that people are less likely to post something immediately after returning from a trip abroad as they simply want to get home and unpack and relax. To get those results it might be necessary to look for “just got back from…”

    However, that would lead to *tons* of results not being international flights and instead being things like “the movies” or “grandma’s house”. What it boils down to is that mining social networking sites for interpreted data is very dependent on the language that is used to describe different situations and circumstances, and also on looking for the right phrases.

  13. Finally, after over a decade of drinking beer together, we come up with something good. This is much, much better than candlelight croquet and the resulting lawn fire.

  14. It's pretty cool but it is too US centric as you would expect from twitter. I don't think it would make a very accurate model for the spread of a virus.

  15. Really cool piece of work!

    I just wonder what would happen if you were able to take data from travel sites like expedia.com and others as the input for your model.

    Best Regards,
    Victor

  16. Another issue to consider is the west-coaster who makes a stop on the way to Europe — your model as built would show an LA-NY flight, plus a LA-London flight. I suppose it would take a bit of work to correlate “just landed in” with prior entries from a user, rather than just his home location.

  17. Hi Jer,

    That is absolutely amazing. I’ve a proposal for you regarding working on a slightly different data source for this visualisation that would provide for a much more compelling real time visualisation. Please do feel free to drop me an email (it’s on the form, I assume you can see it) if you’d like to discuss.

    cheers
    David

  18. You should consider storing user names and locations for two weeks or so in the final product. That way, in the case of jjvirgin or someone else who is traveling to multiple places, you could have it plot from their last “just landed in” location rather than from their listed home location. Not entirely a necessity, but if you want to create a tool to predict epidemics accurately, you are definitely going to want to include that information. Plus, this would help out with the return home plot. The search for, let’s say, “just got home” or “it’s good to be home”, or something to that effect, could then have the name extracted and compared to the user list. You then have the last location and could easily get the home location.

    Either way, very cool project and by all definitions quite brilliant. Who could have thought, Twitter made useful? I never thought I’d see the day.

  19. A lot of my friends and business associates just say “LAX to DFW” as their Facebook updates (indicating that they are flying to Dallas from LA). I wonder if you could dig out some good data points using that kind of tagging.

  20. You assume that people have flown directly from their home location to their “just landed in” location. I think that’s a very big assumption that’s only true a fraction of the time.

    What about people taking multiple flights throughout a vacation? I’ve travelled that way before. What about people who drove somewhere else first, then flew? Business travel?

  21. Jer,

    This is phenomenal! From the side at the end it looks almost exactly like a strange attractor – I guess that fractal function is a matter of the distribution of airline travel consumption. Have you considered Buckminster Fuller’s Dymaxion projection? You’d see all travel moving up and down across a plane of nearly continuous landmass.
    http://en.wikipedia.org/wiki/Dymaxion_map

  22. Even given various smart questions about what’s being tracked and what it means, I think the visualizations are beautiful. They definitely got the creative, “what-if”? side of my brain working.

  23. Beautiful, curious , funny project.
    I should be happy if you plan to produce a shareware or why not a tool able to visualize the Internet traffic in 3D; I presume that many people should be interested in this product.
    Good luck with your project.

  24. Fantastic.

    Have you considered doing something similar with “I am sick” and “I feel sick” – each entry could have a radius, and overlapping results could create increasing color gradients – similar to a storm on a weather radar.

  25. Thanks again for all of the comments. A lot of other people have done great work visualizing network traffic, airline traffic, etc. In this project I was most interested in using data that people didn’t necessarily intend to share.

    Once things calm down a bit (thanks /. !) I’ll sit down and build a version with some of the excellent suggestions that have come out of this thread and the Vimeo thread.

    And yes, I do intend to release source for this project – who needs $$$ anyways??

  26. Ryan – Dymaxion would be a great idea – certainly cool to try. I’ll have to see if I can dig up the formulas.
