Too much information? Twitter, Privacy, and Lawrence Waterhouse

A few months ago, Twitter was in the news over concerns that people might be sharing a little bit too much information in their social networking broadcasts. Arizona resident Israel Hyman left for a vacation – but not before sending out a Tweet telling his friends and followers that he’d be out of town. While he was away, Hyman’s house was burgled. He suspected that his status update could have motivated the thieves to give his house a try.

Personally, I’m not convinced there are too many Twitter-saavy burglars (TweetBurglars?) out there. It’s probably much more likely that the would-be thief saw Mr. Hyman drive off in his station wagon with a canoe on top. In any case, I should be safe: I don’t broadcast much personal information in my Twitter feed.

At least not too much implicit personal information.

How much ‘hidden’ information am I sharing my Tweets? This is a question that has come up recently as I’ve been digging around with the Twitter API. I think that curious parties, armed with the Twitter API and some rudimentary programming skills might be able to find out more about you than you might think.

Here is a simple graphic showing all of my 1,212 Tweets since October of last year:

Twittergrams: All Tweets

The graph is ordered by day horizontally and by time vertically – tweets near the top are close to midnight, while those in the lower half were broadcast in the morning. I’ve also indicated the length of the tweet with the size of the circles – longer tweets show up larger and darker. You can see some trends from this really simple graph. First, it’s clear that my tweets have gotten longer and more frequent since I started using Twitter. Also, my first Tweets of the day (the bottom-most ones) seem to start on average at the same time – which a few notable anomalies. To investigate this a bit further, let’s highlight just the first tweets of the day (I’ve ignored days with 3 or fewer tweets):

Twittergrams: Morning Tweets

You can see from this plot that my morning messages tend to fall around 9:30am. There are a few outliers where I (heaven forbid) haven’t been around the computer – but there are also some deviations from the 9:30 norm that aren’t just statistical anomalies. If we plot some lines through the morning points starting in January this year, we’ll see the three areas where my twitter behaviour is not ‘normal’:

Twittergrams: Morning Tweet Graph

The yellow line in this graph is the 9:30 mean. The red line shows (with a bit of cushioning) the progression of  first tweet times over the 8 months in question. At the marked points, my first tweets deviate from the average. Why? In two of the tree cases, I’ve changed time zones. In March, I was in Munich, and in May/June I was in Boston, New York, and Minneapolis (you can actually see the time shift between EST & CST). In the zone marked with a 1, I was commuting in the morning to a residency in a Vancouver suburb – hence the later starts until February.

This is very simple example – certainly there isn’t a lot of useful (or incriminating) information to be found. But in the hands of a more capable investigator, it’s possible that the information underneath all of the Tweets, Facebook updates, Flickr comments, etc. that I am broadcasting everyday could reveal a lot more that I would want to share. Some nefarious party could quite easily set up a ‘tracker’ to watch my public broadcast, and to be notified if my daily behaviour deviates from the norm. TweetBurglars, are you listening?

Of course, all of this information can be useful for the good guys, too. With millions of people active on Twitter, the store of data – and what it can reveal – gets more and more interesting every day. We are already seeing scientists using web data to measure public happiness, but I think we have just scraped the surface of what could be uncovered. (To see a model of how Twitter updates could be used to track travel and disease spread, see my post on Just Landed).

In Neal Stephenson’s Cryptonomicon, he decribes a scenario in which an accurate map of London might be generated by tracking the heights of people’s heads as they stepped on and off of curves:

The curbs are sharp and perpendicular, not like the American smoothly molded sigmoid-cross-section curves. The transition between the side walk and the street is a crisp vertical. If you put a green lightbulb on Waterhouse’s head and watched him from the side during the blackout, his trajectory would look just like a square wave traced out on the face of a single-beam oscilloscope: up, down, up, down. If he were doing this at home, the curbs would be evenly spaced, about twelve to the mile, because his home town is neatly laid out on a grid.

Here in London, the street pattern is irregular and so the transitions in the square wave come at random-seeming times, sometimes very close together, sometimes very far apart.

A scientist watching the wave would probably despair of finding any pattern; it would look like a random circuit, driven by noise, triggered perhaps by the arrival of cosmic rays from deep space, or the decay of radioactive isotopes.

But if he had depth and ingenuity, it would be a different matter.

Depth could be obtained by putting a green light bulb on the head of every person in London and then recording their tracings for a few nights. The result would be a thick pile of graph-paper tracings, each one as seemingly random as the others. The thicker the pile, the greater the depth.

Ingenuity is a completely different matter. There is no systematic way to get it. One person could look at the pile of square wave tracings and see nothing but noise. Another might find a source of fascination there, an irrational feeling impossible to explain to anyone who did not share it. Some deep part of the mind, adept at noticing patterns (or the existence of a pattern) would stir awake and frantically signal the dull quotidian parts of the brain to keep looking at the pile of graph paper. The signal is dim and not always heeded, but it would instruct the recipient to stand there for days if necessary, shuffling through the pile of graphs like an autist, spreading them out over a large floor, stacking them in piles according to some inscrutable system, pencilling numbers, and letters from dead alphabets, into the corners, cross-referencing them, finding patterns, cross-checking them against others.

One day this person would walk out of that room carrying a highly accurate street map of London, reconstructed from the information in all of those square wave plots.

Stephenson tells us that success in such an endeavour requires depth and ingenuity. Depth, the internet has in spades. Millions of Twitter users are adding to a public dataset every second of every day. Ingenuity may prove a bit tougher, but with open APIs and thousands of clever curious investigators, it will be interesting to see what kinds of maps will be made – and to what means they will be used.

– I’ll be following up this post over the weekend with a preview of a new Twitter visualization that I have been working on – stay tuned!

4 thoughts on “Too much information? Twitter, Privacy, and Lawrence Waterhouse”

  1. I think it’s far more likely that someone just searched things like “vacation” or “trip” and their city.

    I don’t know why you think burglars aren’t Internet savvy. It’s the misconception that bad people all live in trailers and can’t spell their own names that gets us into trouble.

    1. I said I didn't think there were too many Twitter saavy burglars out there – not that there are none. Unless there's direct evidence that a Twitter status led to the theft, I'd be much more inclined in the Hyman case to believe that the burglars found out about his absence in a more traditional way. Houses have been broken into when people are on vacation for a very long time. You don't need Twitter to know that someone is away.

      In any case, this post is less about what happens when someone *does* make their lives public via twitter (tweets with 'vacation' or 'trip') and more about what kind of information we share unintentionally.

  2. On Wikipedia, where many users rack up thousands of edits, such temporal patterns have been analyzed to catch "sock puppets" (multiple accounts used by the same person for malicious purposes).

    See the diagrams on p.29-31 of… – including a time zone analysis like in your diagram, and an example of how sometimes you can even deduce a person's religious affiliation just from such temporal patterns, illustrating the privacy cocerns further.

Leave a Reply

Your email address will not be published. Required fields are marked *