Tag Archives: twitter

Your Random Numbers – Getting Started with Processing and Data Visualization

Over the last year or so, I’ve spent almost as much time thinking about how to teach data visualization as I’ve spent working with data. I’ve been a teacher for 10 years – for better or for worse, this means that as I learn new techniques and concepts, I’m usually thinking about pedagogy at the same time. Lately, I’ve also become convinced that this massive ‘open data’ movement that we are currently in the midst of is sorely lacking in educational components. The amount of available data, I think, is quickly outpacing our ability to use it in useful and novel ways. How can basic data visualization techniques be taught in an easy, engaging manner?

This post, then, is a first sketch of what a lesson plan for teaching Processing and data visualization might look like. I’m going to start from scratch, work through some examples, and (hopefully) make some interesting stuff. One of the nice things about this process, I think, is that we’re starting with fresh, new data – I’m not sure what we’re going to find once we get our hands dirty. This is what is really exciting about data visualization: the chance to find answers to your own, possibly novel, questions.

Let’s Start With the Data

We’re not going to work with an old, dusty data set here. Nor are we going to attempt to bash our heads against an unnecessarily complex pile of numbers. Instead, we’re going to start with a data set that I made up – with the help of a couple of hundred of my Twitter followers. Yesterday morning, I posted this request:

Even on a Saturday, a lot of helpful folks pitched in, and I ended up with about 225 numbers. And so, we have the easiest possible dataset to work with – a single list of whole numbers. I’m hoping that, as well as being simple, this dataset will turn out to be quite interesting – maybe telling us something about how the human brain thinks about numbers.

I wrote a quick Processing sketch to scrape out the numbers from the post, and then to put them into a Google Spreadsheet. You can see the whole dataset here: http://spreadsheets.google.com/pub?key=t6mq_WLV5c5uj6mUNSryBIA&output=html

I chose to start from a Google Spreadsheet in this tutorial, because I wanted people to be able to generate their own datasets to work with. Teachers – you can set up a spreadsheet of your own, and get your students to collect numbers by any means you’d like. The ‘User’ and ‘Tweet’ columns are not necessary; you just need to have a column called ‘Number’.

It’s about time to get down to some coding. The only tricky part in this whole process will be connecting to the Google Spreadsheet. Rather than bog down the tutorial with a lot of confusing semi-advanced code, I’ll let you download this sample sketch which has the Google Spreadsheet machinery in place.

Got it? Great. Open that sketch in Processing, and let’s get started. Just to make sure we’re all in the same place, you should see a screen that looks like this:

At the top of the sketch, you’ll see three String values that you can change. You’ll definitely have to enter your own Google username and password. If you have your own spreadsheet of number data, you can enter in the key for your spreadsheet as well. You can find the key right in the URL of any spreadsheet.
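For the curious, pulling the key out of the URL is just string slicing. Here’s a minimal sketch in plain Java (rather than a Processing sketch), assuming the classic `key=` query-parameter format – the `extractKey` helper is my own invention for illustration, not part of the tutorial download:

```java
public class SpreadsheetKey {
    // Extract the value of the 'key' parameter from a Google Spreadsheet URL.
    // Assumes the classic "...?key=XXXX&..." query-string format.
    static String extractKey(String url) {
        int start = url.indexOf("key=");
        if (start == -1) return null;      // no key parameter present
        start += 4;                        // skip past "key="
        int end = url.indexOf('&', start); // key ends at the next '&', or at the end of the URL
        return end == -1 ? url.substring(start) : url.substring(start, end);
    }

    public static void main(String[] args) {
        String url = "http://spreadsheets.google.com/pub?key=t6mq_WLV5c5uj6mUNSryBIA&output=html";
        System.out.println(extractKey(url)); // t6mq_WLV5c5uj6mUNSryBIA
    }
}
```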

The first thing we’ll do is change the size of our sketch to give us some room to move, set the background color, and turn on smoothing to make things pretty. We do all of this in the setup enclosure:

void setup() {
  //This code happens once, right when our sketch is launched
 size(800,800);
 background(0);
 smooth();
};

Now we need to get our data from the spreadsheet. One of the advantages of accessing the data from a shared remote file is that the remote data can change and we don’t have to worry about replacing files or changing our code.

We’re going to ask for a list of the ‘random’ numbers that are stored in the spreadsheet. The easiest way to store lists of things in Processing is in an array. In this case, we’re looking for an array of whole numbers – integers. I’ve written a function that gets an integer array from Google – you can take a look at the code on the ‘GoogleCode’ tab if you’d like to see how that is done. What we need to know here is that this function – called getNumbers – will return, or send us back, a list of whole numbers. Let’s ask for that list:

void setup() {
  //This code happens once, right when our sketch is launched
 size(800,800);
 background(0);
 smooth();

 //Ask for the list of numbers
 int[] numbers = getNumbers();
};

OK.

World’s easiest data visualization!

 fill(255,40);
 noStroke();
 for (int i = 0; i < numbers.length; i++) {
   ellipse(numbers[i] * 8, height/2, 8,8);
 };

This draws a row of dots across the screen, one for each number in our Google list. The dots are drawn with a low alpha (40/255, or about 16%), so when numbers are picked more than once, they get brighter. The result is a strip of dots across the screen that looks like this:

Right away, we can see a couple of things about the distribution of our ‘random’ numbers. First, there are two or three very bright spots where numbers get picked several times. Also, there are some pretty evident gaps (one right in the middle) where certain numbers don’t get picked at all.
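As an aside, the brightening is just repeated alpha compositing: each white dot at alpha 40/255 is blended over whatever is already there, so a spot picked n times converges toward full brightness. A quick plain-Java sketch of the math (this is standard source-over blending over a black background, not code from the tutorial):

```java
public class AlphaStack {
    // Brightness of a pixel after n white dots with alpha 40/255 are
    // composited over a black background. Each blend is
    //   b' = 255*a + b*(1-a), which closes to b_n = 255 * (1 - (1-a)^n).
    static double brightness(int n) {
        double a = 40.0 / 255.0;
        return 255.0 * (1.0 - Math.pow(1.0 - a, n));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d overlap(s): %.1f%n", n, brightness(n));
        }
    }
}
```

One dot gives a brightness of 40; repeats climb toward (but never exceed) 255, which is why the multiply-picked numbers glow.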

This could be normal though, right? To see if this distribution is typical, let’s draw a line of ‘real’ random numbers below our line, and see if we can notice a difference:

fill(255,40);
 noStroke();
 //Our line of Google numbers
 for (int i = 0; i < numbers.length; i++) {
   ellipse(numbers[i] * 8, height/2, 8,8);
 };
 //A line of random numbers
 for (int i = 0; i < numbers.length; i++) {
   ellipse(ceil(random(0,99)) * 8, height/2 + 20, 8,8);
 };

Now we see the two compared:

The bottom, random line doesn’t seem to have as many bright spots or as obvious gaps as our human-picked line. Still, the difference isn’t that striking. Can you tell right away which line is ours from the group below?

OK. I’ll admit it – I was hoping that the human-picked number set would be more obviously divergent from the sets of numbers that were generated by a computer. It’s possible that humans are better at picking random numbers than I had thought. Or, our sample set is too small to see any kind of real difference. It’s also possible that this quick visualization method isn’t doing the trick. Let’s stay on the track of number distribution for a few minutes and see if we can find out any more.

Our system of dots was easy, and readable, but not very useful for empirical comparisons. For the next step, let’s stick with the classics and

Build a bar graph.

Right now, we have a list of numbers. Ours range from 1-99, but let’s imagine for a second that we had a set of numbers that ranged from 0-9:

[5,8,5,2,4,1,6,3,9,0,1,3,5,7]

What we need to build a bar graph for these numbers is a list of counts – how many times each number occurs:

[1,2,1,2,1,3,1,1,1,1]

We can look at this list above, and see that there were two 1s, and three 5s.
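The tally itself is just an indexed increment – for each number in the list, bump the count stored at that number’s position. A quick plain-Java sketch (the `countOccurrences` helper is a made-up name for illustration):

```java
import java.util.Arrays;

public class Tally {
    // Count how many times each value 0..range-1 occurs in a list of numbers.
    static int[] countOccurrences(int[] nums, int range) {
        int[] counts = new int[range]; // Java initializes these to zero
        for (int n : nums) {
            counts[n]++; // the number itself is the index into the counts list
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] sample = {5, 8, 5, 2, 4, 1, 6, 3, 9, 0, 1, 3, 5, 7};
        System.out.println(Arrays.toString(countOccurrences(sample, 10)));
        // [1, 2, 1, 2, 1, 3, 1, 1, 1, 1]
    }
}
```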

Let’s do the same thing with our big list of numbers – we’re going to generate a list 100 numbers long that holds the counts for each of the possible numbers in our set. But, we’re going to be a bit smarter about it this time around and package our code into a function – so that we can use it again and again without having to re-write it. In this case the function will (eventually) draw a bar graph – so we’ll call it (cleverly) barGraph:

void barGraph( int[] nums ) {
  //Make a list of number counts
 int[] counts = new int[100];
 //Fill it with zeros
 for (int i = 0; i < 100; i++) {
   counts[i] = 0;
 };
 //Tally the counts
 for (int i = 0; i < nums.length; i++) {
   counts[nums[i]] ++;
 };
};

This function constructs an array of counts from whatever list of numbers we pass into it (that list is a list of integers, and we refer to it within the function as ‘nums’, a name which I made up). Now, let’s add the code to draw the graph (I’ve added another parameter to go along with the numbers – the y position of the graph):


void barGraph(int[] nums, float y) {
  //Make a list of number counts
 int[] counts = new int[100];
 //Fill it with zeros
 for (int i = 0; i < 100; i++) {
   counts[i] = 0;
 };
 //Tally the counts
 for (int i = 0; i < nums.length; i++) {
   counts[nums[i]] ++;
 };

 //Draw the bar graph
 for (int i = 0; i < counts.length; i++) {
   rect(i * 8, y, 8, -counts[i] * 10);
 };
};

We’ve added a function – a set of instructions – to our file, which we can use to draw a bar graph from a set of numbers. To actually draw the graph, we need to call the function, which we can do in the setup enclosure. Here’s the code, all together:


/*

 #myrandomnumber Tutorial
 blprnt@blprnt.com
 April, 2010

 */

//This is the Google spreadsheet manager and the id of the spreadsheet that we want to populate, along with our Google username & password
SimpleSpreadsheetManager sm;
String sUrl = "t6mq_WLV5c5uj6mUNSryBIA";
String googleUser = "YOUR USERNAME";
String googlePass = "YOUR PASSWORD";

void setup() {
  //This code happens once, right when our sketch is launched
 size(800,800);
 background(0);
 smooth();

 //Ask for the list of numbers
 int[] numbers = getNumbers();
 //Draw the graph
 barGraph(numbers, 400);
};

void barGraph(int[] nums, float y) {
  //Make a list of number counts
 int[] counts = new int[100];
 //Fill it with zeros
 for (int i = 0; i < 100; i++) {
   counts[i] = 0;
 };
 //Tally the counts
 for (int i = 0; i < nums.length; i++) {
   counts[nums[i]] ++;
 };

 //Draw the bar graph
 for (int i = 0; i < counts.length; i++) {
   rect(i * 8, y, 8, -counts[i] * 10);
 };
};

void draw() {
  //This code happens once every frame.
};

If you run your code, you should get a nice minimal bar graph which looks like this:

We can help distinguish the very high values (and the very low ones) by adding some color to the graph. In Processing’s standard RGB color mode, we can change one of our color channels (in this case, green) with our count values to give the bars some differentiation:


 //Draw the bar graph
 for (int i = 0; i < counts.length; i++) {
   fill(255, counts[i] * 30, 0);
   rect(i * 8, y, 8, -counts[i] * 10);
 };

Which gives us this:

Or, we could switch to Hue/Saturation/Brightness mode, and use our count values to cycle through the available hues:

//Draw the bar graph
 for (int i = 0; i < counts.length; i++) {
   colorMode(HSB);
   fill(counts[i] * 30, 255, 255);
   rect(i * 8, y, 8, -counts[i] * 10);
 };

Which gives us this graph:

Now would be a good time to do some comparisons to a real random sample again, to see if the new coloring makes a difference. Because we defined our bar graph instructions as a function, we can do this fairly easily (I built in an easy function to generate a random list of integers called getRandomNumbers – you can see the code on the ‘GoogleCode’ tab):

void setup() {
  //This code happens once, right when our sketch is launched
 size(800,800);
 background(0);
 smooth();

 //Ask for the list of numbers
 int[] numbers = getNumbers();
 //Draw the graph
 barGraph(numbers, 100);

 for (int i = 1; i < 7; i++) {
   int[] randoms = getRandomNumbers(225);
   barGraph(randoms, 100 + (i * 130));
 };
};

I know, I know. Bar graphs. Yay. Looking at the graphic above, though, we can see more clearly that our humanoid number set is unlike the machine-generated sets. However, I actually think that the color is more valuable than the dimensions of the bars. Since we’re dealing with 99 numbers, maybe we can display these colours in a grid and see if any patterns emerge? A really important thing to be able to do with data visualization is to

Look at datasets from multiple angles.

Let’s see if the grid gets us anywhere. Luckily, a function to make a grid is pretty much the same as the one to make a graph (I’m adding two more parameters – an x position for the grid, and a size for the individual blocks):

void colorGrid(int[] nums, float x, float y, float s) {
 //Make a list of number counts
 int[] counts = new int[100];
 //Fill it with zeros
 for (int i = 0; i < 100; i++) {
   counts[i] = 0;
 };
 //Tally the counts
 for (int i = 0; i < nums.length; i++) {
   counts[nums[i]] ++;
 };

//Move the drawing coordinates to the x,y position specified in the parameters
 pushMatrix();
 translate(x,y);
 //Draw the grid
 for (int i = 0; i < counts.length; i++) {
   colorMode(HSB);
   fill(counts[i] * 30, 255, 255, counts[i] * 30);
   rect((i % 10) * s, floor(i/10) * s, s, s);

 };
 popMatrix();
};
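The only new idea here is the index math: modulo gives the column and integer division gives the row, which is why numbers with the same last digit stack into the same column. A tiny plain-Java illustration:

```java
public class GridCell {
    // Column and row of the i-th cell in a 10-wide grid, matching
    // the (i % 10) and floor(i/10) expressions in the sketch.
    static int col(int i) { return i % 10; }
    static int row(int i) { return i / 10; } // integer division floors for us

    public static void main(String[] args) {
        // Number 47 lands in column 7, row 4 -- the "X7" column.
        System.out.println(col(47) + ", " + row(47));
    }
}
```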

We can now do this to draw a nice big grid:

 //Ask for the list of numbers
 int[] numbers = getNumbers();
 //Draw the graph
 colorGrid(numbers, 50, 50, 70);

I can see some definite patterns in this grid – so let’s bring the actual numbers back into play so that we can talk about what seems to be going on. Here’s the full code, one last time:


/*

 #myrandomnumber Tutorial
 blprnt@blprnt.com
 April, 2010

 */

//This is the Google spreadsheet manager and the id of the spreadsheet that we want to populate, along with our Google username & password
SimpleSpreadsheetManager sm;
String sUrl = "t6mq_WLV5c5uj6mUNSryBIA";
String googleUser = "YOUR USERNAME";
String googlePass = "YOUR PASSWORD";

//This is the font object
PFont label;

void setup() {
  //This code happens once, right when our sketch is launched
 size(800,800);
 background(0);
 smooth();

 //Create the font object to make text with
 label = createFont("Helvetica", 24);

 //Ask for the list of numbers
 int[] numbers = getNumbers();
 //Draw the graph
 colorGrid(numbers, 50, 50, 70);
};

void barGraph(int[] nums, float y) {
  //Make a list of number counts
 int[] counts = new int[100];
 //Fill it with zeros
 for (int i = 0; i < 100; i++) {
   counts[i] = 0;
 };
 //Tally the counts
 for (int i = 0; i < nums.length; i++) {
   counts[nums[i]] ++;
 };

 //Draw the bar graph
 for (int i = 0; i < counts.length; i++) {
   colorMode(HSB);
   fill(counts[i] * 30, 255, 255);
   rect(i * 8, y, 8, -counts[i] * 10);
 };
};

void colorGrid(int[] nums, float x, float y, float s) {
 //Make a list of number counts
 int[] counts = new int[100];
 //Fill it with zeros
 for (int i = 0; i < 100; i++) {
   counts[i] = 0;
 };
 //Tally the counts
 for (int i = 0; i < nums.length; i++) {
   counts[nums[i]] ++;
 };

 pushMatrix();
 translate(x,y);
 //Draw the grid
 for (int i = 0; i < counts.length; i++) {
   colorMode(HSB);
   fill(counts[i] * 30, 255, 255, counts[i] * 30);
   textAlign(CENTER);
   textFont(label);
   textSize(s/2);
   text(i, (i % 10) * s, floor(i/10) * s);
 };
 popMatrix();
};

void draw() {
  //This code happens once every frame.

};

And, our nice looking number grid:

BINGO!

No, really. If this were a bingo card, and I were a 70-year-old, I’d be rich. Look at that nice line going down the X7 column – 17, 27, 37, 47, 57, 67, 77, 87, and 97 all appear with good frequency. If we rule out the Douglas Adams effect on 42, it is likely that most of the top 10 most frequently occurring numbers would have a 7 on the end. Do numbers ending in 7 ‘feel’ more random to us? Or is there something about the number 7 that we just plain like?

In contrast, if I had played the X0 column, I’d be out of luck. It seems that numbers ending with a zero don’t feel very random to us at all. This could also explain the black hole around the number 50 – which, in our range of 1-99, appears to be the ‘least random’ of all.

Well, there we have it – a start-to-finish example of how we can use Processing to visualize simple data, with a goal of exposing underlying patterns and anomalies. The techniques we used in this project were fairly simple, but they are useful tools that can be applied in a huge variety of data situations (I use them myself, all the time).

Hopefully this tutorial is (was?) useful for some of you. And, if there are any teachers out there who would like to try this out with their classrooms, I’d love to hear how it goes.

7 Days of Source Day #1: GoodMorning!

GoodMorning! Pinheads

Project: GoodMorning!
Date: August, 2009
Language: Processing
Key Concepts: Spherical coordinates, latitude & longitude conversion, Twitter API, MetaCarta API

Overview:

GoodMorning! is a global Twitter visualization tool. It allows tweets to be placed geographically and temporally, showing how a word or phrase is used around the world over a certain time period. You can watch a video here.

The project consists of two Processing sketches: a small sketch called TwitGather to gather tweets from the Twitter API and store them with location data in a JSON file, and a larger sketch which renders that data, and allows for export as .MOV and hi-res bitmap files. I’ve included a sample file which holds about 24 hours of ‘Good morning’ tweets.

This is likely the most complicated project that I will be releasing this week. I have tried to document the sketches as thoroughly as possible, but it may be confusing for beginners. However, there are lots of fairly simple things that can be gleaned by looking at various parts of the code.

To compile this project, you’ll need to make sure you have all of the libraries and other dependencies outlined below.

Getting Started:

Move the sketches into your Processing sketch folder. Open Processing and open the GoodMorning sketch from the File > Sketchbook menu. You’ll find detailed instructions in the header of the main tab (the GoodMorning.pde file).

Thanks:

GoodMorning uses a variety of libraries, to the authors of which I am extremely grateful:

  1. Karsten Schmidt’s toxiclibs coreutils for storing coordinates and doing vector math
  2. Andreas Köberle and Christian Riekoff’s surfaceLib for rendering the planet and cloud layer
  3. Yusuke Yamamoto’s excellent Twitter4J library for talking to the Twitter API (this library is included in the download)
  4. Marius Watz’ TileSaver class, which lets you output huge bitmaps from any Processing sketch.

Download: GoodMorning.zip (10.0MB)


CC-GNU GPL

This software is licensed under the CC-GNU GPL version 2.0 or later.

GoodMorning!

GoodMorning! Pinheads

GoodMorning! is a Twitter visualization tool that shows about 11,000 ‘good morning’ tweets over a 24-hour period, rendering a simple sample of Twitter activity around the globe. The tweets are colour-coded: green blocks are early tweets, orange ones are from around 9am, and red tweets are later in the morning. Black blocks are ‘out of time’ tweets which said good morning (or a non-English equivalent) at a strange time of day. Click on the image above to see North America in detail (click the ‘all sizes’ button to see the high-res version, which is 6400×3600), or watch the video below to see the ‘good morning’ wave travel around the globe:

GoodMorning! Full Render #2 from blprnt on Vimeo.

I’ll admit that this isn’t a particularly useful visualization. It began as a quick idea that emerged out of the discussions following my post about Just Landed, in which several commenters asked to see a global version of that project. This would have been reasonably easy, but I felt that the 2D map made the important information in that visualization a bit easier to see. I wondered what type of data would be interesting to see on a globe, and started to think about something time based; more specifically, something in which you might see a kind of a wave traveling around the earth. I have been neck-deep in the Twitter API for a couple of months now, and eventually the idea trickled up to look at ‘good morning’ tweets.

The first task was to gather 24 hours’ worth of ‘good morning’ tweets. Querying the Twitter API is easy enough – I posted a simple tutorial about doing this with Processing and Twitter4J a couple of weeks ago. The issue with gathering this many tweets is that any search request returns at most the 1,500 most recent results. I needed many more than that – there are about 11,000 in the video, and those were only the ones from users with a valid location in their Twitter profile. All told, I ended up receiving upwards of 50,000 tweets. The only way to get this many results was to leave a ‘gathering’ client running for 24 hours. I should have put this on a server somewhere, but I didn’t, so my laptop needed to run the client for a full day to get the results. It ended up taking 5 days – a few false starts (the initial scripts choked on some strange iPhone locations), along with a couple of bone-headed errors on my part. Finally, I ended up with a JSON file which held the messages, users, dates and locations for a day’s worth of morning greetings.

I’d already decided to embrace the visual cliche that is the spinning globe. It’s reasonably easy to place points on a sphere once you know the latitude and longitude values – the first thing I did was to place all 11,000 points on the globe to see what kind of a distribution I ended up with. Not surprisingly, the points don’t cover the whole globe. I tried to include some non-English languages to encourage a more even distribution, but I don’t think I did the best job that I could have (if you have ideas for what to search for in other languages – particularly Asian ones – please leave a note in the comments). Still, I thought that there should be enough to get my ‘good morning wave’ going.
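For reference, placing a lat/long point on a sphere is the standard spherical-to-Cartesian conversion. A minimal plain-Java sketch (the axis orientation is an assumption – Processing sketches often swap y and z to match screen coordinates):

```java
public class SpherePoint {
    // Convert latitude/longitude (in degrees) to a point on a sphere of
    // radius r, using the standard spherical-to-Cartesian conversion.
    static double[] toCartesian(double latDeg, double lonDeg, double r) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double x = r * Math.cos(lat) * Math.cos(lon);
        double y = r * Math.cos(lat) * Math.sin(lon);
        double z = r * Math.sin(lat);
        return new double[] { x, y, z };
    }

    public static void main(String[] args) {
        // The north pole (latitude 90) sits on the z axis.
        double[] p = toCartesian(90, 0, 1);
        System.out.printf("%.3f %.3f %.3f%n", p[0], p[1], p[2]);
    }
}
```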

In my first attempts, I coloured the tweet blocks according to the size of the tweet. In these versions, you can see the wave, but it’s not very distinct:

GoodMorning! First Render from blprnt on Vimeo.

I needed some kind of colouring that would show the front of the wave – I ended up setting the colour of the blocks according to how far the time block was away from 9am, local time. This gave me the colour spectrum that you see in the latest versions.
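A mapping like that can be sketched simply: measure each tweet’s distance from 9am local time and interpolate a hue from green down to red. The 3-hour clamp and the exact ramp below are my own guesses for illustration, not the values used in the project:

```java
public class MorningColor {
    // Map a tweet's local time to a hue between green (at 9am) and red
    // (3 or more hours away from 9am). Returns a hue in degrees:
    // 120 = green, 0 = red.
    static double hueFor(int hour, int minute) {
        double minutesFrom9am = Math.abs((hour * 60 + minute) - 9 * 60);
        double t = Math.min(minutesFrom9am / 180.0, 1.0); // clamp at 3 hours
        return 120.0 * (1.0 - t);
    }

    public static void main(String[] args) {
        System.out.println(hueFor(9, 0));  // 120.0 -- green, right on time
        System.out.println(hueFor(12, 0)); // 0.0 -- red, a late riser
    }
}
```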

Originally, I wanted to include the text in the tweets, but after a few very messy renders, I dropped that idea. I still think it might be possible to incorporate text in some way that doesn’t look like a pile of pick-up-sticks – I just haven’t found it yet. Here’s a render from the text-mess:

GoodMorning!

There are some inherent problems with this visualization. As mentioned earlier, it’s certainly not a complete representation of global Twitter users. Also, I’m relying on the location that the user lists in their Twitter profile to plot the points on the globe. It’s very likely that a high proportion of these locations could be inaccurate. Even if the location is correct, it might not be accurate enough to be useful. If you look at the bottom right of the images above, you’ll see a big plume of blocks in the middle of South America. This isn’t some undiscovered Twitter city in the middle of the jungle (El Tworado? Sorry.) – it’s the result of many users listing their location as simply ‘South America’. There’s one of these in every country and continent (this explains the cluster of Canadians tweeting from somewhere near Baker Lake).

On the other hand, it provides a model for how similar visualizations might be made – propagation maps of trending topics, plotting of followers over time, etc. Even in its current form, the tool does provide some interesting data – for example it seems that East Coasters tweet earlier than West Coasters (there’s more green in the East than in the West). I’m guessing that in the hands of people with more than my rudimentary statistics skills, these kinds of data sets could tell us some interesting – and (heaven forbid) useful things.

Too much information? Twitter, Privacy, and Lawrence Waterhouse

A few months ago, Twitter was in the news over concerns that people might be sharing a little bit too much information in their social networking broadcasts. Arizona resident Israel Hyman left for a vacation – but not before sending out a Tweet telling his friends and followers that he’d be out of town. While he was away, Hyman’s house was burgled. He suspected that his status update could have motivated the thieves to give his house a try.

Personally, I’m not convinced there are too many Twitter-savvy burglars (TweetBurglars?) out there. It’s probably much more likely that the would-be thief saw Mr. Hyman drive off in his station wagon with a canoe on top. In any case, I should be safe: I don’t broadcast much personal information in my Twitter feed.

At least not too much implicit personal information.

How much ‘hidden’ information am I sharing in my Tweets? This is a question that has come up recently as I’ve been digging around in the Twitter API. I think that curious parties, armed with the Twitter API and some rudimentary programming skills, might be able to find out more about you than you might think.

Here is a simple graphic showing all of my 1,212 Tweets since October of last year:

Twittergrams: All Tweets

The graph is ordered by day horizontally and by time vertically – tweets near the top are close to midnight, while those in the lower half were broadcast in the morning. I’ve also indicated the length of the tweet with the size of the circles – longer tweets show up larger and darker. You can see some trends from this really simple graph. First, it’s clear that my tweets have gotten longer and more frequent since I started using Twitter. Also, my first Tweets of the day (the bottom-most ones) seem to start on average at about the same time – with a few notable anomalies. To investigate this a bit further, let’s highlight just the first tweets of the day (I’ve ignored days with 3 or fewer tweets):

Twittergrams: Morning Tweets

You can see from this plot that my morning messages tend to fall around 9:30am. There are a few outliers where I (heaven forbid) haven’t been around the computer – but there are also some deviations from the 9:30 norm that aren’t just statistical anomalies. If we plot some lines through the morning points starting in January of this year, we’ll see the three areas where my Twitter behaviour is not ‘normal’:

Twittergrams: Morning Tweet Graph

The yellow line in this graph is the 9:30 mean. The red line shows (with a bit of cushioning) the progression of first tweet times over the 8 months in question. At the marked points, my first tweets deviate from the average. Why? In two of the three cases, I’d changed time zones. In March, I was in Munich, and in May/June I was in Boston, New York, and Minneapolis (you can actually see the time shift between EST & CST). In the zone marked with a 1, I was commuting in the morning to a residency in a Vancouver suburb – hence the later starts until February.

This is a very simple example – certainly there isn’t a lot of useful (or incriminating) information to be found. But in the hands of a more capable investigator, it’s possible that the information underneath all of the Tweets, Facebook updates, Flickr comments, etc. that I broadcast every day could reveal a lot more than I would want to share. Some nefarious party could quite easily set up a ‘tracker’ to watch my public broadcasts, and be notified if my daily behaviour deviates from the norm. TweetBurglars, are you listening?
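To make the point concrete, here’s a minimal plain-Java sketch of such a tracker – compare today’s first-tweet time against the mean of recent days and flag big deviations. The threshold and the helper names are invented for illustration:

```java
public class FirstTweetTracker {
    // Flag a day whose first-tweet time deviates from the mean of recent
    // days by more than a threshold. Times are in minutes past midnight
    // (570 = 9:30am).
    static boolean deviates(int[] history, int today, int thresholdMinutes) {
        double sum = 0;
        for (int t : history) sum += t;
        double mean = sum / history.length;
        return Math.abs(today - mean) > thresholdMinutes;
    }

    public static void main(String[] args) {
        int[] mornings = { 570, 575, 565, 572, 568 }; // around 9:30am
        System.out.println(deviates(mornings, 571, 60)); // false: a normal day
        System.out.println(deviates(mornings, 420, 60)); // true: 7am -- a time zone change?
    }
}
```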

Of course, all of this information can be useful for the good guys, too. With millions of people active on Twitter, the store of data – and what it can reveal – gets more and more interesting every day. We are already seeing scientists using web data to measure public happiness, but I think we have just scraped the surface of what could be uncovered. (To see a model of how Twitter updates could be used to track travel and disease spread, see my post on Just Landed).

In Neal Stephenson’s Cryptonomicon, he describes a scenario in which an accurate map of London might be generated by tracking the heights of people’s heads as they stepped on and off of curbs:

The curbs are sharp and perpendicular, not like the American smoothly molded sigmoid-cross-section curves. The transition between the side walk and the street is a crisp vertical. If you put a green lightbulb on Waterhouse’s head and watched him from the side during the blackout, his trajectory would look just like a square wave traced out on the face of a single-beam oscilloscope: up, down, up, down. If he were doing this at home, the curbs would be evenly spaced, about twelve to the mile, because his home town is neatly laid out on a grid.

Here in London, the street pattern is irregular and so the transitions in the square wave come at random-seeming times, sometimes very close together, sometimes very far apart.

A scientist watching the wave would probably despair of finding any pattern; it would look like a random circuit, driven by noise, triggered perhaps by the arrival of cosmic rays from deep space, or the decay of radioactive isotopes.

But if he had depth and ingenuity, it would be a different matter.

Depth could be obtained by putting a green light bulb on the head of every person in London and then recording their tracings for a few nights. The result would be a thick pile of graph-paper tracings, each one as seemingly random as the others. The thicker the pile, the greater the depth.

Ingenuity is a completely different matter. There is no systematic way to get it. One person could look at the pile of square wave tracings and see nothing but noise. Another might find a source of fascination there, an irrational feeling impossible to explain to anyone who did not share it. Some deep part of the mind, adept at noticing patterns (or the existence of a pattern) would stir awake and frantically signal the dull quotidian parts of the brain to keep looking at the pile of graph paper. The signal is dim and not always heeded, but it would instruct the recipient to stand there for days if necessary, shuffling through the pile of graphs like an autist, spreading them out over a large floor, stacking them in piles according to some inscrutable system, pencilling numbers, and letters from dead alphabets, into the corners, cross-referencing them, finding patterns, cross-checking them against others.

One day this person would walk out of that room carrying a highly accurate street map of London, reconstructed from the information in all of those square wave plots.

Stephenson tells us that success in such an endeavour requires depth and ingenuity. Depth, the internet has in spades. Millions of Twitter users are adding to a public dataset every second of every day. Ingenuity may prove a bit tougher, but with open APIs and thousands of clever, curious investigators, it will be interesting to see what kinds of maps will be made – and to what ends they will be used.

- I’ll be following up this post over the weekend with a preview of a new Twitter visualization that I have been working on – stay tuned!

Quick Tutorial: Twitter & Processing

Accessing information from the Twitter API with Processing is easy. A few people have sent me e-mails asking how it all works, so I thought I’d write a very quick tutorial to get everyone up on their feet.

We don’t need to know too much about how the Twitter API functions, because someone has put together a very useful Java library to do all of the dirty work for us. It’s called twitter4j, and you can download it here. Once you have it downloaded, we can get started.

1. Open up a new Processing sketch.
2. Import the twitter4j library – you do this by simply dragging the twitter4j-2.0.8.jar file onto your sketch window. If you want to check, you should now see this file in your sketch folder, inside of a new ‘code’ directory.
3. We’ll put the guts of this example into the setup enclosure, but you could wrap it into a function or build a simple Class around it if you’d like:

Twitter myTwitter;

void setup() {
  myTwitter = new Twitter("yourTwitterUserName", "yourTwitterPassword");
};

void draw() {
  
};

4. Now that we have a Twitter object, we want to build a query to search via the Twitter API for a specific term or phrase. This is code that will not always work – sometimes the Twitter API might be down, or our search might not return any results, or we might not be connected to the internet. The Twitter object in twitter4j handles those types of conditions by throwing back an exception to us; we need to have a try/catch structure ready to deal with that if it happens:

Twitter myTwitter;

void setup() {
  myTwitter = new Twitter("yourTwitterUserName", "yourTwitterPassword");
  try {
    
    Query query = new Query("sandwich");
    QueryResult result = myTwitter.search(query);    

  }
  catch (TwitterException te) {
    println("Couldn't connect: " + te);
  };
};
void draw() {
  
};

5. This code is working – but we haven’t done anything with the results. Here, we’ll set the results-per-page parameter for the query to 100, to get the last 100 results for the term ‘sandwich’, and spit those results into the output panel:

Twitter myTwitter;

void setup() {
  myTwitter = new Twitter("yourTwitterUserName", "yourTwitterPassword");
  try {
    
    Query query = new Query("sandwich");
    query.setRpp(100);
    QueryResult result = myTwitter.search(query);
    
    ArrayList tweets = (ArrayList) result.getTweets();
    
    for (int i = 0; i < tweets.size(); i++) {
      Tweet t = (Tweet) tweets.get(i);
      String user = t.getFromUser();
      String msg = t.getText();
      Date d = t.getCreatedAt();
      println("Tweet by " + user + " at " + d + ": " + msg);
    };
    
  }
  catch (TwitterException te) {
    println("Couldn't connect: " + te);
  };
};

void draw() {
  
};

That’s it! This example is very simple, but it’s the bare bones of what you need to build a project which connects to Twitter through Processing. Good luck!