
All The Names: Algorithmic Design and the 9/11 Memorial

In late October, 2009, I received an e-mail from Jake Barton of Local Projects, titled plainly ‘Potential Freelance Job’. I read the e-mail, and responded in two parts: first, I would love to work on the project. Second, I wasn’t sure that it could be done.

The project was to design an algorithm for placement of names on the 9/11 memorial in New York City. In architect Michael Arad’s vision for the memorial, the names were to be laid out according to where people were and who they were with when they died – not alphabetically, nor placed in a grid. Inscribed in bronze parapets, almost three thousand names would stream seamlessly around the memorial pools. Underneath this river of names, though, an arrangement would provide a meaningful framework; one which allows the names of family and friends to exist together. Victims would be linked through what Arad terms ‘meaningful adjacencies’ – connections that would reflect friendships, family bonds, and acts of heroism. Through these connections, the memorial becomes a permanent embodiment of not only the many individual victims, but also of the relationships that were part of their lives before those tragic events.

Over several years, staff at the 9/11 Memorial Foundation undertook the painstaking process of collecting adjacency requests from the next of kin of the victims, creating a massive database of requested linkages. Often, several requests were made for each victim. There were more than one thousand adjacency requests in total, a complicated system of connections that all had to be addressed in the final arrangement. In mathematical terms, finding a layout that satisfied as many of these adjacency requests as possible was an optimization problem – a problem of finding the best solution among a myriad of possible ones. To solve this problem, and to produce a layout that would give the memorial designers a structure to base their final arrangement of the names upon, we built a software tool in two parts: first, an arrangement algorithm that optimized this adjacency problem to find the best possible solution; and second, an interactive tool that allowed for human adjustment of the computer-generated layout.

The Algorithm

The solution for producing a solved layout for the names arrangement sat at the bottom of a precariously balanced stack of complex requirements.

First, there was the basic spatial problem – the names for each pool had to fit, evenly, into a set of 76 panels (18 panels per side, plus one corner panel). Twelve of these panels were irregularly shaped (the corners and the panels adjacent to the corners, as seen in the image at the top of this post). Because the names were to appear in one continuous flowing group around each pool, some names had to overlap between panels, crossing a thin, invisible expansion joint between the metal plates. This expansion joint was small enough that it would fit in the space between many of the first and last names (or middle initials), but with certain combinations of letterforms (for example, a last name starting with a J, or a first name ending with a y), names were unable to cross this gap. As a result, the algorithm had to consider the typography of each name as it was placed into the layout.
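To give a flavour of that kind of constraint, here is a tiny illustrative test – the ‘overhanging’ glyph sets below are invented stand-ins, not the project’s actual rules:

// Hypothetical sketch: can the gap between two names sit over the joint?
// The "overhanging" glyph sets here are invented, for illustration only.
String overhangingFinals = "yjgpq";  // trailing forms that reach into the gap
String overhangingInitials = "J";    // leading forms that swing back under it

boolean canCrossJoint(String leftName, String rightName) {
  char last = leftName.charAt(leftName.length() - 1);
  char first = rightName.charAt(0);
  return overhangingFinals.indexOf(last) == -1
      && overhangingInitials.indexOf(first) == -1;
}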

Of course, the most challenging problem was to place the names within the panels while satisfying as many of the requested adjacencies as possible. There were more than 1200 adjacency requests that needed to be addressed. One of the first things that I did was to get an idea of how complex this network of relations was by building some radial maps:

Clearly, there was a dense set of relations that needed to be considered. On top of that, the algorithm needed to be aware of the physical form of each of the names, since longer names could offer more space for adjacency than shorter ones. Because each person could have more than one adjacency request, there were groups of victims who were linked together into large clusters of relationships – the largest of these contained more than 70 names. Here is a map of the adjacency clusters within one of the major groupings:

On top of these crucial links between victim names, there was a set of larger groupings in which the names were required to be placed: affiliations (usually companies), and sub-affiliations (usually departments within companies). To complicate matters, many of the adjacency requests linked people in different affiliations and sub-affiliations, so the ordering of these groupings had to be calculated in order to satisfy both the adjacency requests and the higher-level relationships.

At some point I plan on detailing the inner workings of the algorithm (built in Processing). For the time being, I’ll say that the total process is in fact a combination of several smaller routines: first, a clustering routine to make discrete sets of names in which the adjacency requests are satisfied; second, a space-filling process which places the clusters into the panels and fills available space with names from the appropriate groupings; and finally, a placement routine which manages the cross-panel names, and adjusts spacing within and between panels.
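As a taste of the first of those stages, here’s a toy sketch – my own illustration, not the memorial tool’s code – that gathers names into clusters using a simple union-find over a handful of made-up adjacency requests:

import java.util.*;

// Toy clustering sketch: names linked by adjacency requests end up in
// the same cluster. The real routine also had to weigh typography,
// affiliations, and panel capacity -- none of that is modeled here.
int[] parent;

int find(int i) {
  while (parent[i] != i) {
    parent[i] = parent[parent[i]];  // path compression
    i = parent[i];
  }
  return i;
}

void union(int a, int b) {
  parent[find(a)] = find(b);
}

void setup() {
  String[] names = { "A", "B", "C", "D", "E" };
  int[][] requests = { {0, 1}, {1, 2}, {3, 4} };  // invented request pairs

  parent = new int[names.length];
  for (int i = 0; i < names.length; i++) parent[i] = i;
  for (int[] r : requests) union(r[0], r[1]);

  // Gather names by cluster root and print the result
  HashMap<Integer, ArrayList<String>> clusters = new HashMap<Integer, ArrayList<String>>();
  for (int i = 0; i < names.length; i++) {
    int root = find(i);
    if (!clusters.containsKey(root)) clusters.put(root, new ArrayList<String>());
    clusters.get(root).add(names[i]);
  }
  println(clusters.values());  // e.g. [A, B, C] and [D, E]
}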

The end result from the algorithm is a layout which satisfies as many of the adjacency requests as possible. With this system, we were able to produce layouts which satisfied more than 98% of the requested adjacencies.

The Tool

Early on in the project, it became clear that the final layout for the names arrangement would not come directly from the algorithm. While the computerized system was able to solve the logistical problems underlying the arrangement, it was not as good at addressing the myriad of aesthetic concerns. The final layout had to be reviewed by hand – the architects needed to be able to meticulously adjust spacing and placement so that the final layout would be precisely as they wanted it. With this in mind, we built a custom software tool which allowed the memorial team to make manual changes to the layout, while still keeping track of all of the adjacencies.

The tool, again built in Processing, allowed users to view the layouts in different modes, easily move names within and between panels, get overall statistics about adjacency fulfillment, and export SVG versions of the entire system for micro-level adjustments in Adobe Illustrator. In the image above, we see an early layout for the South Pool. The colouring here represents two levels of grouping within the names. Main colour blocks represent affiliations, while shading within those colour blocks represents sub-affiliations. Below, we see a close-up view of a single panel, with several names placed in the ‘drawer’ below. Names can be placed in that space when they need to be moved from one panel to another. Note that adjacency indications remain active while the names are in the drawer.

Other features were built in to make the process of finalizing the layout as easy as possible: users could search for individual names, as well as affiliations and sub-affiliations; a change-tracking system allowed users to see how a layout had changed over multiple saved versions; and a variety of interface options allowed for precise placement of names within panels. Here is a video of the tool in use, again using a preliminary layout for the South Pool:

Computer Assisted Design

It would be misleading to say that the layout for the final memorial was produced by an algorithm. Rather, the underlying framework of the arrangement was solved by the algorithm, and humans used that framework to design the final result. This is, I think, a perfect example of something that I’ve believed in for a long time: we should let computers do what computers do best, and let humans do what humans do best. In this case, the computer was able to evaluate millions of possible solutions for the layout, manage a complex relational system, and track a large set of important variables and measurements. Humans, on the other hand, could focus on aesthetic and compositional choices. It would have been very hard (or impossible) for humans to do what the computer did. At the same time, it would have been very difficult to program the computer to handle the tasks that were completed with such dedication and precision by the architects and the memorial team.

The Weight of Data

This post has been a technical documentation of a very challenging project. Of course, on an emotional level, the project was also incredibly demanding. I was very aware throughout the process that the names that I was working with were the names of fathers and sons, mothers, daughters, best friends, and lovers.

Lawrence Davidson. Theresa Munson. Muhammadou Jawara. Nobuhiro Hayatsu. Hernando Salas. Mary Trentini.

In the days and months that I worked on the arrangement algorithm and the placement tool, I found myself frequently speaking these names out loud. Though I didn’t personally know anyone who was killed that day, I would come to feel a connection to each of them.

This project was a very real reminder that information carries weight. While names of the dead may be the heaviest data of all, almost every number or word we work with bears some link to a significant piece of the real world. It’s easy to download a data set – census information, earthquake records, homelessness figures – and forget that the numbers represent real lives. As designers, artists, and researchers, we always need to consider the true source of our data, and the moral responsibility it carries.

Ultimately, my role in the construction of the memorial was a small and largely invisible one. Rightly, visitors to the site will never know that an algorithm assisted in its design. Still, it is rewarding to think that my work with Local Projects has helped to add meaning to such an important monument.

The 9/11 Memorial will be dedicated on September 11, 2011, and opens to the public (with reservation of a visitor pass) on September 12th.

Text Comparison Tool: SOURCE CODE

About this time last year, I built a lightweight tool in Processing to compare two articles that I read about head injuries in the NFL. Later, I extended the tool to compare any two texts, and promised a source release.

Well, it only took 12 months, but I’ve finally cleaned up the code to a point where I think it will be reasonably easy to use, and helpful for those who might want to learn a bit more about how it works.

I always find myself in a tricky position with source releases. Often, as in the case of this tool, I have ideas about what the project should look like before it gets released. Here, I wanted to build an interface to allow people to select different text sources within the app, so that people could use the app without having to compile it from Processing. This is what delayed the release of the code for a year – I was waiting for the time to get this last piece done.

Two weeks ago, though, I had a chance to speak at an event with Tahir Hemphill. I had sent Tahir a messy version of the project for use with his Hip Hop Word Count initiative a few months back, and he used it to analyze a famous rap battle between Nas and Jay-Z:

Text Correlation Tool: Nas Versus Jay / Ether Versus The Takeover from Staple Crops on Vimeo.

This reminded me of a fairly valuable lesson as far as source code is concerned: If it works, it’s probably good enough to release.

So, without too much further ado, here’s a link to download the Text Comparison Tool:

Text Comparison Tool (1.1MB)

It’s pretty easy to use. First, you’ll need Processing to open and work with the sketch. Also, you’ll need the toxiclibs core library installed. Assuming you have those two things, these are the steps:

1. Drag the unzipped folder into your sketchbook.
2. Place your text files in the sketch’s data folder.
3. Open the sketch.
4. Look for the code block at the top of the main tab where the article information is set. It’s pretty clearly marked, and looks like this:

String title1 = "Tokyo";
String url1 = "asia.txt";
String desc1 = "Suntory Hall";
String date1 = "November 14th, 2009";
color color1 = #140047;

String title2 = "Cairo";
String url2 = "cairo.txt";
String desc2 = "Cairo University";
String date2 = "June 4th, 2009";
color color2 = #680014;

5. Replace the information here with appropriate information for your files.
6. Run the sketch!

That’s it. If you are getting strange results, you can tweak the clean() and cleanBody() methods at the bottom of the main tab to control how your text is filtered.

Hopefully I’ll still find the time to package this thing up in a bit more of a user-friendly form. In the meantime, though, I hope people will find this useful as an exploratory tool. Note that at any time you can press the ‘s’ key to save out an image – if you find some interesting texts to compare, let me know!

Processing Tip: Rendering Large Amounts of Text… Fast!

I’ve been working on a project at the New York Times R&D Lab which involves rendering a fair amount of text to screen, animated smoothly in real time. I’ve never had much luck working with large amounts of text in Processing using the traditional approach. But I’ve found a nice way, using an external library and an old trick, which allows for the rendering of large amounts of text at high framerates (the image above shows 1000+ text blocks running at ~50fps on my two-year-old MacBook).

How does it work?

Well, let’s first talk about how we’d typically approach this problem. Because they need to be animated independently, each of the text blocks is a member of a custom class. Once per frame, we loop through all of the text objects and ask them to render. In the old-school way, each of these text blocks would render using Processing’s text() method, and all of the shapes would be drawn to screen. While this works well with limited amounts of text, if we add more text objects to the sketch, things start to slow down at an alarming rate:

Here, I’ve barely added 200 objects, and the frame rate has crawled down to a measly 10fps. Clearly not good.
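For reference, the naive version I’m describing looks roughly like this – the TextBlock class is my own stand-in, not the R&D Lab code:

// Naive approach: every block re-rasterizes its text each frame via text().
// TextBlock is a hypothetical stand-in class, just for illustration.
ArrayList<TextBlock> blocks = new ArrayList<TextBlock>();

void setup() {
  size(800, 600);
  textFont(createFont("Helvetica", 14));
  for (int i = 0; i < 200; i++) {
    blocks.add(new TextBlock("Block #" + i, random(width), random(height)));
  }
}

void draw() {
  background(0);
  for (TextBlock b : blocks) {
    b.render();
  }
}

class TextBlock {
  String s;
  float x, y;
  TextBlock(String s, float x, float y) {
    this.s = s;
    this.x = x;
    this.y = y;
  }
  void render() {
    fill(255);
    text(s, x, y);  // this per-frame text() call is the bottleneck
  }
}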

Blit it.

The first part of the solution is to use an old-school technique called ‘blitting’. Now, before all of you demo-scene kids get all worked up, I know this isn’t really blitting, but I like the sound of the word, and it’s my tutorial, so deal with it. Instead of drawing every letter of every text block every frame, what we’re going to do is this:

  1. Render the text once to an off-screen graphics object
  2. Transfer the pixels from this off-screen object to a PImage
  3. Render that PImage every frame
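In Processing terms, a minimal sketch of those three steps might look like this (a toy example of mine, not the project code):

// 'Blit' version: rasterize the text once, draw the cached image every frame.
PImage cachedText;

void setup() {
  size(800, 600);
  // 1. Render the text once to an off-screen graphics object
  PGraphics pg = createGraphics(160, 30, JAVA2D);
  pg.beginDraw();
  pg.textFont(createFont("Helvetica", 20));
  pg.fill(255);
  pg.text("Hello, blit!", 5, 22);
  pg.endDraw();
  // 2. Transfer the pixels from the off-screen object to a PImage
  cachedText = pg.get();
}

void draw() {
  background(0);
  // 3. Render that PImage every frame -- far cheaper than re-drawing the text
  for (int i = 0; i < 500; i++) {
    image(cachedText, (i % 20) * 40, (i / 20) * 24);
  }
}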

The core reason this works is that Processing is much better at rendering hundreds of images than it is at rendering hundreds of text objects. With this implemented, we can get many, many more objects onto our screen:

Here, we’ve got about 550 objects on the screen, at full framerate. But, at least on my Mac, this system seems to collapse after about 600 objects. So, how can we get up to the 1000 objects that I showed in the first image in this post?

Let’s push it a little.

The last step of this process owes its existence to Andres Colubri, who has built an excellent library called GLGraphics which lets us do all kinds of things in Processing that get accelerated by the super-fancy graphics cards (GPUs) which most of us are lucky enough to have in our computers (Andres is also responsible for all of the tasty Processing for Android 3D stuff which I’ll be posting about later this week). In this case, the final step involves rendering all of the text blocks as GL Textures, instead of as PImages (note that this will give you a performance boost in any case where you are using images – yay!)

Around now you’re probably asking about the code. Well, here it is. I’ve tried to make it as simple as possible – just remember that you need the GLGraphics library installed.

GL Text Example (4kb)

This solution isn’t perfect – it doesn’t allow for perfect scaling, or dynamic effects within the text objects. But it does solve the basic problem, and it also gives us a set of GLTextures which are ready to be used in whatever wacky 3D system we want to put them in. Let me know how it works for you.

Your Random Numbers – Getting Started with Processing and Data Visualization

Over the last year or so, I’ve spent almost as much time thinking about how to teach data visualization as I’ve spent working with data. I’ve been a teacher for 10 years – for better or for worse this means that as I learn new techniques and concepts, I’m usually thinking about pedagogy at the same time. Lately, I’ve also become convinced that this massive ‘open data’ movement that we are currently in the midst of is sorely lacking in educational components. The amount of available data, I think, is quickly outpacing our ability to use it in useful and novel ways. How can basic data visualization techniques be taught in an easy, engaging manner?

This post, then, is a first sketch of what a lesson plan for teaching Processing and data visualization might look like. I’m going to start from scratch, work through some examples, and (hopefully) make some interesting stuff. One of the nice things about this process, I think, is that we’re going to start with fresh, new data – I’m not sure what kind of things we’re going to find once we start to get our hands dirty. This is what is really exciting about data visualization: the chance to find answers to your own, possibly novel questions.

Let’s Start With the Data

We’re not going to work with an old, dusty data set here. Nor are we going to attempt to bash our heads against an unnecessarily complex pile of numbers. Instead, we’re going to start with a data set that I made up – with the help of a couple of hundred of my Twitter followers. Yesterday morning, I posted this request:

Even on a Saturday, a lot of helpful folks pitched in, and I ended up with about 225 numbers. And so, we have the easiest possible dataset to work with – a single list of whole numbers. I’m hoping that, as well as being simple, this dataset will turn out to be quite interesting – maybe telling us something about how the human brain thinks about numbers.

I wrote a quick Processing sketch to scrape out the numbers from the post, and then to put them into a Google Spreadsheet. You can see the whole dataset here: http://spreadsheets.google.com/pub?key=t6mq_WLV5c5uj6mUNSryBIA&output=html

I chose to start from a Google Spreadsheet in this tutorial, because I wanted people to be able to generate their own datasets to work with. Teachers – you can set up a spreadsheet of your own, and get your students to collect numbers by any means you’d like. The ‘User’ and ‘Tweet’ columns are not necessary; you just need to have a column called ‘Number’.

It’s about time to get down to some coding. The only tricky part in this whole process will be connecting to the Google Spreadsheet. Rather than bog down the tutorial with a lot of confusing semi-advanced code, I’ll let you download this sample sketch which has the Google Spreadsheet machinery in place.

Got it? Great. Open that sketch in Processing, and let’s get started. Just to make sure we’re all in the same place, you should see a screen that looks like this:

At the top of the sketch, you’ll see three String values that you can change. You’ll definitely have to enter your own Google username and password. If you have your own spreadsheet of number data, you can enter in the key for your spreadsheet as well. You can find the key right in the URL of any spreadsheet.

The first thing we’ll do is change the size of our sketch to give us some room to move, set the background color, and turn on smoothing to make things pretty. We do all of this in the setup enclosure:

void setup() {
  //This code happens once, right when our sketch is launched
  size(800,800);
  background(0);
  smooth();
};

Now we need to get our data from the spreadsheet. One of the advantages of accessing the data from a shared remote file is that the remote data can change and we don’t have to worry about replacing files or changing our code.

We’re going to ask for a list of the ‘random’ numbers that are stored in the spreadsheet. The easiest way to store lists of things in Processing is in an array. In this case, we’re looking for an array of whole numbers – integers. I’ve written a function that gets an integer array from Google – you can take a look at the code on the ‘GoogleCode’ tab if you’d like to see how that is done. What we need to know here is that this function – called getNumbers – will return, or send us back, a list of whole numbers. Let’s ask for that list:

void setup() {
  //This code happens once, right when our sketch is launched
  size(800,800);
  background(0);
  smooth();

  //Ask for the list of numbers
  int[] numbers = getNumbers();
};

OK.

World’s easiest data visualization!

fill(255,40);
noStroke();
for (int i = 0; i < numbers.length; i++) {
  ellipse(numbers[i] * 8, height/2, 8,8);
};

What this does is draw a row of dots across the screen, one for each number that occurs in our Google list. The dots are drawn with a low alpha (40/255, or about 16%), so when numbers are picked more than once, they get brighter. The result is a strip of dots across the screen that looks like this:

Right away, we can see a couple of things about the distribution of our ‘random’ numbers. First, there are two or three very bright spots where numbers get picked several times. Also, there are some pretty evident gaps (one right in the middle) where certain numbers don’t get picked at all.

This could be normal though, right? To see if this distribution is typical, let’s draw a line of ‘real’ random numbers below our line, and see if we can notice a difference:

fill(255,40);
noStroke();
//Our line of Google numbers
for (int i = 0; i < numbers.length; i++) {
  ellipse(numbers[i] * 8, height/2, 8,8);
};
//A line of random numbers
for (int i = 0; i < numbers.length; i++) {
  ellipse(ceil(random(0,99)) * 8, height/2 + 20, 8,8);
};

Now we see the two compared:

The bottom, random line doesn’t seem to have as many bright spots or as many obvious gaps as our human-picked line. Still, the difference isn’t that evident. Can you tell right away which line is ours in the group below?

OK. I’ll admit it – I was hoping that the human-picked number set would be more obviously divergent from the sets of numbers that were generated by a computer. It’s possible that humans are better at picking random numbers than I had thought. Or, our sample set is too small to see any kind of real difference. It’s also possible that this quick visualization method isn’t doing the trick. Let’s stay on the track of number distribution for a few minutes and see if we can find out any more.

Our system of dots was easy, and readable, but not very useful for empirical comparisons. For the next step, let’s stick with the classics and

Build a bar graph.

Right now, we have a list of numbers. Ours range from 1-99, but let’s imagine for a second that we had a set of numbers that ranged from 0 to 9:

[5,8,5,2,4,1,6,3,9,0,1,3,5,7]

What we need to build a bar graph for these numbers is a list of counts – how many times each number occurs:

[1,2,1,2,1,3,1,1,1,1]

We can look at the list above and see that there were two 1s and three 5s.

Let’s do the same thing with our big list of numbers – we’re going to generate a list 100 numbers long that holds the counts for each of the possible numbers in our set (so that each number can be used directly as an index). But, we’re going to be a bit smarter about it this time around and package our code into a function – so that we can use it again and again without having to re-write it. In this case the function will (eventually) draw a bar graph – so we’ll call it (cleverly) barGraph:

void barGraph( int[] nums ) {
  //Make a list of number counts
  int[] counts = new int[100];
  //Fill it with zeros
  for (int i = 1; i < 100; i++) {
    counts[i] = 0;
  };
  //Tally the counts
  for (int i = 0; i < nums.length; i++) {
    counts[nums[i]]++;
  };
};

This function constructs an array of counts from whatever list of numbers we pass into it (that list is a list of integers, and we refer to it within the function as ‘nums’, a name which I made up). Now, let’s add the code to draw the graph (I’ve added another parameter to go along with the numbers – the y position of the graph):


void barGraph(int[] nums, float y) {
  //Make a list of number counts
  int[] counts = new int[100];
  //Fill it with zeros
  for (int i = 1; i < 100; i++) {
    counts[i] = 0;
  };
  //Tally the counts
  for (int i = 0; i < nums.length; i++) {
    counts[nums[i]]++;
  };

  //Draw the bar graph
  for (int i = 0; i < counts.length; i++) {
    rect(i * 8, y, 8, -counts[i] * 10);
  };
};

We’ve added a function – a set of instructions – to our file, which we can use to draw a bar graph from a set of numbers. To actually draw the graph, we need to call the function, which we can do in the setup enclosure. Here’s the code, all together:


/*

 #myrandomnumber Tutorial
 blprnt@blprnt.com
 April, 2010

 */

//This is the Google spreadsheet manager and the id of the spreadsheet that we want to populate, along with our Google username & password
SimpleSpreadsheetManager sm;
String sUrl = "t6mq_WLV5c5uj6mUNSryBIA";
String googleUser = "YOUR USERNAME";
String googlePass = "YOUR PASSWORD";

void setup() {
  //This code happens once, right when our sketch is launched
  size(800,800);
  background(0);
  smooth();

  //Ask for the list of numbers
  int[] numbers = getNumbers();
  //Draw the graph
  barGraph(numbers, 400);
};

void barGraph(int[] nums, float y) {
  //Make a list of number counts
  int[] counts = new int[100];
  //Fill it with zeros
  for (int i = 1; i < 100; i++) {
    counts[i] = 0;
  };
  //Tally the counts
  for (int i = 0; i < nums.length; i++) {
    counts[nums[i]]++;
  };

  //Draw the bar graph
  for (int i = 0; i < counts.length; i++) {
    rect(i * 8, y, 8, -counts[i] * 10);
  };
};

void draw() {
  //This code happens once every frame.
};

If you run your code, you should get a nice minimal bar graph which looks like this:

We can help distinguish the very high values (and the very low ones) by adding some color to the graph. In Processing’s standard RGB color mode, we can change one of our color channels (in this case, green) with our count values to give the bars some differentiation:


//Draw the bar graph
for (int i = 0; i < counts.length; i++) {
  fill(255, counts[i] * 30, 0);
  rect(i * 8, y, 8, -counts[i] * 10);
};

Which gives us this:

Or, we could switch to Hue/Saturation/Brightness mode, and use our count values to cycle through the available hues:

//Draw the bar graph
for (int i = 0; i < counts.length; i++) {
  colorMode(HSB);
  fill(counts[i] * 30, 255, 255);
  rect(i * 8, y, 8, -counts[i] * 10);
};

Which gives us this graph:

Now would be a good time to do some comparisons to a real random sample again, to see if the new coloring makes a difference. Because we defined our bar graph instructions as a function, we can do this fairly easily (I built in an easy function to generate a random list of integers called getRandomNumbers – you can see the code on the ‘GoogleCode’ tab):

void setup() {
  //This code happens once, right when our sketch is launched
  size(800,800);
  background(0);
  smooth();

  //Ask for the list of numbers
  int[] numbers = getNumbers();
  //Draw the graph
  barGraph(numbers, 100);

  for (int i = 1; i < 7; i++) {
    int[] randoms = getRandomNumbers(225);
    barGraph(randoms, 100 + (i * 130));
  };
};

I know, I know. Bar graphs. Yay. Looking at the graphic above, though, we can see more clearly that our humanoid number set is unlike the machine-generated sets. However, I actually think that the color is more valuable than the dimensions of the bars. Since we’re dealing with 99 numbers, maybe we can display these colours in a grid and see if any patterns emerge? A really important thing to be able to do with data visualization is to

Look at datasets from multiple angles.

Let’s see if the grid gets us anywhere. Luckily, a function to make a grid is pretty much the same as the one to make a graph (I’m adding two more parameters – an x position for the grid, and a size for the individual blocks):

void colorGrid(int[] nums, float x, float y, float s) {
  //Make a list of number counts
  int[] counts = new int[100];
  //Fill it with zeros
  for (int i = 0; i < 100; i++) {
    counts[i] = 0;
  };
  //Tally the counts
  for (int i = 0; i < nums.length; i++) {
    counts[nums[i]]++;
  };

  //Move the drawing coordinates to the x,y position specified in the parameters
  pushMatrix();
  translate(x,y);
  //Draw the grid
  for (int i = 0; i < counts.length; i++) {
    colorMode(HSB);
    fill(counts[i] * 30, 255, 255, counts[i] * 30);
    rect((i % 10) * s, floor(i/10) * s, s, s);
  };
  popMatrix();
};

We can now do this to draw a nice big grid:

//Ask for the list of numbers
int[] numbers = getNumbers();
//Draw the grid
colorGrid(numbers, 50, 50, 70);

I can see some definite patterns in this grid – so let’s bring the actual numbers back into play so that we can talk about what seems to be going on. Here’s the full code, one last time:


/*

 #myrandomnumber Tutorial
 blprnt@blprnt.com
 April, 2010

 */

//This is the Google spreadsheet manager and the id of the spreadsheet that we want to populate, along with our Google username & password
SimpleSpreadsheetManager sm;
String sUrl = "t6mq_WLV5c5uj6mUNSryBIA";
String googleUser = "YOUR USERNAME";
String googlePass = "YOUR PASSWORD";

//This is the font object
PFont label;

void setup() {
  //This code happens once, right when our sketch is launched
  size(800,800);
  background(0);
  smooth();

  //Create the font object to make text with
  label = createFont("Helvetica", 24);

  //Ask for the list of numbers
  int[] numbers = getNumbers();
  //Draw the grid
  colorGrid(numbers, 50, 50, 70);
};

void barGraph(int[] nums, float y) {
  //Make a list of number counts
  int[] counts = new int[100];
  //Fill it with zeros
  for (int i = 1; i < 100; i++) {
    counts[i] = 0;
  };
  //Tally the counts
  for (int i = 0; i < nums.length; i++) {
    counts[nums[i]]++;
  };

  //Draw the bar graph
  for (int i = 0; i < counts.length; i++) {
    colorMode(HSB);
    fill(counts[i] * 30, 255, 255);
    rect(i * 8, y, 8, -counts[i] * 10);
  };
};

void colorGrid(int[] nums, float x, float y, float s) {
  //Make a list of number counts
  int[] counts = new int[100];
  //Fill it with zeros
  for (int i = 0; i < 100; i++) {
    counts[i] = 0;
  };
  //Tally the counts
  for (int i = 0; i < nums.length; i++) {
    counts[nums[i]]++;
  };

  //Move the drawing coordinates to the x,y position specified in the parameters
  pushMatrix();
  translate(x,y);
  //Draw the grid
  for (int i = 0; i < counts.length; i++) {
    colorMode(HSB);
    fill(counts[i] * 30, 255, 255, counts[i] * 30);
    textAlign(CENTER);
    textFont(label);
    textSize(s/2);
    text(i, (i % 10) * s, floor(i/10) * s);
  };
  popMatrix();
};

void draw() {
  //This code happens once every frame.
};

And, our nice looking number grid:

BINGO!

No, really. If this were a bingo card, and I were a 70-year-old, I’d be rich. Look at that nice line going down the x7 column – 17, 27, 37, 47, 57, 67, 77, 87, and 97 are all appearing with good frequency. If we rule out the Douglas Adams effect on 42, it is likely that most of the top 10 most-frequently occurring numbers would have a 7 on the end. Do numbers ending with 7s ‘feel’ more random to us? Or is there something about the number 7 that we just plain like?

In contrast, if I had played the x0 column, I’d be out of luck. It seems that numbers ending with a zero don’t feel very random to us at all. This could also explain the black hole around the number 50 – which, in our range from 1-99, appears to be the ‘least random’ of all.
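If you want to check that hunch against your own data, a quick helper – my own addition, reusing the same tally logic as barGraph() – will print the most-picked numbers:

//Print the ten most frequently picked numbers, most popular first
void printTopTen(int[] nums) {
  int[] counts = new int[100];
  for (int i = 0; i < nums.length; i++) {
    counts[nums[i]]++;
  }
  for (int rank = 1; rank <= 10; rank++) {
    int best = 1;
    for (int n = 1; n < 100; n++) {
      if (counts[n] > counts[best]) best = n;
    }
    println(rank + ". " + best + " (picked " + counts[best] + " times)");
    counts[best] = -1;  //knock it out of the running
  }
}

Calling printTopTen(numbers); at the end of setup() will list them in the console.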

Well, there we have it: a start-to-finish example of how we can use Processing to visualize simple data, with the goal of exposing underlying patterns and anomalies. The techniques that we used in this project were fairly simple – but they are useful tools that can be applied in a huge variety of data situations (I use them myself, all the time).

Hopefully this tutorial is (was?) useful for some of you. And, if there are any teachers out there who would like to try this out with their classrooms, I’d love to hear how it goes.

State of the Union(s)

New York Times, 01/27/10 - State of the Union Graphic

I was asked at the end of last week to produce a graphic for the Opinion page today – the idea was to compare the texts of various ‘state of the union’ addresses from around the world. The final result (pictured above) is not extraordinarily data-heavy, but it worked quite nicely in the printed layout, where the individual ‘tentacles’ trailed to the text of the speeches that they indexed.

The process behind this piece was relatively simple. Each speech was indexed using a Processing application that I wrote, which counts the frequency of individual words (the program ignores commonly used or unimportant words). The words for each speech were then ranked by mentions per thousand words (you can see a version of the piece with numbers here).
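The indexing application itself isn’t posted here, but the heart of that kind of counting is small. Here’s a minimal sketch – the stop-word list is a tiny stand-in for the real one:

import java.util.*;

//Tally word frequencies, skipping common words. The stop list below is
//a small stand-in; the real application used a much longer one.
HashMap<String, Integer> countWords(String text) {
  String[] stops = { "the", "and", "of", "to", "a", "in", "that", "we", "our" };
  HashSet<String> stopSet = new HashSet<String>(Arrays.asList(stops));
  HashMap<String, Integer> counts = new HashMap<String, Integer>();
  String[] words = splitTokens(text.toLowerCase(), " ,.;:!?\"()");
  for (String w : words) {
    if (stopSet.contains(w)) continue;
    Integer c = counts.get(w);
    counts.put(w, c == null ? 1 : c + 1);
  }
  return counts;
}

Dividing a word’s count by the total number of words and multiplying by 1000 gives the mentions-per-thousand-words figure used for the ranking.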

Almost every project I work on involves a period of ‘data exploration’ in which I try to shake as many interesting things out of the information as I can. Even though this piece had a short turn-around, I did a fair amount of poking around, generating some simple bar graphs:

State of the Union Graphs

Another avenue I explored was to use the word weights to determine a ‘score’ for each sentence. By doing this, I can try to find the ‘kernel’ of the speech – the sentence that sums up the entire text in the most succinct way. This, I think, was fairly successful (a rough sketch of the weighting follows the examples below). Here are the ‘power sentences’ for the UK:

SOTU analysis - Sentence Weighting- UK

The Netherlands:

SOTU analysis - Sentence Weighting - Netherlands

And Botswana:

SOTU analysis - Sentence Weighting - Botswana
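As promised above, here’s a rough sketch of how that sentence weighting might work – my own reconstruction, building on the countWords() sketch earlier in this post, and the real scoring may well differ:

//Score each sentence by the summed frequency weights of its words,
//normalized by length, and return the highest-scoring 'power sentence'.
String powerSentence(String text) {
  HashMap<String, Integer> weights = countWords(text);  //from the sketch above
  String[] sentences = splitTokens(text, ".!?");
  String best = "";
  float bestScore = -1;
  for (String s : sentences) {
    String[] words = splitTokens(s.toLowerCase(), " ,;:\"()");
    float score = 0;
    for (String w : words) {
      Integer c = weights.get(w);
      if (c != null) score += c;
    }
    score /= max(words.length, 1);  //don't let long sentences win by default
    if (score > bestScore) {
      bestScore = score;
      best = s.trim();
    }
  }
  return best;
}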

Which brings us to tonight’s State of the Union Address by Barack Obama. What was the ‘power sentence’ from this speech? I ran the weighting algorithm on the address and this is what it came up with:

The Most Important Sentence From Obama's State of the Union Address?