Over the last year or so, I’ve spent almost as much time thinking about how to teach data visualization as I’ve spent working with data. I’ve been a teacher for 10 years – for better or for worse this means that as I learn new techniques and concepts, I’m usually thinking about pedagogy at the same time. Lately, I’ve also become convinced that this massive ‘open data’ movement that we are currently in the midst of is sorely lacking in educational components. The amount of available data, I think, is quickly outpacing our ability to use it in useful and novel ways. How can basic data visualization techniques be taught in an easy, engaging manner?

This post, then, is a first sketch of what a lesson plan for teaching Processing and data visualization might look like. I’m going to start from scratch, work through some examples, and (hopefully) make some interesting stuff. One of the nice things, I think, about this process, is that we’re going to start with fresh, new data – I’m not sure what kind of things we’re going to find once we start to get our hands dirty. This is what is really exciting about data visualization; the chance to find answers to your own, possibly novel questions.

**Let’s Start With the Data**

We’re not going to work with an old, dusty data set here. Nor are we going to attempt to bash our heads against an unnecessarily complex pile of numbers. Instead, we’re going to start with a data set that I made up – with the help of a couple of hundred of my Twitter followers. Yesterday morning, I posted this request:

Even on a Saturday, a lot of helpful folks pitched in, and I ended up with about 225 numbers. And so, we have the easiest possible dataset to work with – a single list of whole numbers. I’m hoping that, as well as being simple, this dataset will turn out to be quite interesting – maybe telling us something about how the human brain thinks about numbers.

I wrote a quick Processing sketch to scrape out the numbers from the post, and then to put them into a Google Spreadsheet. You can see the whole dataset here: http://spreadsheets.google.com/pub?key=t6mq_WLV5c5uj6mUNSryBIA&output=html

I chose to start from a Google Spreadsheet in this tutorial, because I wanted people to be able to generate their own datasets to work with. Teachers – you can set up a spreadsheet of your own, and get your students to collect numbers by any means you’d like. The ‘User’ and ‘Tweet’ columns are not necessary; you just need to have a column called ‘Number’.

It’s about time to get down to some coding. The only tricky part in this whole process will be connecting to the Google Spreadsheet. Rather than bog down the tutorial with a lot of confusing semi-advanced code, I’ll let you download this sample sketch which has the Google Spreadsheet machinery in place.

Got it? Great. Open that sketch in Processing, and let’s get started. Just to make sure we’re all in the same place, you should see a screen that looks like this:

At the top of the sketch, you’ll see three String values that you can change. You’ll definitely have to enter your own Google username and password. If you have your own spreadsheet of number data, you can enter in the key for your spreadsheet as well. You can find the key right in the URL of any spreadsheet.

The first thing we’ll do is change the size of our sketch to give us some room to move, set the background color, and turn on smoothing to make things pretty. We do all of this in the setup enclosure:

    void setup() {
      //This code happens once, right when our sketch is launched
      size(800,800);
      background(0);
      smooth();
    }

Now we need to get our data from the spreadsheet. One of the advantages of accessing the data from a shared remote file is that the remote data can change and we don’t have to worry about replacing files or changing our code.

We’re going to ask for a list of the ‘random’ numbers that are stored in the spreadsheet. The easiest way to store lists of things in Processing is in an array. In this case, we’re looking for an array of whole numbers – integers. I’ve written a function that gets an integer array from Google – you can take a look at the code on the ‘GoogleCode’ tab if you’d like to see how that is done. What we need to know here is that this function – called getNumbers – will return, or send us back, a list of whole numbers. Let’s ask for that list:

    void setup() {
      //This code happens once, right when our sketch is launched
      size(800,800);
      background(0);
      smooth();
      //Ask for the list of numbers
      int[] numbers = getNumbers();
    }

OK.

**World’s easiest data visualization!**

    fill(255,40);
    noStroke();
    //One dot per number, placed along a row halfway down the screen
    for (int i = 0; i < numbers.length; i++) {
      ellipse(numbers[i] * 8, height/2, 8,8);
    }

This draws a row of dots across the screen, one for each number that occurs in our Google list. The dots are drawn with a low alpha (40/255 or about 16%), so when numbers are picked more than once, they get brighter. The result is a strip of dots across the screen that looks like this:

Right away, we can see a couple of things about the distribution of our ‘random’ numbers. First, there are two or three very bright spots where numbers get picked several times. Also, there are some pretty evident gaps (one right in the middle) where certain numbers don’t get picked at all.
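If you’re curious how quickly those overlapping dots brighten, here is a rough sketch of the arithmetic in plain Java (the language Processing is built on). It assumes standard ‘over’ blending of white dots on a black background; the class and method names are my own invention, and Processing’s internal integer rounding may give slightly different values.

```java
final class AlphaStack {
    // Approximate brightness (0-255) of a pixel after `layers` white dots,
    // each drawn with the given alpha (0-255), are stacked on black.
    // Each layer covers a fraction `a` of whatever darkness is left.
    static double brightness(int layers, int alpha) {
        double a = alpha / 255.0;
        return 255.0 * (1.0 - Math.pow(1.0 - a, layers));
    }
}
```

Under these assumptions a single dot has a brightness of 40, a second dot on the same spot brings it to about 74, and each further dot adds a little less than the one before.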

This could be normal though, right? To see if this distribution is typical, let’s draw a line of ‘real’ random numbers below our line, and see if we can notice a difference:

    fill(255,40);
    noStroke();
    //Our line of Google numbers
    for (int i = 0; i < numbers.length; i++) {
      ellipse(numbers[i] * 8, height/2, 8,8);
    }
    //A line of random numbers
    for (int i = 0; i < numbers.length; i++) {
      ellipse(ceil(random(0,99)) * 8, height/2 + 20, 8,8);
    }

Now we see the two compared:

The bottom, random line doesn’t seem to have as many bright spots, or gaps as evident, as our human-picked line. Still, the difference isn’t that obvious. Can you tell right away which line is our line from the group below?

OK. I’ll admit it – I was hoping that the human-picked number set would be more obviously divergent from the sets of numbers that were generated by a computer. It’s possible that humans are better at picking random numbers than I had thought. Or, our sample set is too small to see any kind of real difference. It’s also possible that this quick visualization method isn’t doing the trick. Let’s stay on the track of number distribution for a few minutes and see if we can find out any more.

Our system of dots was easy, and readable, but not very useful for empirical comparisons. For the next step, let’s stick with the classics and

**Build a bar graph.**

Right now, we have a list of numbers. Ours range from 1-99, but let’s imagine for a second that we had a set of numbers that ranged from 0-9:

[5,8,5,2,4,1,6,3,9,0,1,3,5,7]

What we need to build a bar graph for these numbers is a list of *counts* – how many times each number occurs:

[1,2,1,2,1,3,1,1,1,1]

We can look at this list above, and see that there were two 1s, and three 5s.
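To make the counting step concrete, here is a minimal sketch in plain Java. The class name, method name and `size` parameter are hypothetical, chosen just for this illustration, and quietly skipping values that fall outside the list is my own choice rather than anything the sketch in this tutorial does.

```java
final class Tally {
    // Builds a list of counts: counts[n] is how many times n appears in nums.
    // `size` is the number of possible values (10 for the values 0 to 9).
    static int[] countOccurrences(int[] nums, int size) {
        int[] counts = new int[size]; // Java fills a new int array with zeros
        for (int n : nums) {
            // Skip anything that would not fit in the counts list
            if (n >= 0 && n < size) {
                counts[n]++;
            }
        }
        return counts;
    }
}
```

Calling `Tally.countOccurrences(new int[]{5,8,5,2,4,1,6,3,9,0,1,3,5,7}, 10)` returns the same list of counts shown above.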

Let’s do the same thing with our big list of numbers – we’re going to generate a list 100 numbers long (one slot for each value from 0 to 99) that holds the counts for each of the possible numbers in our set. But, we’re going to be a bit smarter about it this time around and package our code into a function – so that we can use it again and again without having to re-write it. In this case the function will (eventually) draw a bar graph – so we’ll call it (cleverly) barGraph:

    void barGraph(int[] nums) {
      //Make a list of number counts
      int[] counts = new int[100];
      //Fill it with zeros
      for (int i = 0; i < 100; i++) {
        counts[i] = 0;
      }
      //Tally the counts
      for (int i = 0; i < nums.length; i++) {
        counts[nums[i]] ++;
      }
    }

This function constructs an array of counts from whatever list of numbers we pass into it (that list is a list of integers, and we refer to it within the function as ‘nums’, a name which I made up). Now, let’s add the code to draw the graph (I’ve added another parameter to go along with the numbers – the y position of the graph):

    void barGraph(int[] nums, float y) {
      //Make a list of number counts
      int[] counts = new int[100];
      //Fill it with zeros
      for (int i = 0; i < 100; i++) {
        counts[i] = 0;
      }
      //Tally the counts
      for (int i = 0; i < nums.length; i++) {
        counts[nums[i]] ++;
      }
      //Draw the bar graph (a negative height makes each bar grow upward from y)
      for (int i = 0; i < counts.length; i++) {
        rect(i * 8, y, 8, -counts[i] * 10);
      }
    }

We’ve added a function – a set of instructions – to our file, which we can use to draw a bar graph from a set of numbers. To actually draw the graph, we need to call the function, which we can do in the setup enclosure. Here’s the code, all together:

    /*
     #myrandomnumber Tutorial
     blprnt@blprnt.com
     April, 2010
     */

    //This is the Google spreadsheet manager and the id of the spreadsheet that we want to populate, along with our Google username & password
    SimpleSpreadsheetManager sm;
    String sUrl = "t6mq_WLV5c5uj6mUNSryBIA";
    String googleUser = "YOUR USERNAME";
    String googlePass = "YOUR PASSWORD";

    void setup() {
      //This code happens once, right when our sketch is launched
      size(800,800);
      background(0);
      smooth();
      //Ask for the list of numbers
      int[] numbers = getNumbers();
      //Draw the graph
      barGraph(numbers, 400);
    }

    void barGraph(int[] nums, float y) {
      //Make a list of number counts
      int[] counts = new int[100];
      //Fill it with zeros
      for (int i = 0; i < 100; i++) {
        counts[i] = 0;
      }
      //Tally the counts
      for (int i = 0; i < nums.length; i++) {
        counts[nums[i]] ++;
      }
      //Draw the bar graph
      for (int i = 0; i < counts.length; i++) {
        rect(i * 8, y, 8, -counts[i] * 10);
      }
    }

    void draw() {
      //This code happens once every frame.
    }

If you run your code, you should get a nice minimal bar graph which looks like this:

We can help distinguish the very high values (and the very low ones) by adding some color to the graph. In Processing’s standard RGB color mode, we can change one of our color channels (in this case, green) with our count values to give the bars some differentiation:

    //Draw the bar graph
    for (int i = 0; i < counts.length; i++) {
      fill(255, counts[i] * 30, 0);
      rect(i * 8, y, 8, -counts[i] * 10);
    }

Which gives us this:

Or, we could switch to Hue/Saturation/Brightness mode, and use our count values to cycle through the available hues:

    //Draw the bar graph
    for (int i = 0; i < counts.length; i++) {
      colorMode(HSB);
      fill(counts[i] * 30, 255, 255);
      rect(i * 8, y, 8, -counts[i] * 10);
    }

Which gives us this graph:

Now would be a good time to do some comparisons to a real random sample again, to see if the new coloring makes a difference. Because we defined our bar graph instructions as a function, we can do this fairly easily (I built in an easy function to generate a random list of integers called getRandomNumbers – you can see the code on the ‘GoogleCode’ tab):

    void setup() {
      //This code happens once, right when our sketch is launched
      size(800,800);
      background(0);
      smooth();
      //Ask for the list of numbers
      int[] numbers = getNumbers();
      //Draw the graph
      barGraph(numbers, 100);
      //Draw random sets below it, 130 pixels apart
      //(a sixth set would sit at y = 880, below the 800-pixel-high window)
      for (int i = 1; i < 6; i++) {
        int[] randoms = getRandomNumbers(225);
        barGraph(randoms, 100 + (i * 130));
      }
    }
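The real getRandomNumbers lives on the ‘GoogleCode’ tab of the sample sketch, and I’m not reproducing it here. As a stand-in, this is roughly what such a helper could look like in plain Java. Everything in it (the class name, the method name, the `seed` parameter, and the assumption that values run from 1 to 99) is hypothetical.

```java
import java.util.Random;

final class RandomList {
    // Returns n pseudo-random whole numbers, each between 1 and 99 inclusive.
    // Passing the same seed gives the same list, which is handy for comparisons.
    static int[] randomNumbers(int n, long seed) {
        Random rng = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = 1 + rng.nextInt(99); // nextInt(99) gives 0..98
        }
        return out;
    }
}
```

Fixing the seed is a deliberate design choice for a classroom: every student sees the same ‘random’ comparison set, so the discussion stays about the human-picked numbers.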

I know, I know. Bar graphs. Yay. Looking at the graphic above, though, we can see more clearly that our humanoid number set is unlike the machine-generated sets. However, I actually think that the color is more valuable than the dimensions of the bars. Since we’re dealing with 99 numbers, maybe we can display these colours in a grid and see if any patterns emerge? A really important thing to be able to do with data visualization is to

**Look at datasets from multiple angles.**

Let’s see if the grid gets us anywhere. Luckily, a function to make a grid is pretty much the same as the one to make a graph (I’m adding two more parameters – an x position for the grid, and a size for the individual blocks):

    void colorGrid(int[] nums, float x, float y, float s) {
      //Make a list of number counts
      int[] counts = new int[100];
      //Fill it with zeros
      for (int i = 0; i < 100; i++) {
        counts[i] = 0;
      }
      //Tally the counts
      for (int i = 0; i < nums.length; i++) {
        counts[nums[i]] ++;
      }
      //Move the drawing coordinates to the x,y position specified in the parameters
      pushMatrix();
      translate(x,y);
      //Draw the grid
      for (int i = 0; i < counts.length; i++) {
        colorMode(HSB);
        fill(counts[i] * 30, 255, 255, counts[i] * 30);
        rect((i % 10) * s, floor(i/10) * s, s, s);
      }
      popMatrix();
    }
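The only new trick in the grid is turning a position in the list into a column and a row. Here is that step on its own, as a small plain-Java sketch; the class and method names are made up for this illustration.

```java
final class GridCell {
    // Column in a 10-wide grid: the last digit of the index
    static int column(int i) {
        return i % 10;
    }

    // Row in a 10-wide grid: the tens digit of the index (integer division)
    static int row(int i) {
        return i / 10;
    }

    // Top-left corner of cell i, for a grid that starts at (x, y)
    // and uses square blocks of size s
    static float[] topLeft(int i, float x, float y, float s) {
        return new float[] { x + column(i) * s, y + row(i) * s };
    }
}
```

For example, the number 47 lands in column 7 of row 4, so every number ending in 7 stacks up in the same column.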

We can now do this to draw a nice big grid:

    //Ask for the list of numbers
    int[] numbers = getNumbers();
    //Draw the grid
    colorGrid(numbers, 50, 50, 70);

I can see some definite patterns in this grid – so let’s bring the actual numbers back into play so that we can talk about what seems to be going on. Here’s the full code, one last time:

    /*
     #myrandomnumber Tutorial
     blprnt@blprnt.com
     April, 2010
     */

    //This is the Google spreadsheet manager and the id of the spreadsheet that we want to populate, along with our Google username & password
    SimpleSpreadsheetManager sm;
    String sUrl = "t6mq_WLV5c5uj6mUNSryBIA";
    String googleUser = "YOUR USERNAME";
    String googlePass = "YOUR PASSWORD";

    //This is the font object
    PFont label;

    void setup() {
      //This code happens once, right when our sketch is launched
      size(800,800);
      background(0);
      smooth();
      //Create the font object to make text with
      label = createFont("Helvetica", 24);
      //Ask for the list of numbers
      int[] numbers = getNumbers();
      //Draw the grid
      colorGrid(numbers, 50, 50, 70);
    }

    void barGraph(int[] nums, float y) {
      //Make a list of number counts
      int[] counts = new int[100];
      //Fill it with zeros
      for (int i = 0; i < 100; i++) {
        counts[i] = 0;
      }
      //Tally the counts
      for (int i = 0; i < nums.length; i++) {
        counts[nums[i]] ++;
      }
      //Draw the bar graph
      for (int i = 0; i < counts.length; i++) {
        colorMode(HSB);
        fill(counts[i] * 30, 255, 255);
        rect(i * 8, y, 8, -counts[i] * 10);
      }
    }

    void colorGrid(int[] nums, float x, float y, float s) {
      //Make a list of number counts
      int[] counts = new int[100];
      //Fill it with zeros
      for (int i = 0; i < 100; i++) {
        counts[i] = 0;
      }
      //Tally the counts
      for (int i = 0; i < nums.length; i++) {
        counts[nums[i]] ++;
      }
      pushMatrix();
      translate(x,y);
      //Draw the grid, this time with each number as text instead of a block
      for (int i = 0; i < counts.length; i++) {
        colorMode(HSB);
        fill(counts[i] * 30, 255, 255, counts[i] * 30);
        textAlign(CENTER);
        textFont(label);
        textSize(s/2);
        text(i, (i % 10) * s, floor(i/10) * s);
      }
      popMatrix();
    }

    void draw() {
      //This code happens once every frame.
    }

And, our nice looking number grid:

**BINGO!**

No, really. If this were a bingo card, and I were a 70-year-old, I’d be rich. Look at that nice line going down the X7 column – 17, 27, 37, 47, 57, 67, 77, 87, and 97 are all appearing with good frequency. If we rule out the Douglas Adams effect on 42, it is likely that most of the top 10 most-frequently occurring numbers would have a 7 on the end. Do numbers ending with 7s ‘feel’ more random to us? Or is there something about the number 7 that we just plain like?

In contrast, if I had played the X0 column, I’d be out of luck. It seems that numbers ending with a zero don’t feel very random to us at all. This could also explain the black hole around the number 50 – which, in a range from 0-100, appears to be the ‘least random’ of all.
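If you’d rather check the ‘ends in 7’ hunch with numbers than by eye, one option is to tally the picks by their last digit. This is a minimal plain-Java sketch of that idea; the class and method names are hypothetical, and the sample list in the usage note is made up for illustration, not taken from the spreadsheet.

```java
final class LastDigits {
    // Returns a list of ten counts: byDigit[d] is how many of the
    // numbers end in the digit d. Negative numbers are skipped.
    static int[] countByLastDigit(int[] nums) {
        int[] byDigit = new int[10];
        for (int n : nums) {
            if (n < 0) {
                continue;
            }
            byDigit[n % 10]++;
        }
        return byDigit;
    }
}
```

With the invented list 17, 27, 42, 50, 7, the tally would hold three picks ending in 7 and one each ending in 2 and 0. Running the same function on a machine-generated list gives a baseline to compare against.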

Well, there we have it. A start-to-finish example of how we can use Processing to visualize simple data, with the goal of exposing underlying patterns and anomalies. The techniques that we used in this project were fairly simple – but they are useful tools that can be used in a huge variety of data situations (I use them myself, all the time).

Hopefully this tutorial is (was?) useful for some of you. And, if there are any teachers out there who would like to try this out with their classrooms, I’d love to hear how it goes.