
Neural Darwinism – An Idea Reborn?

Years ago, I read a book by neuroscientist William H. Calvin, called How Brains Think. In it, he outlines a theory in which consciousness emerges through a myriad of super-fast ‘microevolutionary processes’ inside our brains. Put simply, every thought you have and decision that you make is the result of an ultra-quick competition among a vast ‘population’ of candidate ideas. This theory is known as Neural Darwinism, and versions of it have been put forward as early as 1978.

This idea seemed fascinating to me. It provided a lot of answers to questions that I had about my own creative process, and also seemed to suggest that we could make ourselves better thinkers by providing the most suitable mental environments for ideas to evolve within. I’ve written about some of these ideas in previous posts on this blog. Like the best theories, it also seemed to have a certain elegance to it – it makes sense that one of the most powerful optimization mechanisms known – evolution – would be at work inside of our brains.

Unfortunately, there was a problem. In order for any kind of true evolution to occur in the brain, there needed to be some mechanism for replication. The mind didn’t seem to operate this way – from what we knew, ideas (or neural patterns) weren’t copied. Evolution won’t work if the finches can’t lay eggs. It seemed that an interesting theory might be dead in the water.

A week ago, though, a research team from Hungary and the UK posted a paper titled ‘The Neuronal Replicator Hypothesis’, which suggests that replication of neuronal patterns can (and does) occur within the brain using known neurophysiological processes. This would mean that, true to the ideas of Neural Darwinism, evolution could indeed play a role in cognition and consciousness. Furthermore, the paper also suggests that, in combination with another known neural mechanism, Hebbian learning, this brain-based evolutionary process could be more powerful than the traditional Darwinist model.

This new development is exciting. Not only does it revive a once-promising theory, it also adds to it – perhaps giving us a workable model of how complex things like consciousness and creativity might arise. A better understanding of these processes is valuable not only at a scientific level, but also for anyone involved in creative endeavors. In the long run, it may be possible to actively ‘optimize’ our thinking processes – to have better ideas, to solve bigger problems – and be more creative.

The Missing Piece of the OpenData / OpenGov Puzzle: Education

Yesterday, I tweeted a quick thought that I had while walking the dog:

[Tweet screenshot]

A few people asked me to expand on this, so let’s give it a try:

We are facing a very different data-related problem today than we were facing only a few years ago. Back then, the call was solely for more information. Since then, corporations and governments have started to answer this call and the result has been a flood of data of all shapes and sizes. While it’s important to remain on track with the goal of making data available, we are now faced with a parallel and perhaps more perplexing problem: What do we do with it all?

Of course, an industry has developed around all of this data; start-ups around the world are coming up with new ideas and data-related products every day. At the same time, open-sourcers are releasing helpful tools and clever apps by the dozen. Still, in large part these groups are looking at the data with commercial utility in mind. It seems to me that if we are going to make the most of this information resource, it’s important to bring more people in on the game – and doing that requires education.

At the post-secondary level, efforts should be made to educate academics for whom this new pile of data could be useful: journalists, social scientists, historians, contemporary artists, archivists, etc. I could imagine cross-disciplinary workshops teaching the basics:

  1. A survey of what kind of data is available, and how to find it.
  2. A brief overview of common data formats (CSV, JSON, XML, etc).
  3. An introduction to user-friendly exploration tools like ManyEyes & Tableau.
  4. A primer in Processing and how it can be used to quickly prototype and build specialized visualization tools.

The last step seems particularly important to me, as it encourages people to think about new ways to engage with information. In many cases, datasets that are becoming available are novel in their content, structure, and complexity – encouraging innovation in an academic framework is essential. Yes, we do need to teach people how to make bar graphs and scatter charts; but let’s also facilitate exploration and experimentation.

Why workshops? While this type of teaching could certainly be done through tutorials, or with a well-written textbook, it’s my experience that teaching these subjects is much more effective one-on-one. This is particularly true for students who come at data from a non-scientific perspective (and these people are the ones that we need the most).

The long-term goal of such an initiative would be to increase data literacy. In a perfect world, this would occur even earlier – at the high school level. Here’s where I put on my utopian hat: teaching data literacy to young people would mean that they could find answers to their own questions, rather than waiting for the media to answer those questions for them. It would also teach them, in a practical way, about transparency and accountability in government. The education system is already producing a generation of bloggers and citizen journalists – let’s make sure they have the skills they need to be dangerous. Veering a bit to the right, these are hugely valuable skills for workers in an ‘idea economy’ – a nation with a data-literate workforce is a force to be reckoned with.

Ideally, this educational component would be built into government projects like data.gov or data.hmg.gov.uk (are you listening, Canada?). More than that, it would be woven into the education mandate of governments at the federal and local levels. Of course, I’m not holding my breath.

Instead, I’ve started to plan a bit of a project for the summer. Last year, I taught a series of workshops at my studio in Vancouver which were open to people of all skill levels. This year, I’m going to extend my reach a bit and offer a couple of free, online presentations covering some of the things that I’ve talked about in this post. One of these workshops will be specifically targeted at youth. At the same time, I’ll be publishing course outlines and sample materials for my sessions so that others can host similar events.

Stay tuned for details – and if you have any questions or would like to lend a hand, feel free to leave a comment or get in touch.

Open Science, H1N1, Processing, and the Google Spreadsheet API

Flu Genome Data Visualizer

I’ve recently been working on a project with my friend Jennifer Gardy, whose insights into epidemiology and data modeling led me to build Just Landed. Jennifer is currently working at the BC Centre for Disease Control where, among other things, she’s been looking at data related to swine flu genomics. She came to me with an interesting idea for visualizing data related to historical flu strains, and I thought it might be an interesting project for several reasons. First, I’ve been doing a lot of reading and thinking around the concept of open science and open research, and thought that this project might be a good chance to test out some ideas. Second, I am very interested in the chance to use Processing in a scientific context (rather than an artistic one), and I hope this might be a way to introduce my favourite tool to a broader audience. Finally, there is the chance that a good visualization tool might uncover some interesting things about the influenza virus and its nefarious ways.

The project is just getting started, so I don’t have a lot of results to share (a screenshot of the initial stages of the tool is above). But I would like to talk about the approach that I have been taking, and to share some code which might enable similar efforts to happen using Processing & Google Spreadsheets.

Michael Nielsen is the author of the most cited physics publication of the last 25 years. He’s also a frequent blogger, writing on such disparate topics as chess & quantum mechanics. He has written several excellent posts about open science, including this article about the future of science and this one about doing science online. In both articles, he argues that scientists should be utilizing web-based technologies in a much more effective manner than they have thus far. In doing so, he believes (as do I) that the efficiency of science as a whole can be greatly improved. In his articles, Michael concentrates both on specialized services such as Science Advisor and the physics preprint server arXiv, as well as on more general web entities like Wikipedia and FriendFeed. I agree that these services and others (specifically, I am going to look at Google Docs) can play a useful role in building working models for open science. I’ll argue as well that open-source ‘programming for the people’ initiatives such as Processing and OpenFrameworks could be useful in fostering collaborative efforts in a scientific context.

For the flu genomics project, we are working with a reasonably large data set – about 38,000 data points. Typically, I would work with this file locally, parsing it with Processing and using it as I see fit. This approach, however, has a couple of failings. First, if the data set changes, I am required to update my file to ensure I am working with the latest version. Second, if the data is being worked with by several people, Jennifer would have to send each of us an updated version of the data every time there is a significant change. Both of these concerns can be solved by hosting the data online, and by having anyone working with the data subscribe to a continually updated version. This is very easily managed by a number of ‘cloud-based’ web services – the most convenient and most prevalent being Google Docs, specifically Google Spreadsheets.

Most of us are familiar with using Google Spreadsheets – we can create documents online, and then share them with other people. Spreadsheets can be created, added to, edited and deleted, all the while being available to anyone who is subscribed to the document. What isn’t common knowledge is that Google has released an API for Spreadsheets – meaning that we can do all of those things (creating, adding, editing, deleting) remotely using a program like Processing. We can manage our Google-hosted databases with the same programs that we are using to process and visualize our data. It also means that multiple people can be working with a central store of data at the same time. In this way, Google Spreadsheets becomes a kind of publicly-editable database (with a GUI!).
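To give a sense of what this looks like at the code level, here’s a rough sketch of how Google’s Java client library for the Data APIs can be used to pull rows out of a spreadsheet. Treat it as an illustration of the library’s general shape only – the function name, feed URL, and error handling are my own simplifications, not the helper class I’ll introduce in a moment:

import com.google.gdata.client.spreadsheet.SpreadsheetService;
import com.google.gdata.data.spreadsheet.WorksheetFeed;
import com.google.gdata.data.spreadsheet.WorksheetEntry;
import com.google.gdata.data.spreadsheet.ListFeed;
import com.google.gdata.data.spreadsheet.ListEntry;
import java.net.URL;

// Illustrative only: prints the first column of every row in one worksheet
void printFirstColumn(String key, int worksheetIndex) {
  try {
    // Authenticate against the Spreadsheets service
    SpreadsheetService service = new SpreadsheetService("myProjectName");
    service.setUserCredentials("me@myemail.com", "mypassword");

    // Each spreadsheet key has a worksheets feed listing its individual sheets
    URL worksheetFeedUrl = new URL("http://spreadsheets.google.com/feeds/worksheets/"
                                   + key + "/private/full");
    WorksheetFeed worksheets = service.getFeed(worksheetFeedUrl, WorksheetFeed.class);
    WorksheetEntry worksheet = worksheets.getEntries().get(worksheetIndex);

    // The list feed returns one entry per data row; the entry's title is the first
    // column, and the other columns are keyed by simplified header names
    ListFeed rows = service.getFeed(worksheet.getListFeedUrl(), ListFeed.class);
    for (ListEntry row : rows.getEntries()) {
      println(row.getTitle().getPlainText());
    }
  } catch (Exception e) {
    println("Problem talking to the Spreadsheets API: " + e.getMessage());
  }
}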

Google Spreadsheets have already been put to good use by the Guardian Data Store, where some clever British folks have compiled interesting data like university drop-out rates, MPs’ expenses, and even a full list of swine flu cases by country. Using the API, we can access all of the information from our own spreadsheets and from public spreadsheets and use it to do whatever we’d like. The Google Spreadsheets API has some reasonably advanced features that allow you to construct custom tables and use structured queries to extract specific data from spreadsheets (see the Developer’s Guide), but for now I want to concentrate on doing the simplest possible thing – extracting data from individual table cells. Let’s walk through a quick example using Processing.

I’ve created an empty sketch, which you can download here. This sketch includes all of the .jar files that we need to get started with the Spreadsheet API, saving you the trouble of having to import them yourself (the Java Client Library for the Google Data APIs is here – note that the most recent versions are compiled in Java 1.6 and aren’t compatible with the latest version of Processing). I’ve also wrapped up some very basic functionality into a class called SimpleSpreadsheetManager – have a look at the code in that tab if you want to get a better idea of how the guts of this example function. For now, I’ll just show you how to use the pre-built Class to access spreadsheet data.

First, we create a new instance of the SimpleSpreadsheetManager class, and initialize it with our Google username and password:

void setup() {
  size(500, 500);
  background(255);

  SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
  sm.init("myProjectName", "me@myemail.com", "mypassword");
}

void draw() {
}

Now we need to load in our spreadsheet – or more specifically, our worksheet. A Google Spreadsheet is a collection of individual worksheets. Each spreadsheet has a unique key which we can use to retrieve it. We can then ask for individual worksheets within that spreadsheet. If I visit the swine flu data spreadsheet from the Guardian Data Store in my browser, I can see that the URL looks like this:

http://spreadsheets.google.com/pub?key=rFUwm_vmW6WWBA5bXNNN6ug&gid=1

This URL shows me the spreadsheet key (rFUwm_vmW6WWBA5bXNNN6ug). I can also see from the tabs at the top that the worksheet that I want (“COUNTRY TOTALS”) is the first worksheet in the list. I can now load this worksheet using my spreadsheet manager:

void setup() {
  size(500, 500);
  background(255);

  SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
  sm.init("myProjectName", "me@myemail.com", "mypassword");
  sm.fetchSheetByKey("rFUwm_vmW6WWBA5bXNNN6ug", 0);
}

void draw() {
}

To get data out of the individual cells, I have two options with the SimpleSpreadsheetManager. I can request a cell by its column and row indexes, or I can request a cell by its column name and row index:

void setup() {
  size(500, 500);
  background(255);

  SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
  sm.init("myProjectName", "me@myemail.com", "mypassword");
  sm.fetchSheetByKey("rFUwm_vmW6WWBA5bXNNN6ug", 0);

  // get the value of the third cell in the first column
  println(sm.getCellValue(0, 2));                          // returns 'Australia'

  // get the value of the third cell in the column labelled 'Deaths, confirmed swine flu'
  println(sm.getCellValue("deathsconfirmedswineflu", 2));  // returns '9'
}

void draw() {
}

If we wanted to find out which countries had more than 10 confirmed swine flu deaths, we could do this:

void setup() {
  size(500, 500);
  background(255);

  SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
  sm.init("myProjectName", "me@myemail.com", "mypassword");
  sm.fetchSheetByKey("rFUwm_vmW6WWBA5bXNNN6ug", 0);

  // get all of the countries with more than ten deaths
  for (int i = 0; i < sm.currentTotalRows; i++) {
    String country = sm.getCellValue(0, i);
    String deaths = sm.getCellValue("deathsconfirmedswineflu", i);
    if (deaths != null && Integer.valueOf(deaths) > 10) {
      println(country + " : " + deaths);
    }
  }
}

void draw() {
}

With a bit more work (this took about 10 minutes), we can create a sketch to build an infographic linked to the spreadsheet – making it very easy to output new versions as the data is updated:

Swine Flu Deaths
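The code behind that image isn’t included here, but a minimal version of the idea might look something like the sketch below. Treat it as a sketch only – the column name comes from the earlier examples, while the 150-death scale, row spacing, and canvas size are assumptions you would tune to the data:

void setup() {
  size(500, 700);
  background(255);
  fill(0);

  SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
  sm.init("myProjectName", "me@myemail.com", "mypassword");
  sm.fetchSheetByKey("rFUwm_vmW6WWBA5bXNNN6ug", 0);

  // draw one bar per country, scaled by the number of confirmed deaths
  for (int i = 0; i < sm.currentTotalRows; i++) {
    String country = sm.getCellValue(0, i);
    String deaths = sm.getCellValue("deathsconfirmedswineflu", i);
    if (deaths == null || deaths.equals("")) continue;    // skip rows with no count

    // assumes the cell holds a plain number
    float barWidth = map(Integer.valueOf(deaths), 0, 150, 0, width - 160);
    float y = 20 + i * 12;
    rect(150, y, barWidth, 10);
    textAlign(RIGHT);
    text(country, 145, y + 9);
  }
}

void draw() {
}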

Not a particularly exciting demo – but it opens a lot of doors for working with data remotely and collaboratively. Rather than needing to depend on generic visualization tools like those built into typical spreadsheet applications, we can use Processing (or a similar platform like OpenFrameworks or Field) to create customized tools that are suited to a specific dataset. For my flu genomics project, we’re able to create a very specialized applet that examines how the genetic sequences for antigenic regions change over time – certainly not a function that comes standard with Microsoft Excel.

Combining Processing with Google Spreadsheets provides an easy way to bring almost any kind of data into Processing, and at the same time gives us a good way to store and manage that data. I’d definitely like to add some functionality to this really simple starting point. It would be reasonably easy to allow for creation of spreadsheets and worksheets, and I’d also like to look at implementing table record feeds and their associated structured queries. Ultimately, it would be good to package this all up into a Processing library – if someone has the time to take it on, I think it would be a very useful addition for the Processing community.
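For anyone who wants to peek ahead, a structured query against the list feed looks roughly like this in the Java client library. Again, this is a sketch rather than working code from the manager class – ‘service’ and ‘worksheet’ would be obtained as in the earlier API snippet, and the column names are guesses based on the Guardian sheet’s headers:

import com.google.gdata.client.spreadsheet.ListQuery;
import com.google.gdata.data.spreadsheet.ListFeed;
import com.google.gdata.data.spreadsheet.ListEntry;

// 'service' and 'worksheet' obtained as in the earlier gdata example
ListQuery query = new ListQuery(worksheet.getListFeedUrl());
query.setSpreadsheetQuery("deathsconfirmedswineflu > 10");   // filter rows on the server
ListFeed results = service.query(query, ListFeed.class);
for (ListEntry row : results.getEntries()) {
  println(row.getCustomElements().getValue("country") + " : "
          + row.getCustomElements().getValue("deathsconfirmedswineflu"));
}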

The single biggest benefit of Processing is that it is easy to learn and use. Over the years, I have taught dozens of designers and artists how to leverage Processing to enter a world that many had thought was reserved for programmers. I suspect that a similar gulf tends to exist in science between those that gather the data and those that process and visualize it. I am interested to see if Processing can help to close that gap as well.

The Guardian Data Store serves as a good model for a how a shared repository for scientific data might work. Such a project would be useful for scientists. But it would also be open to artists, hackers, and the generally curious, who might be able to use the available data in novel (and hopefully useful) ways.