Open Science, H1N1, Processing, and the Google Spreadsheet API

Flu Genome Data Visualizer

I’ve recently been working on a project with my friend Jennifer Gardy, whose insights into epidemiology and data modeling led me to build Just Landed. Jennifer is currently working at the BC Centre for Disease Control where, among other things, she’s been looking at data related to swine flu genomics. She came to me with an interesting idea for visualizing data related to historical flu strains, and I thought it might be an interesting project for several reasons. First, I’ve been doing a lot of reading and thinking around the concept of open science and open research, and thought that this project might be a good chance to test out some ideas. Similarly, I am very interested in the chance to use Processing in a scientific context (rather than an artistic one) and I hope this might be a way to introduce my favourite tool to a broader audience. Finally, I hope there is the chance that a good visualization tool might uncover some interesting things about the influenza virus and its nefarious ways.

The project is just getting started, so I don’t have a lot of results to share (a screenshot of the initial stages of the tool is above). But I would like to talk about the approach that I have been taking, and to share some code which might enable similar efforts to happen using Processing & Google Spreadsheets.

Michael Nielson is the author of the most cited physics publication of the last 25 years. He’s also a frequent blogger, writing on such disparate topics as chess & quantum mechanics.  He has written several excellent posts about open science, including this article about the future of science and this one about doing science online. In both articles, he argues that scientists should be utilizing web-based technologies in a much more efficient manner than they have thus far. In doing so, he believes (as do I) that the efficiency of science as a whole can be greatly improved. In his articles, Michael concentrates both on specialized services such as Science Advisor and the physics preprint arXiv, as well as more general web entities like Wikipedia and FriendFeed. I agree that these services and others (specifically, I am going to look at Google Docs) can play a useful role in building working models for open science. I’ll argue as well that open-source ‘programming for the people’ initiatives such as Processing and OpenFrameworks could be useful in fostering collaborative efforts in a scientific context.

For the flu genomics project, we are working with a reasonably large data set – about 38,000 data points. Typically, I would work with this file locally, parsing it with Processing and using it as I see fit. This approach, however, has a couple of failings. First, if the data set changes, I am required to update my file to ensure I am working with the latest version. Likewise, if the data is being worked with by several people, Jennifer would have to send each of us updated versions of the data every time there is a significant change. Both of these concerns can be solved by hosting the data online, and by having anyone working with the data subscribe to a continually updated version. This is very easily managed by a number of ‘cloud-based’ web services – the most convenient and most prevalent being Google Docs – specifically Google Spreadsheets.

Most of us are familiar with using Google Spreadsheets – we can create documents online, and then share them with other people. Spreadsheets can be created, added to, edited and deleted, all the while being available to anyone who is subscribed to the document. What isn’t common knowledge is that Google has released an API  for Spreadsheets – meaning that we can do all of those things (creating, adding, editing, deleting) remotely using a program like Processing. We can manage our Google-hosted databases with the same programs that we are using to process and visualize our data. It also means that multiple people can be working with a central store of data at the same time. In this way, Google Spreadsheets becomes a kind of a publicly-editable database (with a GUI!).

Google Spreadsheets have already been put to good use by the Guardian Data Store, where some clever British folks have compiled interesting data like university drop-out rates, MP’s expenses, and even a full list of swine flu cases by country. Using the API, we can access all of the information from our own spreadsheets and from public spreadsheers and use it to do whatever we’d like. The Google Spreadsheets API has some reasonably advanced features that allow you to construct custom tables, and use structured queries to extract specific data from spreadsheets (see the Developer’s Guide), but for now I want to concentrate on doing the simplest possible thing – extracting data from individual table cells. Let’s walk through a quick example using Processing.

I’ve created an empty sketch, which you can download here. This sketch includes all of the .jar files that we need to get started with the Spreadsheet API, saving you the trouble of having to import them yourself (the Java Client Library for the Google Data APIs is here – note that the most recent versions are compiled in Java 1.6 and aren’t compatible with the latest version of Processing). I’ve also wrapped up some very basic functionality into a class called SimpleSpreadsheetManager – have a look at the code in that tab if you want to get a better idea of how the guts of this example function. For now, I’ll just show you how to use the pre-built Class to access spreadsheet data.

First, we create a new member of the SimpleSpreadsheetManager class, and initialize it with our Google username and password:

void setup() {
size(500,500);
background(255);

SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
sm.init("myProjectName","me@myemail.com", "mypassword");
};

void draw() {

};

Now we need to load in our spreadsheet – or more specifically, our worksheet. Google Spreadsheets are collections of individual worksheets. Each spreadsheet has a unique ID which we can use to retrieve it. We can then ask for individual worksheets within that spreadsheet. If I visit the swine flu data spreadsheet from the Guardian Data Store in my browser, I can see that the URL looks like this:

http://spreadsheets.google.com/pub?key=rFUwm_vmW6WWBA5bXNNN6ug&gid=1

This URL shows me the spreadsheet id (rFUwm_vmW6WWBA5bXNNN6). I can also see from the tabs at the top that the worksheet that I want (“COUNTRY TOTALS”) is the first worksheet in the list. I can now load this worksheet using my spreadsheet manager:

void setup() {
size(500,500);
background(255);

SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
sm.init("myProjectName","me@myemail.com", "mypassword");
sm.fetchSheetByKey("rFUwm_vmW6WWBA5bXNNN6ug", 0);
};

void draw() {

};

To get data out of the individual cells, I have two options with the SimpleSpreadsheetManager. I can request a cell by its column and row indexes, or I can request a cell from its column name and row index:

void setup() {
size(500,500);
background(255);

SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
sm.init("myProjectName","me@myemail.com", "mypassword");
sm.fetchSheetByKey("rFUwm_vmW6WWBA5bXNNN6ug", 0);

//get the value of the third cell in the first column
println(sm.getCellValue(0,2));                             //returns 'Australia'    //get the value of the third cell in the column      labelled 'Deaths, confirmed swine flu'
println(sm.getCellValue("deathsconfirmedswineflu",2));     //returns '9'
};

void draw() {

};

If we wanted to find out which countries had more than 10 confirmed swine flu deaths, we could do this:

void setup() {
size(500,500);
background(255);

SimpleSpreadsheetManager sm = new SimpleSpreadsheetManager();
sm.init("myProjectName","me@myemail.com", "mypassword");
sm.fetchSheetByKey("rFUwm_vmW6WWBA5bXNNN6ug", 0);

//Get all of the countries with more than one death
for (int i=0; i < sm.currentTotalRows; i++) {
String country = sm.getCellValue(0,i);
String deaths = sm.getCellValue("deathsconfirmedswineflu", i);
if (deaths != null && Integer.valueOf(deaths) > 10) println(country +  " : " + deaths);
};
};

void draw() {

};

With a bit more work (this took about 10 minutes), we can create a sketch to build an infographic linked to the spreadsheet – making it very easy to output new versions as the data is updated:

Swine Flu Deaths

Not a particularly exciting demo – but it opens a lot of doors for working with data remotely and collaboratively. Rather than needing to depend on generic visualization tools like those built into typical spreadsheet applications, we can use Processing (or a similar platform like Openframeworks or Field) to create customized tools that are suited to a specific dataset. For my flu genomics project, we’re able to create a very specialized applet that examines how the genetic sequences for antigenic regions change over time – certainly not a function that comes standard with Microsoft Excel.

Combining Processing with Google Spreadsheets provides an easy way to bring almost any kind of data into Processing, and at the same time gives us a good way to store and manage that data. I’d definitely like to add some functionality to this really simple starting point. It would be reasonably easy to allow for creation of spreadsheets and worksheets, and I’d also like to look at implementing table record feeds and their associated structured queries. Ultimately, it would be good to package this all up into a Processing library – if someone has the time to take it on, I think it would be a very useful addition for the Processing community.

The single biggest benefit of Processing is that it is easy to learn and use. Over the years, I have taught dozens of designers and artists how to leverage Processing to enter a world that many had thought was reserved for programmers. I suspect that a similar gulf tends to exist in science between those that gather the data and those that process and visualize it. I am interested to see if Processing can help to close that gap as well.

The Guardian Data Store serves as a good model for a how a shared repository for scientific data might work. Such a project would be useful for scientists. But it would also be open to artists, hackers, and the generally curious, who might be able to use the available data in novel (and hopefully useful) ways.

30 thoughts on “Open Science, H1N1, Processing, and the Google Spreadsheet API

  1. Great example of how to use the Google spreadsheets API with Processing; I started looking at it from the point of view of the Google charts API:
    http://ouseful.wordpress.com/2009/05/22/first-s

    With so many data projects opening up at the moment, I've come round to thinking that it would be handy if they implemented the Google API. Do you have any thoughts about that? eg whether it would be a good think, or bad thing; what the risks might be; whether the API is up to it; and so on?

  2. Thanks, Tony. I like what you are working on with the charts API. Your approach allows for very rapid exploration of the data, which is certainly very useful.

    My post is mostly concerned with the next step – how Processing or OF or similar tools could be used to create novel, non-standard visualizations well as tools to process/filter/manipulate the data.

    I think that the Google APIs and their underlying infrastructure would provide a perfect base for large data projects, wether they be scientific, governmental, or personal. It would certainly be a good step towards standardization, which seems to be a key part of realizing the dream of truly open data.

    I don't know that much about the mechanics of Google Spreadsheets & the Google Data APIs, but considering it's Google I'd have to guess that the APIs could probably handle pretty much anything we could throw at them. It would be interesting to come up with some ideas to test some of these concepts out on a bigger scale.

    When the Guardian APIs were launched in the Spring I found what the Data Store was doing to be the most interesting part – I think it has just taken everyone a few months to start to explore and build tools. It certainly provides a very interesting model for how similar initiatives could be constructed.

  3. Hi Phillip – I just switched servers and in my over-zealous file culling L removed the .zip file by accident. It's back there now!

  4. Hi Phillip,I haven't tried to integrate Google Spreadsheets with flash (yet) but there's no reason why it wouldn't be possible. The tricky part would be handling authentication, but there should be a pile of libraries for that.

  5. Looking at your code, you have an init() function that takes user credentials; public spreadsheets will emit data without the need for credentials; do you have a function that grabs data from these simple spreadsheets with requiring the provision of Google authentication details?

    maybe fetchSheetByKey() works without the prior init()? (I don't have processing handy so can't try just at the mo:-(

  6. Hi Tony,

    From what I've tested quickly it doesn't seem to work without authenticating – I'm not sure why this is or if there's a way to avoid it. I threw the code together pretty quickly to suit a specific project; if and when it becomes a library I would look into it a bit deeper…

  7. Wow, thank you so much for this code. I wanted to connect to Processing to a google spread, and this was EXACTLY what I needed. Thanks a ton!

    1. Ah, and by the way, is it possible to actually send data TO google spreadsheets from Processing? Any pointers here would be great. I'm basically using Arduino to gather data, and Processing to visualize it. My next step is to hopefully save the data into a google spreadsheet for future access and analysis. Is this just as easy to do?

      1. Yes, this is possible.

        Add this function to the SimpleSpreadsheetManager Class:

        void setCellValue(String tag, int r, String val) {

        ListEntry le = (ListEntry) currentListEntries.get(r);
        le.getCustomElements().setValueLocal(tag, val);
        try {
        ListEntry updatedRow = le.update();
        }
        catch (Exception e){

        };

        };

        You can use this to set the value of any cell:

        sm.setCellValue(\”colName\”, row#, value);

        1. FYI, the function only works if you want to change an existing value in the spreadsheet; i.e. if the spreadsheet currently only has three rows, and you ask it to "setCellValue" for row #5, you'll get an out of bounds exception. I'll have to write some error conditions to keep that from happening.

  8. Hi :)

    ive been trying to get his to work and im having trouble.
    processing prints

    ERROR AUTHENTICATING

    then errors out and hilights this
    WorksheetEntry we = (WorksheetEntry) worksheets.get(wi);

    i have an empty spreadsheet im trying to access. to tell the truth, im doing something like adam.
    say i want to control a servo with a potentiometer on an arduino and send the data to a spreadsheet.
    soon after, id like to be able to change the potentiometer position, update to spreadsheet, and back to servo.. then over my network, from a friends house, etc.

    but for now, just reading and writing data to g-docs sheets via processing would be awsome!

    1. Check and double-check the "key" value in the URL, and make sure you're not including anything with an '&' in it. The key extends to the '&' and no further.

  9. I'm a linguist just getting into Processing. I really like your tutorials, and I can't thank you enough for writing them!

    Unfortunately, while I'm not averse to learning a little coding, I'm not very familiar with the nuts and bolts of Processing. Could you point me in the right direction for where to put the .jar files so I can use the SimpleSpreadsheetManager class?

    Then again, this post is a little old, so maybe the Google Spreadsheets API has changed? Any help is appreciated! I can't wait to dig into my Google Docs data…

    1. This is really a question for the Processing forum (highly recommended!), but FYI you can drop the .jar files in your Documents/Processing/libraries folder. If no 'libraries' folder exists in your Processing folder, create one.

      Processing is one of the best documented languages for beginners I've seen. Take some time to peruse processing.org next time. The answers are all there: http://dev.processing.org/libraries/

  10. i am new to processing and I am working through the tutorial and
    i got this error.

    i tried making sure of the URL was correct. I am pretty sure it is. but i got this line highlighted and

    WorksheetEntry we = (WorksheetEntry) worksheets.get(wi);
    this error:

    ERROR RETRIEVING WORKSHEET FEED
    processing.app.debug.RunnerException: IndexOutOfBoundsException: Index: 400, Size: 0
    at processing.app.Sketch.placeException(Sketch.java:1543)
    at processing.app.debug.Runner.findException(Runner.java:583)
    at processing.app.debug.Runner.reportException(Runner.java:558)
    at processing.app.debug.Runner.exception(Runner.java:498)
    at processing.app.debug.EventThread.exceptionEvent(EventThread.jav

    ….

    If anyone can help me i would greatly appreciate this

  11. Hey this tut is awesome!

    Was wondering why, if i print all my data, I get it sorted low to high, when I want data (from a sensor) in the order it's on the sheet?
    I think maybe I am doing this incorrectly? Thanks!!!
    =========
    for (int c=0; c < sm.currentTotalCols ; c++){
    for (int r=0; r < sm.currentTotalRows ; r++){
    float tempVal = Float.parseFloat(sm.getCellValue(2,r));
    readVal = tempVal;
    readVal = map(readVal,0,350,0,10);
    println(readVal);
    }
    }
    =========

  12. Hi Jer, Great tutorial!
    Well, firstly, while going to the google spreadsheet api, it says that version 1 and version 2 have been 'officially deprecated' Now the reason i ask this, is because i am working on a similar project where i am taking data from spreadsheet only and using you're code for retrieving(which i am able to successfully). So, the final product(the pde sketch) will actually be online(on a webpage online). So, will this older version cause an issue? Or anything which i am not thinking right? Also because i want to know more 'variables' of the api like you used "cellValue" , "currentTotalRows" etc. HOPE to see a reply. Thanks. M

  13. Hi, I cannot get it work. I get stuck with:
    Set<String> tags = le.getCustomElements().getTags(); – with error: Cannot find a class or type named "Set", any ideas?

    Thank you for support in advance

    1. Hey, it might be a little late for you, but I just got this part figured out. For some reason with the new version of Processing you need to add another call the the Java library put this line with your other imports at the top of the SimpleSpreadsheetManager Class:

      import java.util.Set;

      should work :)

    2. Hey, it might be a little late for you, but I just got this part figured out. For some reason with the new version of Processing you need to add another call the the Java library put this line with your other imports at the top of the SimpleSpreadsheetManager Class:

      import java.util.Set;

  14. hi,
    I love your stuff. I'm trying to create some fairly simple concept-map type graphs that are populated with "data" (just text and urls) pulled from a collectively edited google spreadsheet. I think that I could work it out with some help from this example, but I'm wondering how much you think has changed with processing.js, and, I guess, whether you know of any other simple solutions off the top of your head.

    thanks for your time and all your generosity!
    -kyle

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>