WordPlay: A Tool for Freeform Language Exploration

When text becomes data it opens up a phenomenal amount of possibility for insight and creative exploration. The problem is that most Natural Language Processing (NLP) tools are hard to use unless you have a good foundation in programming to begin with. We use a lot of NLP in our work at The Office for Creative Research and I’ve often wondered what it would mean to make a language tool designed for open-ended exploration. I’ve also thought a lot about the role a tool like this could play in the classroom. English teachers could use it to explore both basic language (what is an adverb?) and more complex questions (how did Shakespeare use hyphenates?). History classrooms could examine patterns in language over time (how does Twitter compare to Moby Dick?). Because the tool would be open source and would have an API, Computer Science instructors could use it as a basis to teach about computation and language.

With all of this in mind I built WordPlay, a freeform NLP tool that lets you search across various bodies of text without having to know how to code or to learn any kind of strange syntax.

It’s really easy to use. Whatever you type into the query box will be used as a pattern to search across the current corpus. For example, we might search for text similar to ‘a terrible fate’ across Shakespeare’s collected works:

[Screenshot: WordPlay results for ‘a terrible fate’ in Shakespeare’s collected works]

Here is the same search, performed across the Bible:

[Screenshot: WordPlay results for ‘a terrible fate’ in the Bible]

And within Sigmund Freud’s A General Introduction to Psychoanalysis:

[Screenshot: WordPlay results for ‘a terrible fate’ in A General Introduction to Psychoanalysis]

The system works by trying to find a match to the query text in two different ways:

First, it tries to match parts of speech. ‘A terrible fate’ contains a determiner, followed by an adjective, followed by a noun, so the best results will also have these parts of speech, in the same order.

Next, it attempts to match how the query sounds. Specifically, it considers the stress pattern of the word or phrase: in the case of ‘a terrible fate’ it’ll look for other phrases with the same syllable counts (1-3-1) and the same stress pattern.
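
To make the two strategies concrete, here is a minimal Python sketch. It is not WordPlay’s actual code: the tiny lexicon, the tag names, the stress notation (1 for a stressed syllable, 0 for an unstressed one) and the scoring are all assumptions made for illustration. A real tool would get its tags and stresses from a part-of-speech tagger and a pronouncing dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: word -> (part of speech, stress pattern).
# One stress digit per syllable: 1 = stressed, 0 = unstressed.
LEXICON: Dict[str, Tuple[str, str]] = {
    "a": ("DET", "0"),
    "the": ("DET", "0"),
    "terrible": ("ADJ", "100"),
    "horrible": ("ADJ", "100"),
    "gentle": ("ADJ", "10"),
    "fate": ("NOUN", "1"),
    "night": ("NOUN", "1"),
    "morning": ("NOUN", "10"),
}


def signature(phrase: str) -> Tuple[List[str], List[str]]:
    """Return the part-of-speech tags and stress patterns for each word."""
    words = phrase.lower().split()
    tags = [LEXICON[w][0] for w in words]
    stresses = [LEXICON[w][1] for w in words]
    return tags, stresses


def score(query: str, candidate: str) -> int:
    """One point per position where the tags agree, one more where the
    stress patterns (and therefore the syllable counts) agree."""
    q_tags, q_stress = signature(query)
    c_tags, c_stress = signature(candidate)
    if len(q_tags) != len(c_tags):
        return 0
    pos_hits = sum(q == c for q, c in zip(q_tags, c_tags))
    stress_hits = sum(q == c for q, c in zip(q_stress, c_stress))
    return pos_hits + stress_hits


def rank(query: str, candidates: List[str]) -> List[str]:
    """Order candidate phrases from best match to worst."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)
```

With this toy lexicon, ‘the horrible night’ matches ‘a terrible fate’ on both grammar and sound, while ‘the gentle morning’ matches only on grammar and on the stress of its first word, so it ranks lower.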

The result of these two strategies is that the top search results in WordPlay should really sound like the query. And it works – go ahead and sing the results to the Shakespearean Danger Zone (Kenny Loggins, eat your heart out!).

WordPlay is meant to be a simple tool, so I’ll end the explanation here. Go play with it.


Art and the API

In 1968, in his seminal essay Systems Esthetics, Jack Burnham wrote:

The specific function of modern didactic art has been to show that art does not reside in material entities, but in relations between people and between people and the components of their environment.

In 2013, this list of relations can be expanded to include those between people and software, as well as those between people and networks. How can art reside within these modern relations, rather than outside of them?

Enter the API.

API is one of those three-letter acronyms (TLAs) which makes only slightly more sense once you know what it stands for. Application Programming Interface; a generic enough term to be applied to many, many pieces of software, lots of which are operating inside of your computer right now. Really the important part of the definition is ‘interface’. I like to think about an API as a bridge which allows one computer program to talk to another computer program.

APIs have a lot of utility, as they can connect disparate programs running on different devices, even if those devices are running completely different operating systems. It’s not much of a stretch to say that any software company would have a set of internal APIs that allow communications between different parts of their software infrastructure. For every public-facing API at Google or Facebook, there are dozens more that are just used inside of the company. There’s a good parallel here to mail services – while a large company might have a mail room that deals with things being delivered to them from the outside world, they also have a lot of internal machinery which allows for mail to be delivered internally. The majority of your day-to-day interaction with social networks, e-mail applications or mobile apps is facilitated by APIs.

If you’ve heard of an API at all, it’s likely the one you’re thinking of is the Twitter API. It is what lets the apps on your phone, or third-party Twitter applications, communicate with all of the central tweeting machinery at Twitter HQ. Twitter took a risk in the beginning of their business by leaving this API open – they gambled that, by allowing businesses to build products around the main tweeting system, they’d end up with many more projects built on Twitter than they could have built themselves. This ‘open API’ model was so successful that plenty of other companies have since jumped on board and offered open APIs of their own. (Recently, Twitter has severely limited access to its API, to the consternation of the broad community of developers who make use of it.)

Because APIs are associated with big companies like Twitter and Facebook and Google, a lot of weight is often attributed to them. The creation of an API can seem almost a reverential act. “They have an API”, we whisper, in hushed tones. Surely they must be hard to build?

As it turns out, you can build simple APIs very… simply. As an example, I spent about an hour writing an API that lets you query for word counts in this article. Go ahead, and try this link:


That number that gets spit out is how many times the word ‘API’ is mentioned on the page (I built it to include comments, so this will change a bit over time). If you’re interested in the extraordinarily simple guts of all of the demo APIs in this post, you can find the code in a GitHub repository. This example shows us that APIs can be built very easily. While you certainly can put a lot of work and weight into an API, making one can also be a quick and expressive way to create bridges and tools between until-now unrelated bodies of content and applications.
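
To give a sense of scale, a word-count API of this kind fits in a couple of dozen lines. The sketch below is not the code from the repository: it is a Python stand-in that uses only the standard library, and the sample text, the `word` query parameter and the JSON response shape are all assumptions.

```python
import json
import re
from urllib.parse import parse_qs

# Stand-in for the article text; a real version would load the page itself.
PAGE_TEXT = "An API is a bridge. One API can talk to another api."


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def app(environ, start_response):
    """Minimal WSGI application: /?word=API returns a JSON word count."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    word = params.get("word", ["API"])[0]  # hypothetical parameter name
    payload = {"word": word, "count": count_word(PAGE_TEXT, word)}
    body = json.dumps(payload).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"),
         ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, hand `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result; a request for `/?word=bridge` should then return a small JSON object with the count.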

This act of bridging, enabled by an API, can be a political one. Josh Begley, a data artist living in New York, has recently created an API which allows access to information on every US drone strike, using data from The Bureau of Investigative Journalism. Updated as new strikes are reported and confirmed, the API allows others access to verified and aggregated data on drone activity. I built a small wrapper API for it which returns just the most recent strike:
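
The core of a wrapper like this is small. Here is a rough Python sketch of the idea, not the wrapper itself: the records are placeholders rather than real data, and the field names (`id`, `date`, `deaths`) are assumptions, not the actual schema of the drone API.

```python
from datetime import date
from typing import Dict, List

# Placeholder records, NOT real data. The field names are assumptions
# made for illustration only.
SAMPLE_STRIKES: List[Dict] = [
    {"id": "a", "date": "1970-01-01", "deaths": 3},
    {"id": "b", "date": "1970-01-03", "deaths": 4},
    {"id": "c", "date": "1970-01-02", "deaths": 2},
]


def most_recent_strike(strikes: List[Dict]) -> Dict:
    """Return the record with the latest ISO-formatted date."""
    if not strikes:
        raise ValueError("no strikes to choose from")
    return max(strikes, key=lambda s: date.fromisoformat(s["date"]))
```

A real wrapper would fetch the full list from the upstream API on each request and return the selected record as JSON.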


Josh’s API is already a useful tool; journalists can use it to feed stories, apps can use it to display updated counts. It could be used for many conceivable art purposes. A good example is Begley’s own recent project, Dronestream, a Twitter stream of every known US drone attack. Pitch Interactive’s recent ‘Out of Sight, Out of Mind’ project uses the API to update its interactive timeline of drone strikes. Here, through the use of an API, intendedly secret data becomes exposed – open data from the most closed of sources.

Recall that the basic function of an API is to bridge one piece of software to another. In this way, APIs are conduits for the mash-up, long a preferred creative tool for media artists. Instead of producing a single mash-up, though, a functional API makes a permanent link between two applications, one whose pitch and timbre can change as the data themselves are updated.

The API can act as a clear connection, simply relaying data from one place to another. However, it can also operate on these data, shifting modes and meaning as information is requested and relayed. Instead of returning a single number of people killed in a drone strike, what if we returned a list of names? What if those names were extracted from a US zip code, allowing us to think about how media attention and personal perspective would change if these were Americans dying, instead of Afghans or Pakistanis? More easily, the names could be extracted from our social media feeds. Here is an example API that returns a group of users from my own feed, equal to the number of people killed in the last US drone strike:
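
The transformation at the heart of such an API, turning a count into a list of names, could be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the example: the function name is hypothetical, and the list of feed names is assumed to have been fetched elsewhere.

```python
import random
from typing import List, Optional


def names_for_count(count: int, feed_names: List[str],
                    seed: Optional[int] = None) -> List[str]:
    """Pick `count` distinct names at random from a social feed.

    If the feed holds fewer names than `count`, every name is returned.
    """
    rng = random.Random(seed)
    k = max(0, min(count, len(feed_names)))
    return rng.sample(feed_names, k)
```

Passing a `seed` makes the selection repeatable, which is handy for testing; leaving it out gives a fresh group of names on every request.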


This is a heavy-handed, quickly drawn example. But it suggests an interesting idea: the conceptual API. A piece of software architecture intended not only to bridge but also to question. The API as a software art mechanism, intended to be consumed not only by humans, but by other pieces of software. (Promisingly, the API also offers a medium in which software artists can work entirely apart from visual esthetic.)

Burnham wrote in 1968 that ‘the significant artist strives to reduce the technical and psychical distance between [their] artistic output and the productive means of society’. In an age of Facebook, Twitter & Google, that productive means consists largely of networked software systems. The API presents a mechanism for artistic work to operate very close to, or in fact to live within these influential systems.

New Year, New Company: Introducing The Office for Creative Research

In the fall of 2010, my friend Mike Young invited me to come to the New York Times R&D Lab, to discuss a new visualization project that was just starting to get off the ground. That project became Cascade, and that meeting led to my two-and-a-half-year stay at the R&D Lab, as the first Data Artist in Residence. Yesterday, my residency at the New York Times came to an end. This morning, I’m thrilled to announce the official launch of my new company: The Office For Creative Research.

My 28 months (the residency was originally set for four months) at the New York Times were transformational in many, many ways. Cascade, which I initiated with Mark Hansen as a conceptual prototype, became a full-fledged project supported by an entire team of designers, developers and engineers. Along with Jake Porway, Brian House, and Matt Boggie, we built OpenPaths, which continues to be an exciting model for personal engagement with data. Mark and I, working with Alexis Lloyd, also made Memory Maps, a prototype for archive exploration, in which news stories are interwoven with the personal history of the user.

These successful projects were of course accompanied by unfinished sketches, necessary failures and inevitable dead ends. I built a visualization tool for household power usage that went nowhere, a few failed archive exploration tools, and one particularly bad interface for visualizing personal connections on Twitter. The R&D group, conceived and led by Michael Zimbalist, is very much a place that encourages real exploration – and the inevitable failures that result. This freedom to explore and to push boundaries is what has made, and will continue to make, NYTLabs fertile ground for ideas and innovation.

Which brings me back to The Office for Creative Research, the new company I’ve founded with Mark Hansen and Ben Rubin. OCR is a multidisciplinary research group focusing on new modes of engagement with data. We’re looking to partner with companies, institutions, scientists, museums – any individual, group or organization facing novel problems with data. A browse through our collective portfolio will show our range of approach, from visualization to algorithm design to performance and installation. Our unique range of skills, drawing from both the arts and sciences, gives us the ability to tackle almost any problem, from the laboratory to the gallery, and everywhere in between.

We’ve outlined the mission of The Office for Creative Research in this memorandum, released today, and you can see more of our work on OCR’s freshly-launched website. While we already have a set of fascinating projects on the go for 2013, we are looking for innovative new partners. Please get in touch if you’d like to explore the possibility of working with OCR. Also, we’ll be looking to hire talented people in the spring, so if you’d like to work in New York City, exploring the borders between data, technology & culture, send us a message.

It’s going to be an exciting year. We’ll be running a series of workshops at OCR starting next month, and we’ll be publishing a journal at the end of 2013 documenting the progress of our research. For regular news and data-related commentary, you can follow The Office For Creative Research on Twitter – @The_O_C_R.

I’d be remiss not to end this post with a thank-you to the many talented people at the New York Times who made my time there so tremendously enjoyable. It’s a world-class organization, filled with world-class human beings, and I’ll always be grateful for having had the chance to spend time there.

Happy New Year,


Before Us is the Salesman’s House


When the dust settles on the 21st century, and all of the GIFs have finished animating, the most important cultural artifacts left from the digital age may very well be databases.

How will the societies of the future read these colossal stores of information?

Consider the eBay databases, which contain information for every transaction that happens and has happened on the world’s biggest marketplace. $2,094 worth of goods are sold on eBay every second. The records kept about this buying and selling go far beyond dollars and cents. Time, location and identity come together with text and images to leave a record that documents both individual events and collective trends across history and geography.

This summer, Mark Hansen and I created an artwork that investigates this idea of the eBay database as a cultural artifact. Made in cooperation with eBay, Inc., and the ZERO1 Biennial, the piece was installed outside the eBay headquarters in San Jose and ran from dusk to midnight, September 11th to October 12th.

As a conceptual foundation for the piece, we chose a much more traditional creative form than the database: the novel. Each movement begins with a selection of text. The first one every day was a stage direction from the beginning of Death of a Salesman which reads:

A melody is heard, played upon a flute. It is small and fine, telling of grass and trees and the horizon. The curtain rises.
Before us is the Salesman’s house. We are aware of towering, angular shapes behind it, surrounding it on all sides. Only the blue light of the sky falls upon the house and forestage; the surrounding area shows an angry glow of orange. As more light appears, we see a solid vault of apartment houses around the small, fragile-seeming home. An air of the dream clings to the place, a dream rising out of reality. The kitchen at center seems actual enough, for there is a kitchen table with three chairs, and a refrigerator. But no other fixtures are seen. At the back of the kitchen there is a draped entrance, which leads to the living room. To the right of the kitchen, on a level raised two feet, is a bedroom furnished only with a brass bedstead and a straight chair. On a shelf over the bed a silver athletic trophy stands. A window opens onto the apartment house at the side.

From this text, we begin by extracting items¹ that might be bought on eBay:


Flute, grass, trees, curtain, table, chairs, refrigerator. This list serves now as a kind of inventory, with each item explored in a small set of data sketches which examine distribution: Where are these objects being sold right now? How much are they being sold for? What does the aggregate of all of the refrigerators sold in the USA look like?
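
As a rough illustration of the extraction step, here is a simple Python sketch that matches the words of a passage against a catalogue of sellable things. It is not the algorithm used in the piece: the catalogue is a hypothetical hand-made set, the plural handling is deliberately naive, and the real process also relied on human help where needed.

```python
import re
from typing import List, Set

# Hypothetical catalogue of things that might be for sale (singular forms).
CATALOGUE: Set[str] = {
    "flute", "grass", "tree", "curtain", "table", "chair", "refrigerator",
}


def extract_items(text: str, catalogue: Set[str] = CATALOGUE) -> List[str]:
    """Return catalogue words found in `text`, in order of first appearance.

    A trailing "s" is stripped only when the shortened word is in the
    catalogue, so "trees" matches "tree" while "grass" is left alone.
    """
    found: List[str] = []
    for token in re.findall(r"[a-z]+", text.lower()):
        strippable = token.endswith("s") and token[:-1] in catalogue
        singular = token[:-1] if strippable else token
        if singular in catalogue and token not in found:
            found.append(token)
    return found
```

Run on the opening of the stage direction, this picks out flute, grass, trees and curtain, and skips words like ‘melody’ and ‘horizon’ that are not in the catalogue.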


From this map of objects for sale, the program selects one at random to act as a seed. For example, a refrigerator being sold for $695 in Milford, New Hampshire, will switch the focus of the piece to this town of fifteen thousand on the Souhegan river. The residents of Milford have sold many things on eBay over the years – but what about books? Using historical data, we investigate the flow of books into the town, both sold and bought by residents.
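
The seed-and-follow step can be sketched in a few lines of Python. This is an illustration only, not the installation’s code: the record fields (`item`, `price`, `town`, `category`, `seller_town`, `buyer_town`) are assumptions, and apart from the Milford refrigerator mentioned above, the sample records are placeholders.

```python
import random
from typing import Dict, List

# One listing comes from the example above; the other is a placeholder.
LISTINGS: List[Dict] = [
    {"item": "refrigerator", "price": 695, "town": "Milford, NH"},
    {"item": "flute", "price": 100, "town": "Placeholder Town"},
]

# Placeholder historical transactions with hypothetical field names.
TRANSACTIONS: List[Dict] = [
    {"category": "books", "seller_town": "Milford, NH", "buyer_town": "Elsewhere"},
    {"category": "books", "seller_town": "Elsewhere", "buyer_town": "Milford, NH"},
    {"category": "books", "seller_town": "Elsewhere", "buyer_town": "Placeholder Town"},
    {"category": "tools", "seller_town": "Milford, NH", "buyer_town": "Elsewhere"},
]


def pick_seed(listings: List[Dict], rng: random.Random) -> Dict:
    """Choose one listing at random to act as the seed."""
    return rng.choice(listings)


def books_for_town(transactions: List[Dict], town: str) -> List[Dict]:
    """Keep book transactions that were sold from or bought into `town`."""
    return [
        t for t in transactions
        if t["category"] == "books" and town in (t["seller_town"], t["buyer_town"])
    ]
```

Calling `pick_seed(LISTINGS, random.Random())` and passing the seed’s `town` to `books_for_town` gives the set of book exchanges to explore next.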


Finally, the program selects a book from this list² and re-starts the cycle, this time with a new extracted passage, new objects, new locations, and new stories. Over the course of an evening, about a hundred cycles are completed, visualizing thousands of current and historic exchanges of objects.

Ultimately, the size of a database like eBay’s makes a complete, close reading impossible – at least for humans. Rather than an exhaustive tour of the data, then, our piece can be thought of as a distant reading³, a kind of a fly-over of this rich data landscape. It is an aerial view of the cultural artifact that is eBay.

A motion sample of three movements from the piece can be seen in this video.

Before Us is the Salesman’s House was projected on a 30′ x 20′ semi-transparent screen, suspended in the entryway to the main building (I’m afraid lighting conditions were far from ideal for photography). It was built using Processing 2.0, MongoDB & Python. Special thanks to Jaime Austin, Phoram Meta, Jagdish Rishayur, David Szlasa and Sean Riley.

  1. Items are extracted through a combination of a text-analysis algorithm and, where needed, processing by helpful folks on Mechanical Turk.
  2. All text used comes from Project Gutenberg, a database of more than 40,000 free eBooks.
  3. For more about distant reading, read this essay by Franco Moretti, or, for a summary, this article from the NYTimes.

Infinite Weft @ Bridge Gallery until October 18th


Since early in the year, I have been working with my mother Diane Thorp to produce hand-woven textiles that contain non-repeating patterns. Weaving Information Files (WIFs) are produced via a custom-written software tool, and are then woven on a 16-harness floor loom equipped with an AVL Compu-Dobby interface.

Here’s a zoomable view of six metres (almost 20 feet) of the handwoven result:

You really (really) want to hit the fullscreen button and zoom in – it’s a 95 megapixel (25,283 x 3,738) image. Alternately, you can go here to see it in a bigger window.

You can read more about the project here, and you can see a set of images documenting the project here. The next step is to weave a much longer section – we are aiming for something above 100′.

If you’re in New York, you can see the six-metre section of Infinite Weft on exhibit at Bridge Gallery until October 10th.

Bridge Gallery
98 Orchard Street
New York, NY 10002
Subway F, J, M, Z Delancey/Essex

Data & Art Miscellanea from Jer Thorp