Text Comparison Tool: SOURCE CODE

About this time last year, I built a lightweight tool in Processing to compare two articles that I had read about head injuries in the NFL. Later, I extended the tool to compare any two texts, and promised a source release.

Well, it only took 12 months, but I’ve finally cleaned up the code to a point where I think it will be reasonably easy to use and helpful for those who might want to learn a bit more about the code.

I always find myself in a tricky position with source releases. Often, as in the case of this tool, I have ideas about what the project should look like before it gets released. Here, I wanted to build an interface to allow people to select different text sources within the app, so that people could use the app without having to compile it from Processing. This is what delayed the release of the code for a year – I was waiting for the time to get this last piece done.

Two weeks ago, though, I had a chance to speak at an event with Tahir Hemphill. I had sent Tahir a messy version of the project for use with his Hip Hop Word Count initiative a few months back, and he used it to analyze a famous rap battle between Nas and Jay-Z:

Text Correlation Tool: Nas Versus Jay / Ether Versus The Takeover from Staple Crops on Vimeo.

This reminded me of a fairly valuable lesson as far as source code is concerned: If it works, it’s probably good enough to release.

So, without too much further ado, here’s a link to download the Text Comparison Tool:

Text Comparison Tool (1.1MB)

It’s pretty easy to use. First, you’ll need Processing to open and work with the sketch. Also, you’ll need the toxiclibs core library installed. Assuming you have those two things, these are the steps:

1. Drag the unzipped folder into your sketchbook.
2. Place your text files in the sketch’s data folder.
3. Open the sketch.
4. Look for the code block at the top of the main tab where the article information is set. It’s pretty clearly marked, and looks like this:

String title1 = "Tokyo";
String url1 = "asia.txt";
String desc1 = "Suntory Hall";
String date1 = "November 14th, 2009";
color color1 = #140047;

String title2 = "Cairo";
String url2 = "cairo.txt";
String desc2 = "Cairo University";
String date2 = "June 4th, 2009";
color color2 = #680014;

5. Replace the information here with appropriate information for your files.
6. Run the sketch!

That’s it. If you are getting strange results, you can tweak the clean() and cleanBody() methods at the bottom of the main tab to control how your text is filtered.
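
For anyone unsure what that kind of tweaking might involve, here is a minimal sketch of a word-level and body-level cleaning pass. It is written as plain Java for illustration and is not the tool's actual code: the class name CleanSketch, the STOP_WORDS list, and both method bodies are assumptions that only borrow the names clean() and cleanBody(). The idea is to normalize each token (lowercase, strip punctuation) before checking it against a stop-word list, so that a token like "and," is treated the same as "and".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CleanSketch {
  // Hypothetical stop-word list; extend it to suit your own texts.
  static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
    "a", "an", "and", "the", "of", "to", "in", "with"));

  // Lowercase a single token and strip everything except
  // the letters a-z and the straight apostrophe.
  static String clean(String word) {
    return word.toLowerCase(Locale.ROOT).replaceAll("[^a-z']", "");
  }

  // Split a body of text on whitespace, clean each token,
  // and drop empty tokens and stop words.
  static List<String> cleanBody(String body) {
    List<String> out = new ArrayList<>();
    for (String raw : body.trim().split("\\s+")) {
      String w = clean(raw);
      if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
        out.add(w);
      }
    }
    return out;
  }
}
```

Note that this version throws away accented letters and curly apostrophes along with punctuation, so you would want to loosen the pattern in clean() for texts that rely on them.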

I still hope to find the time to package this thing up in a more user-friendly form. In the meantime, I hope people will find it useful as an exploratory tool. Note that you can press the ‘s’ key at any time to save out an image – if you find some interesting texts to compare, let me know!

9 thoughts on “Text Comparison Tool: SOURCE CODE”

  1. My code skills have failed me. I tried a short simple test and got an "IndexOutOfBoundsException: Index 5, Size 5"
    Is this a software issue, or a lack-of-skill issue on my part? Or a combo of both?

    1. Hi Chris,

      Zip up your sketch folder and send it to me – blprnt at blprnt – and I'll have a look. It's very (very) likely that it's a code problem, since I've only ever tested this on three sets of text files.

      -Jer

      1. Hi Jer,

        I got it working. Strangely, it seems to have been the formatting in my txt document that was the problem. But it is working, and working great.

  2. A great tool – thank you. I'm having only one difficulty – common words such as "and," "the," "with" etc. are still appearing in the comparison, though I looked through the code and it seems you've written a section to resolve that. Not sure why it isn't filtering those words out. Any suggestions?
