Text Comparison Tool: SOURCE CODE

About this time last year, I built a light-weight tool in Processing to compare two articles that I read about head injuries in the NFL. Later, I extended the tool to compare any two texts, and promised a source release.

Well, it only took 12 months, but I’ve finally cleaned up the code to a point where I think it will be reasonably easy to use and helpful for those who might want to learn a bit more about the code.

I always find myself in a tricky position with source releases. Often, as in the case of this tool, I have ideas about what the project should look like before it gets released. Here, I wanted to build an interface to allow people to select different text sources within the app, so that people could use the app without having to compile it from Processing. This is what delayed the release of the code for a year – I was waiting for the time to get this last piece done.

Two weeks ago, though, I had a chance to speak at an event with Tahir Hemphill. I had sent Tahir a messy version of the project for use with his Hip Hop Word Count initiative a few months back, and he used it to analyze a famous rap battle between Nas and Jay-Z:

Text Correlation Tool: Nas Versus Jay / Ether Versus The Takeover from Staple Crops on Vimeo.

This reminded me of a fairly valuable lesson as far as source code is concerned: If it works, it’s probably good enough to release.

So, without too much further ado, here’s a link to download the Text Comparison Tool:

Text Comparison Tool (1.1MB)

It’s pretty easy to use. First, you’ll need Processing to open and work with the sketch. Also, you’ll need the toxiclibs core library installed. Assuming you have those two things, these are the steps:

1. Drag the unzipped folder into your sketchbook.
2. Place your text files in the sketch’s data folder.
3. Open the sketch.
4. Look for the code block at the top of the main tab where the article information is set. It’s pretty clearly marked, and looks like this:

String title1 = "Tokyo";
String url1 = "asia.txt";
String desc1 = "Suntory Hall";
String date1 = "November 14th, 2009";
color color1 = #140047;

String title2 = "Cairo";
String url2 = "cairo.txt";
String desc2 = "Cairo University";
String date2 = "June 4th, 2009";
color color2 = #680014;

5. Replace the information here with appropriate information for your files.
6. Run the sketch!

That’s it. If you are getting strange results, you can tweak the clean() and cleanBody() methods at the bottom of the main tab to control how your text is filtered.
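To give a rough idea of what that kind of tweaking can involve, here is a minimal sketch of a text filter in plain Java. It is not the code from the sketch: the method names are borrowed from the tool, but the signatures, the stop-word list, and the punctuation rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TextCleaner {
    // Hypothetical stop-word list; the real sketch may use a different (or longer) one.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "with", "of", "to", "in"));

    // Assumed behavior: lowercase one token and strip every character
    // that is not a letter, a digit, or a straight apostrophe.
    static String clean(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}']", "");
    }

    // Assumed behavior: split the text on whitespace, clean each token,
    // then drop empty tokens and stop words.
    static List<String> cleanBody(String body) {
        List<String> words = new ArrayList<>();
        for (String raw : body.trim().split("\\s+")) {
            String word = clean(raw);
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [speech, hope, change]
        System.out.println(cleanBody("The speech, with \"hope\" and CHANGE."));
    }
}
```

In this version punctuation is stripped before the stop-word check, so a token like "and," is reduced to "and" and filtered out. If your texts use unusual punctuation or encodings, the character rule in clean() is the first thing to adjust.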

I still hope to find the time to package this thing up in a more user-friendly form. In the meantime, I hope people will find it useful as an exploratory tool. Note that you can press the ‘s’ key at any time to save out an image. If you find some interesting texts to compare, let me know!


9 Comments

  1. Posted November 17, 2010 at 8:17 pm | Permalink

    Awesome. I am going to give infinite jest vs. the count of monte cristo a go.

  2. Posted November 17, 2010 at 8:50 pm | Permalink

    Hi there,

    The newest version of toxiclibs "renamed package toxi.geom.util into toxi.geom.mesh".

  3. blprnt
    Posted November 17, 2010 at 8:59 pm | Permalink

    Hey,

    Thanks – I've uploaded a new version that uses the newest version of toxiclibs.

  4. Posted November 18, 2010 at 2:50 am | Permalink

    My code skills have failed me. I tried a short simple test and got an "IndexOutOfBoundsException: Index 5, Size 5"
    Is this a software issue, or a lack-of-skill issue on my part? Or a combo of both?

  5. blprnt
    Posted November 18, 2010 at 8:54 am | Permalink

    Hi Chris,

    Zip up your sketch folder and send it to me – blprnt at blprnt – and I'll have a look. It's very (very) likely that it's a code problem, since I've only ever tested this on three sets of text files.

    -Jer

  6. Simon
    Posted November 18, 2010 at 8:26 pm | Permalink

    Thanks for sharing this, feeding in some text files now, and it's working great!

  7. typelab
    Posted December 7, 2010 at 3:22 pm | Permalink

    Hi Jer,

    I got it working. Strangely, it seems to have been the formatting in my txt document that was the problem. But it is working, and working great.

  8. David
    Posted February 8, 2011 at 11:32 pm | Permalink

    This is fantastic, thanks. I'm planning on running some early Malcolm Lowry fiction through it, against various source texts.

  9. Erika
    Posted November 21, 2011 at 2:11 pm | Permalink

    A great tool – thank you. I'm having only one difficulty – common words such as "and," "the," "with" etc. are still appearing in the comparison, though I looked through the code and it seems you've written a section to resolve that. Not sure why it isn't filtering those words out. Any suggestions?
