Poetry with Twitter, Yfrog, and OCR

chapbook.pdf

The mechanism for today's poetry: I collected several thousand images from Yfrog and processed them with tesseract (an OCR package). For each image whose tesseract output contained at least one English word of four or more characters, I added a line to the output. The Yfrog images were collected with the Twitter Streaming API, tracking the keyword "yfrog." I collected lines for about an hour last night, split them into a collection of "pieces," and assembled the result into this chapbook. (Each line in the text is a hyperlink to the Yfrog image that generated it.)
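The filtering step described above can be sketched in a few lines of Python. This is a minimal reconstruction, not the original script: the tiny `ENGLISH_WORDS` set is a hypothetical stand-in for a real dictionary, and the OCR text is assumed to come from a separate tesseract run on each downloaded image.

```python
import re

# Hypothetical stand-in for a real English word list; the actual
# pipeline would load a full dictionary (e.g. a system word file).
ENGLISH_WORDS = {"street", "light", "window", "night", "poetry", "sign"}

def keep_line(ocr_text):
    """Return True if tesseract's output for an image contains at
    least one English word of four or more characters."""
    tokens = re.findall(r"[A-Za-z]{4,}", ocr_text)
    return any(t.lower() in ENGLISH_WORDS for t in tokens)

# Each image's OCR output becomes one line of the poem only if it
# passes the filter; everything else is discarded.
lines = [t for t in ("a blurry street s1gn", "xj qq zz") if keep_line(t)]
```

Lines that survive the filter would then be paired with the URL of the Yfrog image that produced them.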

When given photographic images at fairly low resolution, tesseract returns a huge number of false positives, finding text where there isn't any. Even for images with obvious, decipherable textual content, tesseract's reading is... let's just say creative. What's more, tesseract is trained on English words, so it tends to find English wherever it looks (including, amusingly, in images of Hangul). I'm pretty pleased with the results, though: tesseract reads photographed text in a weirdly dreamlike, alien way.
