ATHENA: Automatic Text Height ExtractioN for the Analysis of old handwritten manuscripts

Authors:

Ruggero Pintus, Yale University, ruggero.pintus@yale.edu
Ying Yang, Yale University, ying.yang.yy368@yale.edu
Holly Rushmeier, Yale University, holly.rushmeier@yale.edu

Abstract:

Large-scale digital acquisition of deteriorating historical documents is essential because of their value and fragility. Studying and browsing the resulting digital libraries is becoming crucial for scholars in the Cultural Heritage field, but it requires automatic tools for analyzing and indexing the items in those collections. We present a layout analysis method that performs automatic text height estimation without any manual intervention or user-defined parameters. It proves to be a robust technique for very noisy and damaged handwritten manuscripts. The effectiveness of the method is demonstrated on a large, heterogeneous corpus of medieval manuscripts with different writing styles, affected by other uncontrollable factors such as ink bleed-through, background noise, and overlapping text lines.

Go to the Paper Web Page.

The database consists of 21 medieval manuscripts (6922 pages), handwritten before 1500 AD. They come from Yale University’s Beinecke Rare Book and Manuscript Digital Library.

Go to the database. (Please limit downloads from the image server. Use local copies of images for analysis.)

Note: To cite this page, please use the persistent link at http://hdl.handle.net/10079/cz8w9v8.