This is a difficult document: a scanned, warped historical newspaper
page.  It is included mainly as a test case for future improvements
to OCRopus.

The script illustrates how to adjust the layout analysis parameters
in ocropus-gpageseg for these kinds of documents. Note that the
resulting segmentation still contains some layout analysis errors.
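
As a rough sketch of what such an adjustment can look like, the Python
helper below only assembles an ocropus-gpageseg command line; it does
not run it. The column-separator options it uses (--maxcolseps,
--maxseps, --blackseps, --nocheck) are the ones found in ocropy's
ocropus-gpageseg and may differ in other OCRopus versions. The
parameter values and the input file name are illustrative assumptions,
not the settings used by the script in this directory.

```python
import shlex


def gpageseg_command(images, maxcolseps=3, maxseps=0,
                     blackseps=False, nocheck=False):
    """Build an argv list for ocropus-gpageseg without executing it.

    Option names are assumed to match ocropy's ocropus-gpageseg;
    check `ocropus-gpageseg --help` for the version you have installed.
    """
    cmd = [
        "ocropus-gpageseg",
        "--maxcolseps", str(maxcolseps),  # max. whitespace column separators
        "--maxseps", str(maxseps),        # max. black (ruled) column separators
    ]
    if blackseps:
        cmd.append("--blackseps")         # also look for black column separators
    if nocheck:
        cmd.append("--nocheck")           # skip the input sanity checks
    cmd.extend(images)
    return cmd


# Hypothetical values for a multi-column newspaper page with ruled columns.
cmd = gpageseg_command(["0001.bin.png"], maxcolseps=6, maxseps=3,
                       blackseps=True, nocheck=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can be pasted into a shell once the values have
been tuned for the page at hand; newspaper pages typically need more
column separators than the defaults allow.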

Better character recognition performance will require retraining models
on historical books and newspaper prints (the current models are trained
on modern scanned documents only).