This is a pretty hard document: a scanned, warped historical newspaper
page. It's mostly here as a test case to see how we can improve OCRopus
in the future.

The script illustrates how to adjust the layout analysis parameters
in ocropus-gpageseg for these kinds of documents. Note that there are
some layout analysis errors.

Better character recognition performance will require retraining models
on historical books and newspaper prints (the current models are trained
on modern scanned documents only).