README

The purpose of this module (ocr-pageseg) is to separate text from the non-text
components of a page, such as halftones, graphics, math, and tables. First, a
dual RAST segmenter is applied to segment the page into zones. Then each zone is
classified into one of the following classes:

text
math
table
logo
drawing
halftone
ruling
noise
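
The two-stage flow described above (segment the page into zones, then label
each zone) can be sketched as follows. This is only an illustration in Python,
not the module's actual code: the names ZoneClass, Zone, label_zones and
text_zones are hypothetical, and a zone is assumed to be a simple bounding box.

```python
from enum import Enum
from typing import Callable, List, Tuple


class ZoneClass(Enum):
    """The eight zone classes listed in this README."""
    TEXT = "text"
    MATH = "math"
    TABLE = "table"
    LOGO = "logo"
    DRAWING = "drawing"
    HALFTONE = "halftone"
    RULING = "ruling"
    NOISE = "noise"


# Assumption: a zone is a bounding box (x0, y0, x1, y1) in pixels.
Zone = Tuple[int, int, int, int]


def label_zones(
    zones: List[Zone], classify: Callable[[Zone], ZoneClass]
) -> List[Tuple[Zone, ZoneClass]]:
    """Stage 2: attach a class label to every zone produced by stage 1."""
    return [(zone, classify(zone)) for zone in zones]


def text_zones(labelled: List[Tuple[Zone, ZoneClass]]) -> List[Zone]:
    """Keep only the zones labelled as text, e.g. to pass on to OCR."""
    return [zone for zone, cls in labelled if cls is ZoneClass.TEXT]
```

Here the classifier is passed in as a plain callable, so the segmentation and
classification stages stay independent of each other.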

Zone classification is done using a logistic regression classifier. A file
(log-reg-training-file.txt) containing the coefficients of the classifier,
obtained by training on the UW-III dataset, is included. Since the UW-III
images were scanned at 300 dpi, the system works best on documents scanned at
300 dpi. For more information about the algorithm, please refer to:

D. Keysers, F. Shafait, T.M. Breuel. "Document Image Zone Classification - A
Simple High-Performance Approach", VISAPP 2007, pages 44-51.
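
To illustrate how a trained logistic regression classifier assigns a class
from stored coefficients, here is a minimal Python sketch. It assumes the common
multi-class (softmax) form with one weight row and one bias per class; the
layout of log-reg-training-file.txt and the zone features actually used are not
described in this README, so the function names and arguments below are
hypothetical.

```python
import math
from typing import List, Sequence, Tuple

CLASSES = ["text", "math", "table", "logo",
           "drawing", "halftone", "ruling", "noise"]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> Tuple[str, List[float]]:
    """Score a feature vector against each class and return the most likely one.

    weights holds one row of coefficients per class and biases one intercept
    per class (assumed layout, not the module's file format).
    """
    scores = [
        b + sum(w * x for w, x in zip(row, features))
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs
```

The coefficients would come from training, as described above; any values
passed to this sketch are placeholders chosen by the caller.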
 