Optical Character Recognition (OCR) is a software technology, together with its associated processes, that translates printed text into computer-searchable text.
Done correctly, OCR enables users to search for and retrieve individual words contained within a file or page. In addition, when a set of files is indexed, users are able to search for keywords across an entire document library and retrieve each page with exact precision. OCR enables users to execute searches in seconds, searches that once could take several hours or days to complete.
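The idea of indexing a set of files so that a keyword leads straight to the pages containing it can be sketched as an inverted index. The following is only a minimal illustration, not any particular product's implementation: the function names, the file names, and the sample page text are all hypothetical, and the OCR text is assumed to be already extracted.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of (file, page number) pairs containing it.

    `pages` is a dict of (file_name, page_number) -> OCR text.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Lowercase and split on anything that is not a letter or digit.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, keyword):
    """Return every page containing the keyword, in sorted order."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical OCR output for three scanned pages.
pages = {
    ("deed_1923.tif", 1): "Warranty deed recorded in the county of Adams",
    ("deed_1923.tif", 2): "Signed before a notary in the county seat",
    ("survey_1951.tif", 1): "Boundary survey of the northern parcel",
}
index = build_index(pages)
print(search(index, "county"))
```

Because the index is built once, each later search is a dictionary lookup rather than a scan of every page, which is where the seconds-versus-hours difference comes from.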
Historically, however, this technology did not work well on older or poor-quality documents that contained mixed fonts or combinations of text and graphics. Until now!
Thanks to several recent technological advances, it is now possible to obtain six-sigma-level character accuracy from these types of document collections.
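To put that accuracy level in concrete terms, the short sketch below converts a defect rate into a character accuracy and an expected error count. It assumes the conventional six-sigma figure of 3.4 defects per million opportunities and treats each character as one opportunity. That reading is an assumption, and the 2,000-character page is purely illustrative.

```python
# Assumption: "six sigma" means the conventional 3.4 defects per
# million opportunities, with one character counted as one opportunity.
SIX_SIGMA_DPMO = 3.4


def accuracy_from_dpmo(dpmo):
    """Fraction of characters recognized correctly at a given defect rate."""
    return 1.0 - dpmo / 1_000_000


def expected_errors(char_count, dpmo=SIX_SIGMA_DPMO):
    """Expected number of wrong characters in char_count characters."""
    return char_count * dpmo / 1_000_000


def character_accuracy(total_chars, wrong_chars):
    """Measured accuracy of an OCR run checked against a known-good text."""
    return (total_chars - wrong_chars) / total_chars


print(f"{accuracy_from_dpmo(SIX_SIGMA_DPMO):.5%}")  # accuracy at 3.4 DPMO
print(expected_errors(2000))  # illustrative 2,000-character page
```

Under these assumptions the accuracy works out to 99.99966 percent, or fewer than one wrong character in every hundred pages of that size.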
The quality and condition of the paper documents remain key factors in a successful OCR conversion, but dramatically improved results can be obtained by enhancing the quality of the scanned image prior to processing.
Noise removal of borders, speckles and...