OCR (Optical Character Recognition) is the process of converting document
images to text that can be manipulated in an editor or word processor. The
image is usually created with a scanner. Once the document exists as a computer image file, the OCR program
analyzes the image and extracts the text. The text file will take up only a
small fraction of the disk storage space the image would take, and can then
be loaded into the word processor of your choice for editing or inclusion into
The accuracy of OCR conversion varies widely. With good-quality text pages set in standard fonts, it can be around 99%. For a poor-quality page (a fax or a poor photocopy), it can be much lower. A page that mixes many fonts in different sizes will also reduce accuracy.
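To make a figure like "99%" concrete, the sketch below computes character accuracy by comparing OCR output against a hand-checked reference text. It assumes accuracy is measured as one minus the edit distance divided by the reference length, which is one common convention and not necessarily the one any particular OCR vendor uses. The function names are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete a character from a
                            curr[j - 1] + 1,       # insert a character into a
                            prev[j - 1] + cost))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of reference characters recognized correctly (assumed metric)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

Under this measure, a page of 2,000 characters read at 99% accuracy would still contain about 20 errors, which is why the verification step matters.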
A professional-level OCR program will mark every character about which it is not certain. An operator can then bring the document up on the screen and step automatically from mark to mark, verifying or correcting as needed.
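The mark-and-verify workflow can be sketched as follows. This is a minimal illustration, assuming the OCR engine reports a confidence score between 0.0 and 1.0 for each character; the RecognizedChar type, the 0.80 threshold, and the corrections mapping are all hypothetical and do not come from any specific product.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    char: str
    confidence: float  # hypothetical engine score, 0.0 (no idea) to 1.0 (certain)


def uncertain_positions(chars, threshold=0.80):
    """Return the positions the program would mark for the operator."""
    return [i for i, c in enumerate(chars) if c.confidence < threshold]


def review(chars, corrections, threshold=0.80):
    """Step from mark to mark; apply the operator's correction if one was given.

    corrections maps a position to its replacement character. A marked
    position that is absent from the mapping is treated as verified as-is.
    """
    text = [c.char for c in chars]
    for i in uncertain_positions(chars, threshold):
        if i in corrections:
            text[i] = corrections[i]
    return "".join(text)
```

Only positions below the threshold are visited, so the operator never has to reread text the program was confident about.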
There are specialty OCR programs that can read a printed mailing list and enter the names and addresses into a database so the list can be handled by computer programs.
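As a rough illustration of the mailing-list case, the sketch below takes OCR text and loads it into a database table. It assumes a great deal: each entry is exactly three lines (name, street, then "City, ST 12345"), entries are separated by blank lines, and the addresses are US-style. The table name, column names, and function names are hypothetical. Real specialty programs handle far messier input.

```python
import re
import sqlite3

# Assumed last-line layout: "City, ST 12345" or "City, ST 12345-6789".
LAST_LINE = re.compile(
    r"^(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)$"
)


def parse_address_block(block):
    """Split one three-line entry into fields, or return None if it does not fit."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    if len(lines) != 3:
        return None
    m = LAST_LINE.match(lines[2])
    if m is None:
        return None
    return (lines[0], lines[1], m["city"], m["state"], m["zip"])


def load_mailing_list(ocr_text, conn):
    """Insert every parseable entry into a contacts table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts "
        "(name TEXT, street TEXT, city TEXT, state TEXT, zip TEXT)"
    )
    rows = []
    for block in re.split(r"\n\s*\n", ocr_text):
        parsed = parse_address_block(block)
        if parsed is not None:
            rows.append(parsed)
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Entries that do not match the assumed layout are skipped rather than guessed at, so they can be set aside for manual entry.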
Some document imaging programs let you highlight selected words or phrases and OCR them for inclusion in a keyword index, which can then be searched to retrieve documents.
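A keyword index of this kind can be pictured as a mapping from each keyword to the documents it was taken from. The sketch below is a minimal version, assuming the keywords have already been OCRed from the highlighted regions; the function names and the document identifiers in the usage example are made up for illustration.

```python
from collections import defaultdict


def build_keyword_index(keywords_by_document):
    """Map each keyword (case-folded) to the set of documents that carry it."""
    index = defaultdict(set)
    for doc_id, keywords in keywords_by_document.items():
        for kw in keywords:
            index[kw.strip().lower()].add(doc_id)
    return index


def search(index, keyword):
    """Return the matching document identifiers in sorted order."""
    return sorted(index.get(keyword.strip().lower(), set()))
```

For example, indexing {"invoice-001.tif": ["Acme", "invoice"], "letter-002.tif": ["acme", "contract"]} and searching for "ACME" retrieves both documents, because keywords are compared without regard to case.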
© Andrew Grygus - Automation Access - www.aaxnet.com