We have thousands of paper documents with valuable information. Before we can use that information, someone needs to take the time to key the data. If you work with people whose days are consumed with tedious data entry, consider simplifying their workday with automated data extraction. Data extraction allows you to reduce manual data entry, increase throughput, and often even reduce errors. Such technology is referred to as OCR, ICR, or MICR. It is easy to see how someone interested in automatic data extraction can get lost in a sea of acronyms. Here is a brief overview of some data extraction technologies and how they may benefit you.
Quick Note: Data extraction technology does not completely eliminate data entry. If you get a slick salesperson who promises OCR will magically make your data entry needs disappear, do the following. Allow him to take you out to a free meal (order the lobster), smile and nod at everything he says, and then never return his phone calls. It is the least you can do for someone who knowingly deceives you. What OCR technology will do is allow your people to do more work in less time, making them more productive.
Image capture is the first step in electronic data capture. Image capture is the process of converting a paper document into an electronic image. Usually these documents are stored as Tagged Image File Format (TIFF) or Portable Document Format (PDF) files. There are many benefits to document imaging besides automatic data extraction. I’ll cover those in later articles.
The image is typically captured with a scanner. There is a wide variety of scanners available, from single-workstation models (five pages per minute) to full-scale production scanners (fifty pages per minute). Of course, the price reflects the features of the scanner.
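To get a rough feel for what those speeds mean for a backlog, here is a small Python sketch. The function name, the 10,000-page backlog, and the assumption that the scanner runs continuously at its rated speed are all illustrative; none of them come from a vendor, and real jobs lose time to document prep and paper jams.

```python
def scan_hours(pages, pages_per_minute):
    """Hours needed to scan `pages` at a steady rated speed.

    Idealized assumption: the scanner never stops for prep, jams, or rescans.
    """
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute / 60


backlog = 10_000  # hypothetical backlog size, for illustration only

# The two speeds below are the ones quoted in the article.
workstation_hours = scan_hours(backlog, 5)
production_hours = scan_hours(backlog, 50)

print(f"Workstation scanner: {workstation_hours:.1f} hours")
print(f"Production scanner:  {production_hours:.1f} hours")
```

Plugging in your own page count is usually enough to tell whether a desktop scanner will do or whether the job calls for production hardware.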
Many companies have electronic fax servers such as RightFax or Biscom. These fax servers convert incoming faxes into images automatically, but they can be very costly. If you are looking for a low-cost alternative, consider email-based fax solutions such as eFax. These solutions send inbound faxes to an email address of your choosing.
Optical Character Recognition (OCR) software reads an image and converts the information into digital data. Such software is capable of processing machine-printed, handwritten, or even cursive text. OCR of handwritten text is often referred to as ICR (see below).
OCR of machine-printed text is largely considered a solved problem and yields high accuracy. Clean machine text may conservatively reach 95% character accuracy. In the real world, documents are rarely perfect when they are scanned. Lines running through text or smudged ink can reduce the accuracy level. Even so, significant productivity gains are typical.
Intelligent Character Recognition (ICR), or handwritten OCR, has come a long way in the last decade or so. Accuracy of handwritten data extraction is enhanced using constrained print fields. You may see recognition rates of 80 to 90%.
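It helps to see what these percentages mean for a whole field rather than a single character. The Python sketch below is a back-of-the-envelope estimate, not a description of how any OCR product reports accuracy. It assumes the rates above are per-character rates and that errors are independent from one character to the next, which real documents only approximate. The function name and the 10-character field length are made up for illustration.

```python
def field_accuracy(char_accuracy, field_length):
    """Chance that every character in a field is read correctly.

    Assumes each character is recognized independently with the same
    accuracy, a simplification for illustration.
    """
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    if field_length < 0:
        raise ValueError("field_length must not be negative")
    return char_accuracy ** field_length


# Rates quoted in the article; the 10-character field is hypothetical.
rates = [
    ("OCR, clean machine print", 0.95),
    ("ICR, high end", 0.90),
    ("ICR, low end", 0.80),
]

for label, rate in rates:
    share = field_accuracy(rate, 10)
    print(f"{label}: {share:.0%} of 10-character fields fully correct")
```

Under these assumptions, even 95% character accuracy leaves roughly four in ten 10-character fields with at least one error. That is why someone still needs to review and correct the output, and why the software reduces data entry rather than eliminating it.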
ICR implementations use constrained print fields to maximize recognition rates. These print fields encourage the user to separate each character and prevent written text from “running together”. Here are a couple of examples of constrained print fields.