
How to Do Everything with Your Scanner - P50


DOCUMENT INFORMATION

Basic information

Title: How to Do Everything with Your Scanner
Publisher: McGraw-Hill Education
Subject: Optical Character Recognition
Type: book
Year of publication: 2001
City: New York
Pages: 5
File size: 179.72 KB


Scanning Text Documents Using OCR Software

Chapter 13


How To…

■ Understand how OCR software works

■ Recognize the limitations of OCR software

■ Properly prepare your original document for OCR software to read

■ Optimize scanning of text documents

Installing Optical Character Recognition (OCR) software is a bit like teaching your computer to read and type text into your PC. It allows you to scan a document and convert it to a text file you can edit. This is a very useful feature, but the process can be plagued with inaccurately read characters and resultant errors. This chapter gives you a few pointers on how to make the process more accurate. Accuracy is the hallmark of effectiveness when scanning text documents.

What OCR Software Does

OCR software can do the following:

■ Capture text images from your scanner

■ Compare those characters to an existing database of characters and identify them

■ Produce output that you can edit

OCR software looks at the millions of tiny dots that make up the characters on a page of text. It sifts through them to find characters that it recognizes, and converts those characters into a new, readable text file.

A Look at the Leading OCR Programs

Currently the leading OCR software is TextBridge Pro. Frequently bundled with scanners, TextBridge Pro converts scanned files into Microsoft Word format. Once a file is in Microsoft Word format, you can convert it for use in other applications, such as Access, Excel, Internet Explorer, Netscape Navigator, and FrontPage. You can learn more about this popular program at the manufacturer’s website at www.scansoft.com, as shown in Figure 13-1. This program sells for about $80 as of the writing of this book.


FIGURE 13-1 TextBridge Pro is a popular OCR program that comes bundled with many scanners

A more sophisticated version, TextBridge Pro Millennium Business Edition, is intended to bridge the gap between bundled scanner software and a full-blown professional program. It comes with special tools for managing documents in a business setting, and sells for around $500 as of the writing of this book.


… enhanced accuracy and automatic correction features for crooked and damaged pages. It also has more sophisticated capabilities to retain the original formatting of documents, such as Excel spreadsheets and magazine articles with multiple columns. This software also sells for just under $500 as of the writing of this book.

How OCR Software Works

OCR software uses mathematical algorithms to classify the characters your scanner extracts during the scanning process. Depending on what kind of software you’re using, the program might use one or both of the following methods:

Matrix matching. This process identifies scanned symbols by matching them against character templates within the software. This process is gradually being replaced by the feature extraction method in newer software programs.

Feature extraction. This method analyzes each character extracted by your scanner using a mathematical algorithm (a formula). For example, the algorithm might describe mathematical information about a circle, which is the letter o, or other shapes and angles that are common to all letters.
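
The two methods above can be sketched on toy data. This is only an illustration under stated assumptions, not how TextBridge Pro or any other OCR product works: the 5x5 bitmaps, the function names, and the choice of "enclosed holes" as the extracted feature are all invented for this example.

```python
# Hypothetical 5x5 character templates: "#" is ink, "." is paper.
TEMPLATES = {
    "o": [".###.", "#...#", "#...#", "#...#", ".###."],
    "l": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "c": [".###.", "#....", "#....", "#....", ".###."],
}


def match_score(glyph, template):
    """Fraction of pixels on which two equally sized bitmaps agree."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total


def matrix_match(glyph):
    """Matrix matching: return the best-scoring template and its score."""
    best = max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))
    return best, match_score(glyph, TEMPLATES[best])


def count_holes(glyph):
    """Feature extraction: count paper regions fully enclosed by ink."""
    rows, cols = len(glyph), len(glyph[0])
    seen = set()

    def region_touches_border(r, c):
        """Flood-fill one paper region; report whether it reaches the edge."""
        stack, touches = [(r, c)], False
        seen.add((r, c))
        while stack:
            y, x = stack.pop()
            if y in (0, rows - 1) or x in (0, cols - 1):
                touches = True
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                inside = 0 <= ny < rows and 0 <= nx < cols
                if inside and glyph[ny][nx] == "." and (ny, nx) not in seen:
                    seen.add((ny, nx))
                    stack.append((ny, nx))
        return touches

    holes = 0
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] == "." and (r, c) not in seen:
                if not region_touches_border(r, c):
                    holes += 1
    return holes


# A scanned "o" with one ink pixel missing from its bottom edge.
scanned = [".###.", "#...#", "#...#", "#...#", ".##.."]
print(matrix_match(scanned))
print({ch: count_holes(bitmap) for ch, bitmap in TEMPLATES.items()})
```

In this sketch the damaged glyph still scores highest against the o template, and the hole count separates o (one enclosed region) from l and c (none). Real programs use far larger template sets and many more features.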

Both these methods achieve fast results, much faster than having a typist simply input the documents into a PC. For example, an average typist can input about 60 words per minute. Your scanner, using OCR software, can extract upwards of 600 words per minute.
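
To put those two rates side by side, here is a small worked calculation. The 60 and 600 words-per-minute figures come from the text; the 3,000-word document length is an assumption chosen only for illustration, and the result ignores any time spent correcting errors afterward.

```python
TYPIST_WPM = 60   # from the text: an average typist
OCR_WPM = 600     # from the text: a scanner using OCR software


def minutes_to_input(word_count, words_per_minute):
    """Time needed to enter a document at a constant rate."""
    return word_count / words_per_minute


doc_words = 3000  # hypothetical document length
typing = minutes_to_input(doc_words, TYPIST_WPM)
ocr = minutes_to_input(doc_words, OCR_WPM)
print(f"typing: {typing:.0f} min, OCR: {ocr:.0f} min, speedup: {typing / ocr:.0f}x")
# prints "typing: 50 min, OCR: 5 min, speedup: 10x"
```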

Each character identified by the OCR software is assigned a confidence level, depending on how closely the extracted character corresponds to the specifications of the algorithm. Characters below a certain confidence level are flagged, and a placeholder character, such as a ~, is substituted, which you must hand-correct. Many OCR programs boast an accuracy rate of 90 percent or better, which means you might need to hand-correct up to 10 errors for every 100 words of text.
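
The flagging step can be sketched as follows. The confidence scores, the 0.80 threshold, and the function names are assumptions for illustration; the text does not say what scale or cutoff any particular OCR program uses.

```python
FLAG = "~"  # placeholder character mentioned in the text


def flag_low_confidence(recognized, threshold=0.80):
    """Replace characters whose confidence is below the threshold with FLAG."""
    return "".join(ch if conf >= threshold else FLAG for ch, conf in recognized)


def expected_corrections(units, accuracy):
    """Errors to expect when `units` items are read at the given accuracy."""
    return round(units * (1 - accuracy))


# Hypothetical recognizer output: (character, confidence between 0 and 1).
recognized = [("s", 0.99), ("c", 0.95), ("a", 0.62), ("n", 0.97)]
text = flag_low_confidence(recognized)
print(text)                             # prints "sc~n"
print(expected_corrections(100, 0.90))  # prints 10
```

Each ~ in the output marks a spot to check against the original page. The second function reproduces the arithmetic in the paragraph above: at 90 percent accuracy, 100 items yield about 10 corrections.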

Recognize the Limitations of OCR Software

Optical character recognition is designed to read only printed text, sometimes referred to as machine type. It’s important to understand that OCR software cannot read any of the following:

■ Handwriting


■ Data that must be extracted from boxes on forms such as tax returns (although high-end programs offer some capabilities in this area)

■ Bar codes

Additionally, OCR software can be very finicky about the print quality it will convert. Carbon copies, newspapers, poor-quality faxes, and old documents typed on manual typewriters can pose problems for your OCR software.

Prepare Your Original Document for OCR Software to Read

When scanning a document prior to using your OCR software, always choose the best original available. Before scanning a paper document, inspect it carefully and attempt to fix the following:

Missing or broken characters. If your document has characters with small breaks or gaps, your eyes might be able to read it, but your OCR software might not. Look closely at the copy (maybe even with a magnifying glass). Try to find a copy with the fewest gaps, or reduce the brightness setting on your scanner to help close them.

Dirt and smudges. You can keep these marks from confusing your scanner and OCR software by covering them with thin white correction tape.

Handwritten notations. Obviously you want to avoid writing on documents destined for OCR conversion. If someone has already done so, cover it up with white correction tape.

Staple holes and wrinkles. Cover these with correction tape as well.

Facsimile documents. If you plan to scan a facsimile document, ask the sender to transmit it using fine mode rather than standard.

Glossy paper. Glossy paper is harder to scan cleanly. When an original is printed on glossy paper, a photocopy might work better for OCR conversion.

Don’t try to fix a poor-quality original simply by photocopying it. A poor original simply makes a poor photocopy.
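
The brightness tip for broken characters can be pictured with a toy model. Everything here is an assumption made for illustration: the gray values, the fixed black-and-white cutoff of 128, and the idea that a brightness setting simply shifts every pixel value. Real scanner drivers may behave differently.

```python
def binarize(gray_row, threshold=128):
    """Turn gray values (0 = black, 255 = white) into ink "#" or paper "."."""
    return "".join("#" if value < threshold else "." for value in gray_row)


def darken(gray_row, amount):
    """Model a lower brightness setting as a uniform shift toward black."""
    return [max(0, value - amount) for value in gray_row]


# Hypothetical slice through a pen stroke with a faint spot in the middle.
stroke = [30, 40, 150, 35, 30]

print(binarize(stroke))              # prints "##.##" (the stroke has a gap)
print(binarize(darken(stroke, 40)))  # prints "#####" (the gap is closed)
```

The same shift also darkens dirt and smudges, so lowering brightness is a trade-off and not a substitute for a clean original.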

Date posted: 03/07/2014, 15:20
