What are the key steps involved in Anacomp's document capture process?
What is the difference between a document, a page, and an image?
What are the standard U.S. paper sizes?
What is document preparation?
When is document preparation required?
What is document de-preparation?
What is the standard format used to store images?
What are the different types of PDF image formats?
What is the image size of a scanned document?
What is image resolution?
What about color files or photographs?
How are double-sided or duplex documents scanned?
How are blank pages handled during duplex scanning?
How are "skewed" images handled?
Can I view combinations of images, text and index fields side-by-side?
Can I open and display more than one document at a time?
What is OCR?
How accurate is OCR?
What is ICR?
What is OMR?
What are barcodes?
What is MICR?
How are images indexed?
What are the key steps involved in
Anacomp's document capture process?
-
Anacomp has developed an accurate and efficient capture process
utilizing state-of-the-art hardware, software, and methodologies.
The following table summarizes each step within Anacomp's
capture process.
| Process | Description |
| --- | --- |
| Document Preparation | Prepare documents for scanning by removing fasteners, unfolding, repairing, sorting, inserting document separators, etc. |
| Scan | Scan source documents, microfilm, or microfiche into scanning workflow. |
| Recognition | Automatically separate documents, identify forms, and perform auto recognition (OCR, ICR, OMR, barcodes, etc.). |
| Quality Control | Rescan, validate form identification, and image quality. |
| Validation | Validate auto-recognized data, manual data entry. |
| Verification | Blind data entry, independent verification if required. |
| OCR Full Text | Full text OCR for each document and output into a specified format if required. |
| PDF Generator | Produce Adobe PDF images. |
| Release | Format the images and associated indexing data into the required output format. |
| Document De-Prep | De-prepare documents if required. |
What is the difference between a document,
a page, and an image?
-
A
"document" consists of one or more pages of
data that are typically related or part of a logical
group. A "page" generally refers to a physical
piece of paper that may either contain data on a single
side of the page (simplex) or on both sides of the page
(duplex). An "image" generally refers to a
digital representation of a single side of a page. For
example, consider a monthly bank statement for Account
123456 which contains 12 duplex pages. This statement
would then represent 1 document, 12 pages, and 24 images.
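The bank statement example can be checked with a short sketch. The `Page` class and `count_images` function are hypothetical names used only for illustration; they are not part of any Anacomp system.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One physical sheet of paper (hypothetical model for illustration)."""
    duplex: bool  # True if both sides of the sheet carry data


def count_images(pages):
    """A simplex page yields one image; a duplex page yields two."""
    return sum(2 if page.duplex else 1 for page in pages)


# One document: a bank statement made of 12 duplex pages.
statement = [Page(duplex=True) for _ in range(12)]
print("documents: 1, pages:", len(statement), "images:", count_images(statement))
```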
What are the standard U.S. paper sizes?
-
Please refer to the table below. Each successive increase
in paper size uses the long dimension of the previous size
as the new short dimension, and doubles the previous short
dimension to get the new long dimension.
| Description | Dimensions |
| --- | --- |
| A Size | 8-1/2" x 11" |
| B Size | 11" x 17" |
| C Size | 17" x 22" |
| D Size | 22" x 34" |
| E Size | 34" x 44" |
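The sizing rule can be written as a small sketch that starts from A Size and derives the rest. The function names `next_size` and `size_series` are illustrative only.

```python
def next_size(short_side, long_side):
    """Apply the rule: the old long side becomes the new short side,
    and the old short side is doubled to give the new long side."""
    return long_side, 2 * short_side


def size_series(short_side, long_side, count):
    """Return `count` successive sizes as (short, long) tuples in inches."""
    sizes = []
    for _ in range(count):
        sizes.append((short_side, long_side))
        short_side, long_side = next_size(short_side, long_side)
    return sizes


# Start from A Size (8.5" x 11") and generate A through E.
for letter, (short_side, long_side) in zip("ABCDE", size_series(8.5, 11, 5)):
    print(f"{letter} Size: {short_side:g}\" x {long_side:g}\"")
```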
What is document preparation?
-
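Document
preparation is the manual process of preparing documents
for scanning and can often be critical to the success
of any imaging project. Document preparation typically
involves, but is not limited to, the following:
-
- Removing fasteners
- Unfolding pages
- Repairing damaged pages
- Sorting documents
- Inserting document separators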
When is document preparation required?
-
In
most cases, document preparation is required prior to
scanning. Customers can reduce the amount of document
preparation Anacomp must perform by modifying their internal
workflow before releasing documents for scanning. For
example, documents may be stored unfolded, or the customer
may stop using staples when documents are created.
What is document de-preparation?
-
Document
de-preparation is the manual process of preparing documents
for physical storage after they have been scanned. The
de-preparation process will vary based upon the degree
to which the documents must be handled. Document de-preparation
typically involves but is not limited to the following:
-
- Returning
documents into the envelopes, folders, or other containers
from which they came
- Inserting
staples, paper clips, or other fasteners that bound
the pages prior to scanning
- Re-sorting
documents
- Removing
document or batch separator sheets
What is the standard format used to
store images?
-
Black
and white images, sometimes referred to as "bi-tonal
images", are most commonly stored as standard TIFF
files using CCITT Group 4 compression. Grayscale and
color images may be stored as TIFF, JPEG, or PDF files
and generally result in larger image files.
If
scanned images are ingested into docHarbor Online as
PDF, the user may select multiple documents and create
a combined PDF document.
What are the different types of PDF
image formats?
-
PDF
stands for Portable Document Format, a de facto
standard created by Adobe. PDF files are compact, cross-platform,
and can be viewed by anyone with Adobe Acrobat Reader.
Two PDF formats are relevant to document capture.
The first format, PDF Image Only, is an image bitmap
representation of the actual document. The image's
full text is not searchable. The second format, PDF
Searchable Image, also referred to as Image+Hidden text,
combines a bitmap image in the PDF format
with text embedded within the document. The full text
of the document is searchable. This format is generally
more costly to produce and should be considered when
full text searches are justified.
What is the image size of a scanned
document?
-
A
single image typically occupies approximately 50 KB
of disk space if the image is stored in TIFF Group IV.
The actual size of an image depends upon several
factors including the image type (bi-tonal, grayscale,
color), bit-depth, compression type, resolution, and
document data density. The following table shows
approximate storage capacities.
| Storage Amount | Number of TIFF IV Images |
| --- | --- |
| 1 MB | ~20 images |
| 1 GB | ~20,000 images |
| Standard CD (images only) | Between 10,000 and 15,000 images |
| Standard DVD (images only) | Between 90,000 and 100,000 images |
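These capacities follow from the 50 KB average image size. The sketch below redoes the arithmetic; the disc capacities (700 MB for a CD, 4.7 GB for a single-layer DVD) and the use of decimal units are assumptions not stated in the text above.

```python
AVG_IMAGE_BYTES = 50 * 1000  # ~50 KB per TIFF Group IV image, per the text

MB = 1000 ** 2  # decimal megabyte (assumption)
GB = 1000 ** 3  # decimal gigabyte (assumption)


def images_per(capacity_bytes, image_bytes=AVG_IMAGE_BYTES):
    """Approximate number of average-size images that fit in a capacity."""
    return capacity_bytes // image_bytes


capacities = {
    "1 MB": MB,
    "1 GB": GB,
    "CD (assumed 700 MB)": 700 * MB,
    "DVD (assumed 4.7 GB)": 4700 * MB,
}
for label, capacity in capacities.items():
    print(f"{label}: ~{images_per(capacity):,} images")
```

Real totals vary with document data density, so these figures are estimates rather than guarantees.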
What is image resolution?
-
Image resolution is measured in dpi (dots per inch), typically
ranges from 200 dpi to 400 dpi, and directly affects the file
size of the image. The recommended resolution should take
several items into consideration: the optimum balance between
image quality and image size, the number of pages per document,
the time to download documents from the image repository, and
the use of recognition technologies including OCR,
ICR, or OMR.
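To see how resolution drives file size, the sketch below computes the pixel dimensions and the uncompressed bi-tonal size (1 bit per pixel, rows padded to whole bytes) of a page at several resolutions. The function name is hypothetical, and compressed sizes on disk will be much smaller than these raw figures.

```python
def bitonal_raw_size(width_in, height_in, dpi):
    """Return (width_px, height_px, uncompressed_bytes) for a 1-bit image.

    Each row is padded up to a whole number of bytes, which is a common
    convention for raw bi-tonal bitmaps (an assumption here).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return width_px, height_px, bytes_per_row * height_px


# An 8.5" x 11" page at the resolutions mentioned in the text.
for dpi in (200, 300, 400):
    width_px, height_px, size = bitonal_raw_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {width_px} x {height_px} px, {size:,} bytes uncompressed")
```

Because pixel count grows with the square of the resolution, doubling the dpi roughly quadruples the raw image size.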
What about color files or photographs?
-
Imaging systems should support black and white, grayscale
and color images. Color files can be scanned with a
color scanner or imported into an imaging system.
How are double-sided or duplex documents
scanned?
-
An
imaging system should provide two different ways to
do this. It should support duplex scanners, which simultaneously
scan both sides of a page. Also, with a simplex scanner,
the user should be able to scan all the front sides,
turn the stack over and scan all the back sides, and
then have the system automatically collate the pages
into the correct order.
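A minimal sketch of the collation step for the simplex case follows. It assumes that turning the stack over causes the back sides to be scanned in reverse page order; the `collate` function and its `backs_reversed` flag are hypothetical and do not describe any particular product.

```python
def collate(fronts, backs, backs_reversed=True):
    """Interleave front and back images into reading order.

    fronts: images from the first pass, in page order.
    backs: images from the second pass. If the stack was flipped as a
    whole, the last page's back is scanned first, so the list is reversed
    before interleaving (assumption controlled by `backs_reversed`).
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must contain the same number of images")
    if backs_reversed:
        backs = list(reversed(backs))
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend([front, back])
    return ordered


fronts = ["p1-front", "p2-front", "p3-front"]
backs = ["p3-back", "p2-back", "p1-back"]  # order produced by the flipped stack
print(collate(fronts, backs))
```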
How are blank pages handled during
duplex scanning?
-
Blank
pages may be automatically identified and ignored during
the scanning process by configuring a predetermined
threshold within the capture system. For example, a
blank page threshold of 2,000 bytes could be configured
to ignore any image that is less than 2,000 bytes in
size. Anacomp works with each customer and each application
to determine the optimum threshold for blank page handling.
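The threshold idea can be sketched as a simple size filter. The function name and the in-memory `(name, data)` representation are assumptions for illustration; a real capture system applies the check inside its own workflow.

```python
BLANK_THRESHOLD_BYTES = 2000  # example threshold from the text


def drop_blank_images(images, threshold=BLANK_THRESHOLD_BYTES):
    """Keep only images whose compressed size meets the threshold.

    images: list of (name, data) pairs, where data is the image file's bytes.
    Images smaller than `threshold` bytes are treated as blank and ignored.
    """
    return [(name, data) for name, data in images if len(data) >= threshold]


scanned = [
    ("page1-front", bytes(48000)),  # typical page with content
    ("page1-back", bytes(1500)),    # nearly empty image, treated as blank
    ("page2-front", bytes(2000)),   # exactly at the threshold, kept
]
kept = drop_blank_images(scanned)
print([name for name, _ in kept])
```

A threshold that is too high discards pages with very little content, which is why it is tuned per application.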
How are "skewed" images
handled?
-
Skewed
(crooked or tilted) images can adversely affect the
accuracy of the OCR process, so an imaging system should
include software that recognizes skewed images and compensates
for them.
Can I view combinations of images,
text and index fields side-by-side?
-
To
allow convenient access to document information, a well
designed imaging system will allow the view screen to
be configured to show the text, images, template index
fields or thumbnail images.
Can I open and display more than
one document at a time?
-
Some
imaging systems will allow you to display multiple documents,
with the number of documents you can have open simultaneously
limited only by the amount of memory available.
What is OCR?
-
OCR
stands for Optical Character Recognition, which is how
a computer converts words in an unsearchable scanned
image to searchable text. OCR engines can generally
only recognize typed or laser printed text, not handwriting.
How accurate is OCR?
-
Accuracy
on a freshly laser printed page in excellent condition
is typically better than 95%. Accuracy on faxed, dirty
or degraded documents may be significantly lower.
What is ICR?
-
ICR
(Intelligent Character Recognition) is pattern-based
character recognition and is also known as Hand Print
Recognition. Handwritten text is more difficult for
computers to recognize and results in higher error rates
than printed text. ICR engines usually do best at recognizing
constrained printing, which means block printed letters
with one letter in each box. Accurate recognition of
unconstrained handwriting, especially cursive handwriting,
typically requires that the ICR engine be trained to
recognize each user's style of writing.
What is OMR?
-
OMR
(Optical Mark Recognition), also called Mark Sense Recognition,
is the recognition of marks commonly used on forms,
such as check marks, circled choices, and filled-in
bubbles. OMR can be an important part of an imaging
system for organizations that process many standard
forms.
What are barcodes?
-
A
barcode is an array of vertical rectangular marks and
spaces in a predetermined pattern used to represent
data. Barcodes are generally used in document capture
systems to reduce or eliminate certain costs by automating
document type identification or index capture. Several
barcode types are used in document capture, including
Code 39, Code 128, and Interleaved 2 of 5.
What is MICR?
-
MICR
(Magnetic Ink Character Recognition) is a character
recognition system used on bank checks. Special characters
are printed using magnetized ink for automatic reading.
How are images indexed?
-
Images
may be indexed using several capture methods. docHarbor's
Professional Services team will evaluate the most accurate
and cost-effective way to meet and/or exceed the scan
project's requirements.
| Indexing Method | Description |
| --- | --- |
| Manual Data Entry | A key operator views a document, identifies the index value, and keys in the value. |
| Manual Data Entry with double blind verification | A key operator views a document, identifies the index value, and keys in the value. A second key operator repeats the same process and is unaware of the index value keyed by the first operator. If the values match, then the document is considered accurate. If the values do not match, then the document is handled as an exception. |
| Zonal OCR | A template is used to map specific regions of an image to predefined indexes. An OCR process is applied to each region and captures the index value based upon a predetermined confidence threshold. |
| Barcode Recognition | A template is used to identify the location and type of barcode used within a document. A barcode recognition process is used to capture each barcode index value based upon a predetermined confidence threshold. |
| Database Merge | A unique index field is captured for each document using manual entry, zonal OCR, or barcode recognition. An automated process is created to match the unique index field with other index fields contained in the database. When a match is made, the other index fields are associated with the document. |
| Hybrid Indexing | docHarbor will utilize multiple indexing methods and develop the most accurate and cost-efficient system. |
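Two of the methods above, double blind verification and database merge, can be sketched as follows. All function names, field names, and the dictionary standing in for the database are hypothetical; this is an illustration of the logic described in the table, not docHarbor's implementation.

```python
def double_blind(first_entry, second_entry):
    """Compare two independently keyed values.

    Returns (value, is_exception): the agreed value when both operators
    keyed the same thing, otherwise (None, True) so the document can be
    routed to exception handling.
    """
    if first_entry == second_entry:
        return first_entry, False
    return None, True


def database_merge(unique_key, records):
    """Look up the remaining index fields for a captured unique key.

    records: dict mapping the unique key to a dict of other index fields.
    Returns the matching fields, or None when no match is found.
    """
    return records.get(unique_key)


# Hypothetical customer database keyed by account number.
records = {"123456": {"customer": "J. Smith", "branch": "Main Street"}}

value, is_exception = double_blind("123456", "123456")
if not is_exception:
    print(value, database_merge(value, records))
```

Capturing only one reliable key per document and merging the rest from an existing database reduces the amount of data that has to be keyed or recognized.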