DIzzIE's Scanning Tutorial (on using FineReader)
BY: DIzzIE [antikopyright 2003]

Intro.
This is a quick tutorial on how to scan content (i.e. books,
magazines, pamphlets and the like) using the popular software called
FineReader. I'm well aware that there are already a couple of tutorials
out on this, but they all focus on scanning fiction; that is to say,
they either concentrate on OCRing text from text-only books or,
conversely, on perfecting images, e.g. when scanning comic books. This
guide will briefly discuss working with raw images (as well as touch
on OCRing and the basic features of FineReader).

0. Naturally, you first need a scanner. You can pick one up cheap at a
pawnshop or a Salvation Army type store, or get access to one through
a friend or a nearby library. You may also want to try scamming a
scanner: dizzy.ws/kodak.htm . Alternatively, if you have a
high-quality digital camera, some would suggest simply taking
snapshots of pages.

1. Get ABBYY FineReader OCR Professional 7.0 (the latest version at
the time of this guide). Download it here:
download.com.com/3000-2079-10228095.html?tag=lst-0-1 or fill in your
e-mail address and have a download link e-mailed to you here:
download.abbyy.com/content/default.aspx

2. Download a keygen to register your try-n-buy version here:
allcracks.net/html/a-1.html or find more places to download here:
dizzy.ws/serials.htm

3. Once you install/run keygen, connect your scanner to your
computer and run FineReader (FR).

4. The first thing to do is go to Tools > Options. Under the
Scan/Open Image Tab, if your scanner is not automatically listed in
the TWAIN Driver box, click Select Source. If nothing shows up, FR
can't detect your scanner. Make sure the scanner is connected and
turned on, and that you have the latest drivers (download them from
the scanner manufacturer's website), then restart your computer. If
FR still isn't picking up your scanner after you've updated the
drivers and checked the connection, try running the software that
came with your scanner. If even that does not work, contact your
scanner's manufacturer.

5. If FR recognized your scanner in Step 4, that is, if you can see
your scanner's name in the TWAIN Driver box, select the Use
FineReader Interface radio button, not Use TWAIN-Source interface. If,
however, in Step 4 you could only get your scanner to work with its
default program, keep the TWAIN-Source interface button checked.

6. If you are using the TWAIN-Source interface for scanner settings
(the default program that came with your scanner), you will need to
configure the things I describe in the next steps in that program
instead. Using the TWAIN-Source software is not recommended; use it
only if you could not get FR to recognize your scanner.

7. Still in the Scan/Open Image Tab, click Scanner Settings.

8. Here you can configure a variety of options; these will vary
depending on what you are scanning. A few guidelines:
*Unless you are scanning something that is very lightly printed, the
Brightness should be kept on Automatic (default), with the slide bar
in the middle of the light/dark spectrum bar.
*Paper size should be changed to match the dimensions of whatever
you are scanning. This saves time, since you don't have to wait for
the scanner bar to go all the way to the end for every scan. It also
saves you the trouble of splitting off excess image blocks later on.
*Pause between pages is how long you want the scanner to wait before
automatically scanning the next page. 5-10 seconds should be
sufficient.
*The Resolution should be at least 300dpi, moving up to 600dpi if
what you are scanning has small print or detailed pictures.
*Pictures Scanning Mode should be color if you're scanning color
images (magazines, book covers, etc.), or grayscale if you're scanning
text/b&w pictures. The black & white mode is not recommended, as it
produces grainy, poor-quality images.
*Unless you want to see the Scanner Settings dialog every time you
scan a page, uncheck Show This Dialog Before Scanning.
*If you have a feeder scanner (versus a flatbed), that is, if you
feed pages into your scanner like a fax machine instead of laying
them down on the scanner like a copy machine, you may want to select
Use automatic document feeder (this doesn't work with all feeder
scanners).
*Finally, hit OK to exit Scanner Settings.

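To get a feel for why the Resolution and Pictures Scanning Mode settings matter, here's some rough arithmetic in Python. It's a sketch that assumes an uncompressed raster (1 bit per pixel for black & white, 1 byte for grayscale, 3 bytes for 24-bit color) and a US Letter page; FR compresses images when it saves, so real files will be much smaller, but the proportions are the point: doubling the resolution quadruples the pixel count, and color triples the data compared to grayscale.

```python
# Rough page-size arithmetic for the scanner settings in step 8.
# Assumption: uncompressed raster data, not what FineReader actually writes.
BYTES_PER_PIXEL = {"bw": 1 / 8, "grayscale": 1, "color": 3}


def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_in, height_in, dpi, mode):
    """Uncompressed size in bytes of one scanned page in the given mode."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return int(w * h * BYTES_PER_PIXEL[mode])


if __name__ == "__main__":
    # A US Letter page (8.5 x 11 in) at the two resolutions from step 8.
    for dpi in (300, 600):
        for mode in ("grayscale", "color"):
            size = raw_size_bytes(8.5, 11, dpi, mode)
            print(f"{dpi} dpi {mode:>9}: {pixel_dimensions(8.5, 11, dpi)} px, "
                  f"{size / 1_000_000:.1f} MB uncompressed")
```
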
9. Now let's configure a few more things back in the Options menu.
Still in the Scan/Open Image Tab, select the following options:
*Despeckle Image
*Split Dual Pages (optional, more on this in a little bit)
*Detect Image Orientation (during recognition)
*Open Image During Scanning

10. Under the Recognition Tab, select the following options:
*Recognition Language: obviously, make sure it's set to the language
of the content you're scanning
*Autodetect Layout
*Clear Background Noise
*Autodetect (print type)
*Do not use user patterns

11. Under the Formatting Tab, select the following options:
*Retain Full Page Layout
*Keep Pictures

12. Finally, hit OK to exit the Options menu. Feel free to look at
any other options and modify them as you wish; most are
self-explanatory, and if not, FR has a great help file (just hit F1),
or you can download an additional FR tutorial from the manufacturer:
download.abbyy.com/content/default.aspx

13. One more thing needs to be changed: go to Process and select
Start Background Recognition.

14. Before you start scanning, clean your scanner's glass (if it's a
flatbed) with a window-cleaning solution if possible, since it's less
likely to leave streaks, or just soapy water; either way, wipe with a
towel in even swipes to avoid leaving streaks. Once your scanner is
clean and dry, proceed to step 15.

15. Now then, on to scanning. Position the material on the scanner
and hit the Scan&Read button. You should see a "collaborating
scanner...." pop-up window, followed by a ScanGear progress bar (or
whatever progress window your scanner's driver uses). The image
should then be scanned. Wait for the automatic recognition process to
finish, and then you can work on the image.

16. Let's look at the image you scanned. You should see a thumbnail
of the image in the left-hand menu, a larger picture in the middle
menu, and any recognized text in the right-hand menu. The middle
"image" window is what we'll be looking at next.

17. You need to make some decisions about how you want your finished
scan to look. Do you want it OCRed (optical character recognition),
meaning that the words will be converted to text? The upside of
OCRing is that your finished product will be smaller in terms of file
size, searchable for specific words, and easier to read. The downside
is that an OCRed text takes more time to produce, because it requires
at least minimal proofreading to root out any OCRing mistakes. OCRing
is thus recommended if you're scanning a largely text-only book, have
enough time on your hands to proofread the scan, and are not scanning
a text that depends on precise formulas/calculations. If you're
scanning a magazine, comic book, or scientific text with precise
formulae, OCRing is NOT recommended.

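Here's a back-of-the-envelope Python comparison for the file-size point above. The 2,000 characters per page figure is a hypothetical average for a text-only book page, and the image side is an uncompressed 300 dpi grayscale Letter scan; saved images are compressed, so the real gap is smaller, but plain text still comes out far ahead.

```python
# Rough size comparison for step 17 (OCRed text vs. a page image).
# Assumptions: ~2,000 characters per text-only page (hypothetical average),
# 1 byte per character, and an uncompressed 300 dpi grayscale US Letter scan.

CHARS_PER_PAGE = 2_000  # hypothetical average for a text-only book page


def text_page_bytes(chars=CHARS_PER_PAGE):
    """Plain ASCII text: one byte per character."""
    return chars


def image_page_bytes(width_in=8.5, height_in=11, dpi=300):
    """Uncompressed grayscale: one byte per pixel."""
    return round(width_in * dpi) * round(height_in * dpi)


if __name__ == "__main__":
    ratio = image_page_bytes() / text_page_bytes()
    print(f"text: {text_page_bytes()} bytes, image: {image_page_bytes()} bytes "
          f"(about {ratio:,.0f}x larger)")
```
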
18. In the Image menu, you should see a column of buttons on the
left-hand side. The two we'll be working with are the OCR (text)
button and the Image button: the 2nd (the green-bordered T button)
and the 4th (the red-bordered mountain button) from the top,
respectively. Briefly, you select text blocks (which will be OCRed)
and image blocks (which won't), and then hit the Read All button. But
before you do this, there are a few other things you need to take
care of.

19. If you had automatic image splitting (Split Dual Pages, step 9)
enabled and FR didn't split the scanned images the way you want, or
you want to get rid of excess borders and such, go to Image > Split
Image and select how you want to split the image. You can then delete
the portions you don't want by clicking on their thumbnails in the
left-hand menu and pressing Delete.

20. If the image is not rotated correctly, go to Image and choose
the needed rotation.

21. Also, if at any time you find that an image scanned badly or you
skipped a page, scan the image again; it should now appear as the
last numbered page in the thumbnailed Batch menu. Then, if you are
simply replacing an image, select the image to be replaced in the
thumbnailed Batch menu and delete it. Next, highlight (select) the
rescanned image, go to Batch > Renumber Pages..., select Selected
Pages, and type in the page number the original image had, thus
sliding it into place.
If you're inserting a missed image, things are a little trickier.
Find the spot where the image should be and then do the following
(for this example, the image should have been #21): select all images
from the current 21 (inclusive, meaning select the current 21) to the
end (non-inclusive, meaning don't select the image that you are going
to be inserting) by clicking on number 21, holding down Shift, and
clicking on the second-to-last image. Go to Batch > Renumber Pages
and, selecting All Pages and Continuous Page Renumbering, type in 22
for First Renumbered Page. Then renumber the rescanned image to 21 as
explained in the preceding paragraph (there's nothing to delete this
time, since slot 21 is now free).

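If the renumbering dance is confusing, it helps to think of the Batch as an ordered list where page N sits at position N and every rescan gets appended at the end. The Python sketch below models both cases from step 21 under that assumption; `replace_page` and `insert_missed_page` are made-up names for illustration and have nothing to do with FR itself.

```python
# Model of the Batch renumbering in step 21 (illustration only, not FR code).
# Assumptions: the batch is an ordered list, page N sits at index N - 1,
# and a rescanned page always lands at the end of the batch.


def replace_page(batch, page_no):
    """Swap a badly scanned page for its rescan (the last item in the batch)."""
    pages = list(batch)                # work on a copy
    rescan = pages.pop()               # the fresh scan sits at the end
    del pages[page_no - 1]             # delete the bad image
    pages.insert(page_no - 1, rescan)  # renumber the rescan to page_no
    return pages


def insert_missed_page(batch, page_no):
    """Slot a skipped page (the last item in the batch) in at page_no."""
    pages = list(batch)
    rescan = pages.pop()
    # Renumbering pages page_no..end (minus the rescan) to start at page_no + 1
    # leaves page_no free; putting the rescan there shifts everything after it.
    pages.insert(page_no - 1, rescan)
    return pages


if __name__ == "__main__":
    # Page 21 was skipped and rescanned, so it ended up last.
    batch = [f"p{n}" for n in range(1, 21)] + ["p22", "p23", "p21"]
    print(insert_missed_page(batch, 21)[19:])
```

Both functions return a new list and leave the input alone, so you can replay the steps on a copy of your page list before touching the real Batch.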
22. Now that you have your images scanned/fitted correctly, back to
the middle Image menu we go. If you're not happy with the
fields/boxes that auto-recognition selected for you, you can click on
a box and just delete it. Then select the portions that you want
OCRed (if any), and the images. After you have done this for all
scanned images, hit Read All. Note that after you experiment with a
few sample pages, you can select Scan&Read Multiple Images from the
Scan&Read dropdown menu (this will save you the trouble of hitting
the same button for every scan).

23. Once the new recognition has finished, if you have only chosen to
recognize images (no OCRing), you are ready to save.

24. Click the Save button. If you want to save all your images as one
PDF file (FR has a built-in PDF printer driver, so there's no need to
install any additional software), click on Formats Settings... and go
to the PDF Tab. Flirt with the Save Mode options by saving only a page
or two of your scan and seeing whether you're satisfied with how it
looks in the created PDF document. Text and Pictures Only will save
only the text and picture blocks you recognized (recommended), while
Page Image saves the original, unedited image seen as a thumbnail in
the left-hand Batch menu.

25. Under Font Use Mode, keep the default Use Standard Fonts option.
Under Reduce Picture Resolution To and JPEG Quality, experiment with
values to balance total file size (the higher the resolution/quality,
the larger the file) against image quality. You may want to create
two versions of your scan: one with a smaller file size and slightly
worse quality, and one with a larger size and better quality.
Whatever settings you choose, every version should be readable
without eyestrain. After the Formats Settings, click Save to File
(keeping the Keep Pictures box checked), select PDF, and keep the
default save options unless there's something you want to change
(all the save options are self-explanatory, so I won't go into them
here).

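For a rough idea of what Reduce Picture Resolution To does to your file size, here's a small Python sketch. It assumes the amount of picture data scales with the pixel count, i.e. with the square of the resolution, and it doesn't try to model JPEG Quality at all.

```python
# Rough effect of "Reduce Picture Resolution To" (step 25).
# Assumption: picture data scales with pixel count, i.e. with the square
# of the resolution. JPEG Quality is not modeled here.


def pixel_fraction(scan_dpi, target_dpi):
    """Fraction of the original pixels kept when downsampling to target_dpi."""
    if target_dpi >= scan_dpi:
        return 1.0  # nothing to reduce
    return (target_dpi / scan_dpi) ** 2


if __name__ == "__main__":
    scan_dpi = 600  # the upper end of the range suggested in step 8
    for target in (300, 200, 150):
        share = pixel_fraction(scan_dpi, target)
        print(f"{scan_dpi} -> {target} dpi keeps {share:.0%} of the pixels")
```

So halving the resolution leaves about a quarter of the picture data, which is why even a modest reduction can shrink a picture-heavy PDF considerably.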
26. Once you are satisfied with your PDF, you are done.

27. If, however, your scan involves text that you decided to OCR,
there is some more work you will have to do.

28. Once you have selected all the text/image portions and clicked
Read All (as per step 22), you will need to edit/format the scanned
material. This is best done in a word processing program rather than
in FR.

29. Click the Save button, and then click on Formats Settings. Under
the DOC/RTF/Word XML Tab, select the following options:
*Default Paper Size: Letter (the 'automatically increase paper size'
feature usually does not matter; if you start getting irregular paper
sizes, by all means uncheck it)
*Make sure that everything else is unchecked, save for Retain Text
Color and Save in Word 97 or Later Format (both are default options)

30. Back in the main save menu, be sure to select either retain font
and font size or remove all formatting in the retain layout section.
Keeping the default radio button, retain full page layout, selected
will result in restricted margins, awkward page breaks and other
annoyances when you are editing the file.

31. Now save the scan as either doc or rtf (both can be opened using
Microsoft Word, the free WordPad, or the free office suite OpenOffice,
openoffice.org). You will then have to proofread/format the scan.
Some basic things to do include:
*Thoroughly read through and spell check the text to catch any
spelling mistakes, as well as any false positives, that is, OCR
errors that happen to form real words, just not the correct words in
context, for instance "mom" instead of "morn".
*Cut/paste misplaced pictures/captions/titles. While saving without
FR's formatting feature has many advantages, one of its disadvantages
is that graphics often get shifted out of their correct order,
sometimes requiring you to look at the original paper (treeware)
document to see where they belong.
*Set the desired spacing. Various spacing issues may need to be fixed
as well; these include (but are not limited to) paragraph
indentation, spacing between chapters, and removing the '-' marks
that split words across lines in the original treeware version.
*Renumber the Table of Contents (TOC). If your document includes a
table of contents, you may want to change the page numbers to match
your scanned version.

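Some of the proofreading chores above can be partly automated before you open the file in a word processor. The Python sketch below assumes you've exported the OCRed text as plain text; `dehyphenate` and `rn_m_suspects` are hypothetical helper names, and the word list is a tiny stand-in for a real dictionary. The "rn"/"m" check targets exactly the "mom"/"morn" kind of false positive that a spell checker won't catch.

```python
import re

# Proofreading helpers for step 31 (sketches, not part of FineReader).
# Assumption: the OCRed text has been exported as plain text.

# Tiny stand-in word list; swap in a real dictionary for actual use.
WORDS = {"mom", "morn", "bum", "burn"}


def dehyphenate(text):
    """Join words split across lines with a trailing hyphen."""
    # Caveat: this also joins real compounds that happen to break at a line end.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def rn_m_suspects(text, words=WORDS):
    """Flag real words that turn into other real words when 'm' <-> 'rn'.

    OCR often confuses 'rn' and 'm', and a spell checker won't catch the
    result when both readings are real words ("mom" vs. "morn").
    """
    suspects = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in words:
            continue  # plain misspellings are the spell checker's job
        alternatives = set()
        for i, ch in enumerate(word):
            if ch == "m":  # this 'm' might really be 'rn'
                alternatives.add(word[:i] + "rn" + word[i + 1:])
        for i in range(len(word) - 1):
            if word[i:i + 2] == "rn":  # this 'rn' might really be 'm'
                alternatives.add(word[:i] + "m" + word[i + 2:])
        hits = sorted(a for a in alternatives if a in words)
        if hits:
            suspects.append((word, hits))
    return suspects


if __name__ == "__main__":
    page = "Early in the mom the farm-\ner went to burn weeds."
    fixed = dehyphenate(page)
    print(fixed)
    print(rn_m_suspects(fixed))
```

Treat the output as a to-do list for manual checking rather than fixes to apply blindly; only you (or the treeware original) can tell whether "burn" really should have been "bum".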
32. Finally, you are ready to save your work and release it to the
public. You may save as rtf, a popular format similar to doc, except
that it is much more versatile and does not require special software,
while still allowing formatting features (unlike plain txt files).
The downside is that rtf files are usually larger than doc files. If
you wish to save in pdf format, you will need a pdf printer driver
such as Fineprint PDF Factory PRO (check out
dizzie.serein.us/serials.htm for tips on finding serial numbers).

33. Also remember that even if you scanned your document using
another program and now just have images of pages, you can easily
import them into FR and OCR/format/save them as pdf (basically any of
the aforementioned steps). To import images, go to File > Open Image
and select the images you want to import (hold down Ctrl or Shift to
select more than one image). If a pop-up window appears asking about
resizing, select Leave Original. The images should now appear in the
left-hand Batch menu.

Well, I should wrap this up; this guide has gotten a tad longer than
I intended. As you will doubtless have realized by now, FR is a very
powerful tool with a vast array of features. To sum up, the basic
process of creating an e-document involves: 1) scanning the document
and 2) editing/formatting/proofing/saving the document. Obviously not
everything could be covered in this guide; if you have a question
about something, look through the official FR help file, and if you
still can't find an answer, feel free to drop me a line.

-
Comments? Get in touch: xcon0 @t yahoo \/d0t/\ c||o|m
(or call +1 (610) 887-6072)

For more knowledge check out www.rorta.net and www.dizzy.ws