Scanning and OCR in the Faculty Lab

When to use OCR?

If you scan paper documents for use in your online classes, you will need OCR (optical character recognition) software such as Adobe Acrobat Pro or OmniPage. Both are available in the Faculty Production Labs on every campus.

Why use OCR?

When you scan a document, the computer saves it as an image, not as actual text. OCR reads the image and converts the words into real text. This has several advantages:

  • Editable: it is easier for you to make corrections and updates
  • Accessible: a screen reader can now read your document aloud to a student
  • Searchable: readers can search your document for words and phrases
  • Usable: text can be selected, copied and pasted - no more retyping
  • Smaller file size: text takes up less space than an image
  • Saves time: much faster than retyping the document by hand
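To get a feel for the "smaller file size" point, here is a rough back-of-the-envelope sketch in Python. Every figure in it is an assumption chosen for illustration (a US Letter page, a 300 dpi grayscale scan, about 3,000 characters of text per page), and it compares an uncompressed image with plain text. Real scanned PDFs are compressed, so the gap you see in practice will be smaller.

```python
# Rough size comparison: one scanned page stored as an image vs. as text.
# All figures below are illustrative assumptions, not measurements.

PAGE_WIDTH_IN = 8.5    # US Letter width in inches (assumed page size)
PAGE_HEIGHT_IN = 11.0  # US Letter height in inches (assumed page size)


def uncompressed_image_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Bytes needed to store one page as an uncompressed bitmap."""
    width_px = round(PAGE_WIDTH_IN * dpi)
    height_px = round(PAGE_HEIGHT_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8


def plain_text_bytes(characters: int) -> int:
    """Bytes needed for the same page as plain text (1 byte per character assumed)."""
    return characters


image_size = uncompressed_image_bytes(dpi=300, bits_per_pixel=8)  # 8-bit grayscale
text_size = plain_text_bytes(characters=3000)  # assumed ~3,000 characters per page

print(f"Image: {image_size:,} bytes")
print(f"Text:  {text_size:,} bytes")
print(f"The image is roughly {image_size // text_size:,} times larger")
```

The saving applies when you export the recognized text to a text-based format. A file that keeps the page image alongside the text still carries the size of the image.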

Is it accurate?

OCR technology has come a long way in recent years. If you're starting with a good scan of a typed document, the results will be surprisingly accurate. However, stray marks, blurred photocopies, unusual fonts, and handwritten characters can produce unpredictable results. The better the condition of the original document, the better the OCR interpretation will be. OCR programs come with built-in proofreading tools, so you can double-check the interpretation.
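If you are curious how close an OCR result came to the original, one simple way to put a number on it is to compare the raw OCR text with your proofread version. The sketch below uses Python's standard `difflib` module. The sample sentences are made up for illustration, and the similarity score is only a rough indicator, not the accuracy measure that Acrobat or OmniPage use internally.

```python
from difflib import SequenceMatcher


def similarity(ocr_text: str, corrected_text: str) -> float:
    """Return a score from 0.0 to 1.0 for how closely two strings match."""
    return SequenceMatcher(None, ocr_text, corrected_text).ratio()


# Hypothetical sample: the OCR misread "l" as "1" and "rn" as "m".
ocr_output = "The c1ass meets on Monday moming."
proofread = "The class meets on Monday morning."

score = similarity(ocr_output, proofread)
print(f"Similarity: {score:.1%}")
```

A score close to 100% suggests the scan needed little correction. A noticeably lower score is a hint that a cleaner original or a higher-quality scan would be worth the effort.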

How to use OCR programs in the faculty lab

PCC faculty labs have two OCR programs: Adobe Acrobat Pro and OmniPage. OmniPage is a dedicated OCR program with a large feature set, while Acrobat has a more straightforward interface. For most documents, either program will produce accurate results - feel free to use whichever one you are most comfortable with.

If you need help deciding which OCR program to use, consult your campus Instructional Technology Specialist, who is listed on the Instructional Support contacts page.

How to run OCR in Adobe Acrobat

Step One: scan or open your document
  • Open Adobe Acrobat X Pro
  • If you've already scanned your document, go to File > Open and open your file
  • If you will be scanning your document now, go to File > Create > From Scanner and choose "Custom Scan"
    • In the Custom Scan window, choose your scanner from the list and adjust any necessary settings (document size, double sided, etc.)
    • Adjust the resolution to 300 dpi (dots per inch) or above.  The idea is to get a good quality scan of your document - the better the scan, the easier it is for the OCR to work.
    • Pay special attention to the slider under "Optimized Scanned PDF".   The default option (roughly at the center of the slider) is appropriate for basic scanning and OCR.  If you anticipate that your document will require significant correction, then drag the slider closer to the "high quality" end.
    • Uncheck the box next to "Make Searchable (Run OCR)". Once you are comfortable with the process, you can save a step by leaving this box checked and running the OCR during the scan, but for now it's easier to run the OCR separately.
Step Two: recognize the text
  • Open the Tools panel (click "Tools" in top right) and click "Recognize Text"
  • Click "In this File" and, in the window that pops up, click the Edit button to adjust your OCR settings.
    • Select the language of the text
    • For output style, choose "Searchable Image"
  • Click OK and you're off!
Step Three: correct any errors
  • In the Tools panel, under "Recognize Text", click on "Find First Suspect"
  • Acrobat will now go through the document, identify any words it is unsure of, and allow you to correct them manually.
  • The pop-up window shows you the picture of the suspected word. The text on the page shows you Acrobat's interpretation of the suspected word. To fix spelling, click on the word on the page, enter the correct text, and then click "Accept and Find" to move to the next suspect.
    Screenshot: proofreading in Acrobat

  • To save as a PDF, go to File > Save.  For other file types, go to File > Save As and choose from Word, HTML, plain text, and others.
Help with using Acrobat OCR

How to run OCR in OmniPage

Step One: scan or open your document
  • Open OmniPage Pro (there should be an icon on the desktop)
  • If you've already scanned your document, locate the drop-down list under Button 1 and choose "load image file".  Now press Button 1 and find your file.
  • If you will be scanning your document now, locate the drop-down list under Button 1 and choose "scan".  Now press Button 1 and follow prompts to scan.
    • If you encounter the "ScanSoft Scanner Wizard" at this point, click Next until you get to the list of available scanners and select the scanner that's connected to your computer. When it asks whether you want to run diagnostic tests, say No. It should now tell you that your scanner is ready for use. You will now be able to select a scan type from the drop-down menu and press Button 1 again.
Step Two: recognize the text
  • In the drop-down list under Button 2, select "automatic" and press Button 2. This begins the OCR process.
Step Three: correct any errors
  • When the OCR process is finished, OmniPage will automatically open the OCR Proofreader. The proofreader shows you any text OmniPage believes is misspelled, and is similar to a spellchecker in a word processing program. You can use the proofreader, or close it and proofread the document manually in the text editor window on the right.
    Screenshot: proofreading in OmniPage

  • To save the document, go to the drop-down list under Button 3 and select "save to file". Press Button 3, and a dialog box pops up asking what to name the file, where to save it, and what format to save it in. A few notes on saving your file:
    • Switching between Text, Image, and Multiple will reveal different file types in the file type drop-down list.
    • The Converter Options button allows you to further customize the file types.
    • This page has more information for the saving options: OmniPage User Guide - Saving Results.
Help with using OmniPage OCR