Session 901 Using Optical Character Recognition Programs

Explore the world of optical character recognition (OCR) programs at the CTEBVI Conference. Learn how OCR turns images into e-text, the importance of structural recognition, and discover preferred programs for production and consumers. Find useful tips and tricks for improving recognition accuracy and document processing. Join us to delve into the efficient use of OCR tools!

Presentation Transcript


  1. Session 901: Using Optical Character Recognition Programs • Gaeir Dietrich, Director, High Tech Center Training Unit of the California Community Colleges

  2. Overview • Optical character recognition • Structural recognition • Options • Loading • Zoning • OCR • Editing CTEBVI Conference

  3. Optical Character Recognition (OCR) • OCR turns pictures of text into e-text • Does well unless… • The picture is fuzzy • The contrast is poor • The font is unusual • The font is too small or too large • The material has unusual characters
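
One of the failure conditions above, poor contrast, can be checked roughly before a page goes to OCR. The following is a minimal sketch, not part of any OCR program: it assumes the page has already been reduced to a list of 0-255 grayscale values by some other tool, and both function names and the 0.5 threshold are hypothetical choices for illustration.

```python
def michelson_contrast(gray_pixels):
    """Return (max - min) / (max + min) for a sequence of 0-255 grayscale values."""
    if not gray_pixels:
        raise ValueError("gray_pixels must not be empty")
    darkest, lightest = min(gray_pixels), max(gray_pixels)
    if darkest + lightest == 0:
        # An all-black image has no contrast at all.
        return 0.0
    return (lightest - darkest) / (lightest + darkest)


def looks_low_contrast(gray_pixels, threshold=0.5):
    """Flag a page whose contrast falls below an assumed, tunable threshold."""
    return michelson_contrast(gray_pixels) < threshold
```

A page flagged this way is a candidate for rescanning with different scanner settings rather than for correction after OCR.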

  4. Structural Recognition • Analyzes the layout of the page • Columns • Headings • Graphics • Tables • Usually does fairly well, unless the layout is non-standard

  5. Getting Better…but… • Although the programs are improving all the time, it is unwise to rely on the automated features. • Learn to know what the program is doing and correct it when it errs.

  6. Programs that Run OCR • Programs for consumers • Kurzweil 1000, 3000 • OpenBook • Intel Reader • Many others… • Programs for production • ABBYY FineReader • Nuance OmniPage

  7. Consumer Programs • Highly automated • Designed for individuals who have print disabilities • Are not good production tools • Do not provide flexibility • Do not allow much overriding • Interfaces not designed for editing

  8. Production Programs in General • A good program for production allows you to… • Control the zones (areas or blocks of text and graphics) • Add, delete, change • Edit easily • Improve recognition

  9. Preferred Programs • ABBYY FineReader • Relatively easy to learn • Fairly intuitive • Good structural recognition • Nuance OmniPage • Less intuitive but more accessible • Often does better with technical materials

  10. Both Good Tools • If you can afford to have both, it’s nice, but not absolutely necessary. • If you have both, run a couple of test pages through each to see which is doing better on a particular job.

  11. For Today • Focus on ABBYY FineReader • A little less expensive • Easier for folks who do not use an OCR program every day • Let’s launch and go!

  12. Wizards Are Evil… • Turn off the automated “Tasks” manager • Uncheck the Show at startup check box • Bottom left corner of the Tasks box • Choose Open Image/PDF

  14. Under the Hood • For best results with a program, set up your options before you begin! • Tools > Options • Shortcut keys: Ctrl + Shift + O

  16. Document Tab • Languages drop-down menu allows you to select the languages that are in your document.

  18. More Languages • If you do not see the languages you need, select More Languages. • Notice that the end of the list includes computer languages, numbers, and chemical formulas. • Turn on what you need, but only what you need.

  20. Tip • If you are running OCR on math, turn on Greek. • Greek will allow the program to recognize alphas, deltas, sigmas, etc. • For foreign-language material, turn on all the languages in the book. • The program will then recognize the diacritical marks.

  21. Scan and Open Tab • Change the radio button under General to “Do not read and analyze acquired page images automatically.” • Remember…wizards are evil…

  22. Another Decision • Under Image Preprocessing, you have the choice to Detect page orientation. • Try it if you have many pages turned, but it sometimes goofs. • Also note the Split facing pages feature. • Nice if you have a two-page spread.
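
To make concrete what splitting a two-page spread involves, here is a minimal sketch of the arithmetic. It is not how FineReader implements Split facing pages; it assumes the spine sits exactly at the horizontal center of the scan, and the function name and the `gutter` parameter are hypothetical.

```python
def split_facing_pages(width, height, gutter=0):
    """Return (left_box, right_box) crop boxes as (left, top, right, bottom) pixels.

    Assumes the book's spine is at the horizontal center of the scan.
    `gutter` is the total width in pixels to discard around the spine.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if gutter < 0 or gutter >= width:
        raise ValueError("gutter must be between 0 and the image width")
    middle = width // 2
    half_gutter = gutter // 2
    left_box = (0, 0, middle - half_gutter, height)
    right_box = (middle + half_gutter, 0, width, height)
    return left_box, right_box
```

For a 3000 x 2000 pixel spread with a 100-pixel gutter, the left page covers x = 0 to 1450 and the right page covers x = 1550 to 3000. A spread scanned off-center would need the split point adjusted by hand, which is one reason to check the result rather than trust it.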

  24. Read Tab • The “pattern editor” is useful if you have a book with a very unusual font. • You can map the letters by telling the program what each letter is. • Not worth it for occasional errors, but very useful for books filled with otherwise unreadable fonts.

  26. Save Tab • Specify which format you want as an end product. • For Word docs, choose either Formatted Text or Plain Text. • Otherwise, you can get the dreaded “textbox.”

  27. Considerations • You may or may not want to keep headers and footers. • I generally keep them to pull the page numbers. • You may want to keep the page breaks. • Retaining page breaks helps to maintain one-to-one page correspondence with the book.
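
The value of keeping page breaks and headers or footers can be sketched with a small post-processing step. This is an illustration only, under two assumptions that may not match a given export: the text is plain text with pages separated by form-feed characters, and the print page number sits alone on the first or last non-empty line of each page. The function name is hypothetical.

```python
import re


def pages_with_numbers(text):
    """Split exported text on form feeds and pair each page with its print page number.

    Returns a list of (page_number_or_None, page_text) tuples, one per page.
    """
    results = []
    for page in text.split("\f"):
        lines = [line.strip() for line in page.splitlines() if line.strip()]
        number = None
        # Look at the header line first, then the footer line, for a bare number.
        for candidate in lines[:1] + lines[-1:]:
            if re.fullmatch(r"\d+", candidate):
                number = int(candidate)
                break
        results.append((number, page))
    return results
```

Because each tuple corresponds to one page of the book, a page where the number comes back as `None` is easy to spot and fix by hand.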

  28. Paper Size • In some cases, you may wish to work with a custom paper size and choose “Increase paper size to fit content.” • This feature can be helpful when you are retaining everything on the page but not the layout.

  30. View Tab • The view tab has some nice features for those with visual impairments. • Colors are completely customizable. • Choose the mark-up, then click on the color swatch. • Choose Define Custom Colors for more choices.

  32. More Choices • The View Tab also allows you to control the appearance of your working window. • Pages window > Thumbnails • Shows graphics of the pages on the left-hand side (under “Pages”).

  34. More Accessible • Instead, you can see a detail view. • Detail view is more accessible for screen readers. • Otherwise, it is personal preference. • Pages window > Details • Shows text instead of graphics

  36. Advanced Tab • This tab has choices about spell check and editing. • Please note that if the program is handling spacing around punctuation incorrectly, there is an option on this tab to fix the problem.
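
If punctuation spacing errors survive into the exported text, they can also be cleaned up afterward. The sketch below is not the FineReader option mentioned above; it is a hypothetical helper showing the kind of correction involved, and it deliberately leaves periods followed by text alone so that decimals such as 3.14 are not broken.

```python
import re


def fix_punctuation_spacing(text):
    """Remove stray spaces before punctuation and add a missing space after it."""
    # "word ," -> "word,"  (spaces or tabs directly before a punctuation mark)
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)
    # "one,two" -> "one, two"  (only when a letter follows immediately)
    text = re.sub(r"([,;:!?])(?=[A-Za-z])", r"\1 ", text)
    return text
```

Text that uses spaced ellipses (". . .") would be collapsed by the first rule, so a pass like this should be checked against a sample before being applied to a whole book.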

  38. Customizing Tools • Choose Tools > Customize • Under Categories, select Image • Move two tools to your Quick Access toolbar • Select the tool and use the double arrow button to move the tool

  39. Move Eraser

  40. Move Order Areas

  41. Turn on Quick Tools • View > Toolbars > Quick Access

  43. Ready • We have set our options. • We have customized our tools. • These features are now set. • You do not need to do this again unless you reinstall the program.

  44. Time to Start Working!

  45. Please Note • Although you can scan with the program, the preferred approach is to scan with the scanning utility that came with your scanner and load the resulting TIFFs or JPEGs into FineReader. • No scanning utility? Then go ahead and scan with FineReader (Ctrl + K).

  46. Loading a File • Open an Image • Click the open icon • Ctrl + O • Image files include TIFF, JPEG, PDF, BMP, GIF, etc.
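
When a scanning utility leaves a folder full of page images, it helps to confirm the files are in page order before opening them. The following sketch is independent of FineReader: the extension set simply mirrors the formats named on the slide, and the function names are hypothetical. It sorts names so that page2 comes before page10.

```python
import re
from pathlib import Path

# Extensions for the formats named above (assumed spellings; extend as needed).
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".pdf", ".bmp", ".gif"}


def natural_key(name):
    """Split a file name into text and number parts so numbers sort numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_scans(folder):
    """Return the image files in `folder`, sorted in natural (page) order."""
    return sorted(
        (p for p in Path(folder).iterdir()
         if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS),
        key=lambda p: natural_key(p.name),
    )
```

Calling `list_scans` on the scan folder returns the paths in the order they would appear as pages, which makes a missing or misnamed page easy to notice before OCR begins.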

  48. Workspace • The program has three primary areas • Pages Pane • Either thumbnails or details • Allows simple navigation of pages • Image Pane • Your graphic • Text Pane • Area where the text from OCR will show

  50. Handy Tip • Whichever pane has your focus, bring up more information by using the shortcut Alt + Enter. • Use the shortcut again to toggle it off. • Under the Image Pane, you get information about the image.