
OCRMode Enumeration

OCR (Optical Character Recognition) usage mode.

Namespace: Bytescout.PDFExtractor
Assembly: Bytescout.PDFExtractor (in Bytescout.PDFExtractor.dll) Version: 13.4.0.4760-master
Syntax
public enum OCRMode
Members
Member name | Value | Description
Off | 0 | No OCR is used.
Auto | 1 | Similar to TextFromImagesAndVectorsAndFonts, but checks whether the page contains only raster images to decide whether OCR is needed. OCR runs only if the page contains very little text and one or more raster images. The result contains text objects produced from images and vector drawings.
TextFromImagesAndVectorsAndFonts | 2 | Always runs OCR to extract text from images and vector drawings (if any). The result contains text objects from the PDF and text objects produced by OCR from images and vector drawings, if any. See also the TextFromImagesAndFonts mode, which reads the same objects except vector drawings.
TextFromImagesAndVectorsAndRepairedFonts | 3 | Special mode: extracts text from images and vector drawings and repairs text from fonts by fixing the incorrect encoding. Some PDF files contain visible text that is damaged when copied or extracted (it appears as ? or other incorrect symbols); this mode repairs such text using OCR. The result contains text objects from the PDF and text objects produced by OCR from images and vector drawings, if any.
TextFromRepairedFontsOnly | 4 | Special mode: repairs text objects with incorrect encoding using OCR. Some PDF files contain visible text that is damaged when copied or extracted (it appears as ? or other incorrect symbols); this mode repairs such text using OCR. The result contains repaired text objects only; images and vector drawings are not processed.
TextFromImagesAndRepairedFonts | 5 | Special mode: extracts text from raster images (but skips vector drawings) and repairs text objects with incorrect encoding. Some PDF files contain visible text that is damaged when copied or extracted (it appears as ? or other incorrect symbols); this mode repairs such text using OCR. The result contains repaired text objects and text objects produced from raster images; vector drawings are not processed.
TextFromImagesAndFonts | 6 | Runs OCR to extract text from images (but skips vector drawings), plus the text objects. The result contains text objects from the PDF and text objects produced by OCR from images; vector drawings are not processed.
TextFromImagesOnly | 7 | Runs OCR to extract text from images only; vector drawings are skipped. The result contains text extracted from images only.
TextFromVectorsOnly | 8 | Runs OCR to extract text from vector drawings only. The result contains text objects from vector drawings only.
TextFromImagesAndVectorsOnly | 9 | Runs OCR to extract text from images and vector drawings only; no text from PDF text objects is included. The result contains text objects from images and vector drawings only.
TextFromVectorsAndRepairedFonts | 10 | Special mode: extracts text from vector drawings and repairs text from fonts by fixing the incorrect encoding. Some PDF files contain visible text that is damaged when copied or extracted (it appears as ? or other incorrect symbols); this mode repairs such text using OCR.
TextFromVectorsAndFonts | 11 | Runs OCR to extract text from vector drawings (but skips images), plus the text objects. The result contains text objects from the PDF and text objects produced by OCR from vector drawings.
AutoRepairFonts | 16 | Automatically tries to detect PDF documents with corrupted text and, when such text is detected, forces OCR font repair.

(!) Warning: the corrupted-text detection does not work with non-English text or when the page contains only a small amount of text.
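
The member names encode which page content each mode reads, so choosing a mode can be reduced to a lookup. The following is a minimal sketch of that idea, not part of the SDK. It declares a local mirror of OCRMode (values copied from the Members list) so that it compiles on its own; in a real project that mirror would be deleted and Bytescout.PDFExtractor referenced instead. The OcrSource flags and the OcrModeGuide helper are hypothetical names introduced for illustration. Off, Auto and AutoRepairFonts are deliberately left out of the lookup because their behavior is conditional rather than a fixed set of sources.

```csharp
using System;
using System.Collections.Generic;

// Local mirror of the enumeration, values copied from the Members list.
// Assumption: present only so this sketch compiles without the SDK.
public enum OCRMode
{
    Off = 0,
    Auto = 1,
    TextFromImagesAndVectorsAndFonts = 2,
    TextFromImagesAndVectorsAndRepairedFonts = 3,
    TextFromRepairedFontsOnly = 4,
    TextFromImagesAndRepairedFonts = 5,
    TextFromImagesAndFonts = 6,
    TextFromImagesOnly = 7,
    TextFromVectorsOnly = 8,
    TextFromImagesAndVectorsOnly = 9,
    TextFromVectorsAndRepairedFonts = 10,
    TextFromVectorsAndFonts = 11,
    AutoRepairFonts = 16
}

// Hypothetical helper type: the kinds of content a mode draws text from.
[Flags]
public enum OcrSource
{
    None = 0,
    Images = 1,        // raster images, read with OCR
    Vectors = 2,       // vector drawings, read with OCR
    PdfText = 4,       // text objects taken from the PDF as they are
    RepairedFonts = 8  // text objects whose encoding is repaired with OCR
}

// Hypothetical helper class: maps the fixed-behavior modes to their sources.
public static class OcrModeGuide
{
    private static readonly Dictionary<OCRMode, OcrSource> Sources =
        new Dictionary<OCRMode, OcrSource>
        {
            { OCRMode.TextFromImagesAndVectorsAndFonts, OcrSource.Images | OcrSource.Vectors | OcrSource.PdfText },
            { OCRMode.TextFromImagesAndVectorsAndRepairedFonts, OcrSource.Images | OcrSource.Vectors | OcrSource.RepairedFonts },
            { OCRMode.TextFromRepairedFontsOnly, OcrSource.RepairedFonts },
            { OCRMode.TextFromImagesAndRepairedFonts, OcrSource.Images | OcrSource.RepairedFonts },
            { OCRMode.TextFromImagesAndFonts, OcrSource.Images | OcrSource.PdfText },
            { OCRMode.TextFromImagesOnly, OcrSource.Images },
            { OCRMode.TextFromVectorsOnly, OcrSource.Vectors },
            { OCRMode.TextFromImagesAndVectorsOnly, OcrSource.Images | OcrSource.Vectors },
            { OCRMode.TextFromVectorsAndRepairedFonts, OcrSource.Vectors | OcrSource.RepairedFonts },
            { OCRMode.TextFromVectorsAndFonts, OcrSource.Vectors | OcrSource.PdfText }
        };

    // Returns false for Off, Auto and AutoRepairFonts, which are not in the lookup.
    public static bool TryGetSources(OCRMode mode, out OcrSource sources)
    {
        return Sources.TryGetValue(mode, out sources);
    }

    // Finds the mode that reads exactly the requested sources, or null if none does.
    public static OCRMode? FindMode(OcrSource wanted)
    {
        foreach (KeyValuePair<OCRMode, OcrSource> pair in Sources)
        {
            if (pair.Value == wanted)
            {
                return pair.Key;
            }
        }
        return null;
    }
}
```

For example, under this sketch OcrModeGuide.FindMode(OcrSource.Images | OcrSource.PdfText) yields OCRMode.TextFromImagesAndFonts, while a combination that no mode covers yields null, leaving the caller to fall back to Auto or Off.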
