TextRecognizer Class

Represents a text recognizer that can extract text from scanned PDF files and from PNG, JPEG, BMP, and TIFF (single-page) images using Optical Character Recognition (OCR).
Inheritance Hierarchy
System.Object
  ByteScout.TextRecognition.BaseRecognizer
    ByteScout.TextRecognition.TextRecognizer

Namespace: ByteScout.TextRecognition
Assembly: ByteScout.TextRecognition (in ByteScout.TextRecognition.dll) Version: 2.6.0.314-master
Syntax
public class TextRecognizer : BaseRecognizer

The TextRecognizer type exposes the following members.

Constructors
Name Description
Public method TextRecognizer
Initializes a new instance of the TextRecognizer class.
Public method TextRecognizer(String, String)
Initializes a new instance of the TextRecognizer class.
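A minimal sketch of both constructors follows. This page does not describe the two String parameters; the sketch assumes they are the registration name and registration key, mirroring the RegistrationName and RegistrationKey properties listed under Properties. The "demo" values are placeholders, not real credentials.

```csharp
using ByteScout.TextRecognition;

class ConstructorSketch
{
    static void Main()
    {
        // Parameterless constructor; registration data is set through properties.
        TextRecognizer first = new TextRecognizer();
        try
        {
            first.RegistrationName = "demo"; // placeholder value
            first.RegistrationKey = "demo";  // placeholder value
        }
        finally
        {
            first.Dispose(); // Dispose is listed under Methods
        }

        // Two-string constructor.
        // Assumption: the arguments are (registration name, registration key).
        TextRecognizer second = new TextRecognizer("demo", "demo");
        second.Dispose();
    }
}
```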
Properties
Name Description
Public property AutoDetectPageRotation
Gets or sets a value indicating whether the TextRecognizer will try to automatically detect the rotation of a scanned page. Default is false.
Public property BlackList
A set of characters that must not be recognized in the scanned document. The resulting text contains only characters that are not in this list. This helps improve uncertain recognition.
Public property ComHelpers
Set of helper methods for use from COM/ActiveX.
Public property Corrections
Collection of corrections automatically applied to the recognized text to fix recurring recognition errors.
Public property ImagePreprocessingFilters
Collection of image preprocessing filters.
Public property IsDocumentLoaded
Gets whether a document is loaded.
(Inherited from BaseRecognizer.)
Public property KeepTextFormatting
Gets or sets whether to try to keep the text formatting.
Public property LicenseInfo
Gets license information.
(Inherited from BaseRecognizer.)
Public property MaximizeCPUUtilization
Gets or sets whether to maximize OCR performance using Intel OpenMP (if available), which can accelerate recognition by up to approximately 30%. Default is false.
(Inherited from BaseRecognizer.)
Public property OCRLanguage
Language for Optical Character Recognition (OCR). The valid values are:
  • "eng" - English (default)
  • "deu" - German
  • "fra" - French
  • "spa" - Spanish

Download more languages at https://github.com/bytescout/ocrdata.

(Inherited from BaseRecognizer.)
Public property OCRLanguageDataFolder
Folder containing OCR language data files.
(Inherited from BaseRecognizer.)
Public property PageSeparator
Gets or sets the page separator character or string. Default is "\r\n".
Public property PDFRenderingOptions
Gets or sets PDF rendering options.
(Inherited from BaseRecognizer.)
Public property PDFRenderingResolution
Gets or sets PDF rendering resolution. Default is 300 DPI.
(Inherited from BaseRecognizer.)
Public property RecognitionAreas
Collection of page areas intended for text recognition.
Public property RegistrationKey
Gets or sets the key part of the registration information.
(Inherited from BaseRecognizer.)
Public property RegistrationName
Gets or sets the name part of the registration information.
(Inherited from BaseRecognizer.)
Public property TrimLeadingSpaces
Gets or sets whether to trim redundant leading spaces. Default is false. Works only if KeepTextFormatting is true.
Public property UnwrapParagraphs
Gets or sets whether to unwrap paragraph text. Default is false. Works only if KeepTextFormatting is true.
Public property Version
Gets the version of the component.
(Inherited from BaseRecognizer.)
Public property WhiteList
A set of characters allowed to be recognized in the scanned document. Only characters from this list will appear in the resulting text. This helps improve uncertain recognition.
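The properties above are typically set before recognition runs. The following is a hedged configuration sketch, not a definitive recipe. Assumptions: WhiteList, OCRLanguage, and OCRLanguageDataFolder are string-typed, the Boolean properties are plain bool, and the folder path is a hypothetical location for downloaded language files.

```csharp
using ByteScout.TextRecognition;

class PropertySketch
{
    static void Main()
    {
        TextRecognizer recognizer = new TextRecognizer();
        try
        {
            // Hypothetical folder holding the language data files.
            recognizer.OCRLanguageDataFolder = @"C:\ocrdata";
            // One of the language codes listed for OCRLanguage.
            recognizer.OCRLanguage = "deu";

            // Off by default according to the property description.
            recognizer.AutoDetectPageRotation = true;

            // TrimLeadingSpaces and UnwrapParagraphs work only
            // when KeepTextFormatting is true.
            recognizer.KeepTextFormatting = true;
            recognizer.TrimLeadingSpaces = true;
            recognizer.UnwrapParagraphs = true;

            // Assumption: WhiteList is a string of allowed characters.
            // Here recognition is restricted to digits and basic punctuation.
            recognizer.WhiteList = "0123456789.,-";
        }
        finally
        {
            recognizer.Dispose();
        }
    }
}
```

Setting both WhiteList and BlackList at once is not covered by this page, so the sketch uses only one of them.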
Methods
Name Description
Protected method CheckOCRComponents
(Inherited from BaseRecognizer.)
Public method Clear
Releases the loaded document and allocated resources.
(Inherited from BaseRecognizer.)
Public method Dispose
Releases managed resources of the component.
(Inherited from BaseRecognizer.)
Public method Equals (Inherited from Object.)
Protected method Finalize (Inherited from Object.)
Public method GetHashCode (Inherited from Object.)
Public method GetOCRObjects
Performs the recognition and returns the list of recognized text objects at the specified level of discretization.
Public method GetOCRObjectsAsJSON
Performs the recognition and returns the list of recognized text objects at the specified level of discretization as a JSON string.
Public method GetOCRObjectsAsXML
Performs the recognition and returns the list of recognized text objects at the specified level of discretization as an XML string.
Public method GetPageCount
Returns the number of pages in the loaded document.
(Inherited from BaseRecognizer.)
Public method GetPageHeight
Returns the document page height in pixels.
(Inherited from BaseRecognizer.)
Public method GetPageSize
Returns the document page dimensions in pixels.
(Inherited from BaseRecognizer.)
Public method GetPageWidth
Returns the document page width in pixels.
(Inherited from BaseRecognizer.)
Public method GetPreprocessedPageBitmap
Returns a preview image of the document page with preprocessing filters applied.
Public method GetText
Reads text from the specified document page range.
Public method GetType (Inherited from Object.)
Public method LoadDocument(Byte[])
Loads a document from a byte array.
(Inherited from BaseRecognizer.)
Public method LoadDocument(Image)
Loads a document from an Image object.
(Inherited from BaseRecognizer.)
Public method LoadDocument(Int64)
Loads a document from a Win32 HBITMAP structure.
(Inherited from BaseRecognizer.)
Public method LoadDocument(Stream)
Loads a document from a stream.
(Inherited from BaseRecognizer.)
Public method LoadDocument(String)
Loads a document from a file.
(Inherited from BaseRecognizer.)
Public method LoadDocument(ScreenshotMaker)
Loads a screenshot from the main display. Use SetScreenshotArea(Int32, Int32, Int32, Int32) to set the portion of the screen to capture.
(Inherited from BaseRecognizer.)
Protected method MemberwiseClone (Inherited from Object.)
Protected method OnPasswordRequired
(Inherited from BaseRecognizer.)
Public method SaveOCRObjectsAsJSON
Performs the recognition and saves the list of recognized text objects at the specified level of discretization to a JSON file.
Public method SaveOCRObjectsAsXML
Performs the recognition and saves the list of recognized text objects at the specified level of discretization to an XML file.
Public method SavePreprocessedPageBitmap
Saves a bitmap of the document page with preprocessing filters applied. The image is saved in PNG format.
Public method SaveText(Stream, Int32, Int32, Encoding)
Saves text from the specified page range to a Stream.
Public method SaveText(String, Int32, Int32, Encoding)
Saves text from the specified page range to a file.
Public method ToString (Inherited from Object.)
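A typical flow combines LoadDocument(String), GetPageCount, and SaveText(String, Int32, Int32, Encoding). The sketch below is illustrative only. Assumptions: the two Int32 arguments are zero-based start and end page indexes (inclusive), GetPageCount returns an Int32, and the file names are hypothetical. Depending on the installation, OCRLanguageDataFolder may also need to be set before recognition.

```csharp
using System.Text;
using ByteScout.TextRecognition;

class RecognitionSketch
{
    static void Main()
    {
        TextRecognizer recognizer = new TextRecognizer();
        try
        {
            // Hypothetical input file: a scanned PDF or a supported image.
            recognizer.LoadDocument("scan.pdf");

            int pageCount = recognizer.GetPageCount();

            // Assumption: page indexes are zero-based and the range is inclusive,
            // so (0, pageCount - 1) covers the whole document.
            recognizer.SaveText("scan.txt", 0, pageCount - 1, Encoding.UTF8);
        }
        finally
        {
            // Releases the component's resources.
            recognizer.Dispose();
        }
    }
}
```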
Events
Name Description
Public event PasswordRequired
Occurs when a password is required to open a PDF document.
(Inherited from BaseRecognizer.)