TextRecognizer Class
Represents a text recognizer that can extract text from scanned PDF files and from PNG, JPEG, BMP, and TIFF (single-page) images using Optical Character Recognition (OCR).
Inheritance Hierarchy
ByteScout.TextRecognition.BaseRecognizer
  ByteScout.TextRecognition.TextRecognizer
Namespace: ByteScout.TextRecognition
Assembly: ByteScout.TextRecognition (in ByteScout.TextRecognition.dll) Version: 2.6.0.314-master
Syntax
The TextRecognizer type exposes the following members.
Constructors
Name | Description
---|---
TextRecognizer | Initializes a new instance of the TextRecognizer class.
TextRecognizer(String, String) | Initializes a new instance of the TextRecognizer class.
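
The two constructors can be used roughly as follows. This is a minimal sketch, not the vendor's reference code: it assumes the `(String, String)` overload takes the registration name and key in that order, that `RegistrationName` and `RegistrationKey` are string properties, and that `"demo"` is only a placeholder value.

```csharp
using System;
using ByteScout.TextRecognition;

class ConstructorSketch
{
    static void Main()
    {
        // Assumption: the arguments are the registration name and key, in that order.
        // "demo" is a placeholder, not a real license.
        TextRecognizer recognizer = new TextRecognizer("demo", "demo");
        try
        {
            // Version is listed among the inherited properties.
            Console.WriteLine(recognizer.Version);
        }
        finally
        {
            // Dispose is listed among the inherited methods.
            recognizer.Dispose();
        }

        // Parameterless constructor, with registration set through properties.
        // Assumption: this has the same effect as the overload above.
        TextRecognizer other = new TextRecognizer();
        try
        {
            other.RegistrationName = "demo";
            other.RegistrationKey = "demo";
        }
        finally
        {
            other.Dispose();
        }
    }
}
```

Calling `Dispose` in a `finally` block releases the component even when recognition throws.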
Properties
Name | Description
---|---
AutoDetectPageRotation | Gets or sets a value indicating whether the TextRecognizer tries to automatically detect the rotation of a scanned page. Default is false.
BlackList | A set of characters that must not be recognized in the scanned document. The resulting text contains only characters that are not in this list. This helps improve uncertain recognition.
ComHelpers | Set of helper methods for use from COM/ActiveX.
Corrections | Collection of corrections automatically applied to recognized text to fix recurring recognition errors.
ImagePreprocessingFilters | Collection of image preprocessing filters.
IsDocumentLoaded | Gets whether a document is loaded. (Inherited from BaseRecognizer.)
KeepTextFormatting | Gets or sets whether to try to keep the text formatting.
LicenseInfo | Gets license information. (Inherited from BaseRecognizer.)
MaximizeCPUUtilization | Gets or sets whether to maximize OCR performance using Intel OpenMP (if available), which accelerates recognition by approximately 30%. Default is false. (Inherited from BaseRecognizer.)
OCRLanguage | Language for Optical Character Recognition (OCR). More languages can be downloaded from https://github.com/bytescout/ocrdata. (Inherited from BaseRecognizer.)
OCRLanguageDataFolder | Folder containing OCR language data files. (Inherited from BaseRecognizer.)
PageSeparator | Gets or sets the page separator character or string. Default is "\r\n".
PDFRenderingOptions | Gets or sets PDF rendering options. (Inherited from BaseRecognizer.)
PDFRenderingResolution | Gets or sets PDF rendering resolution. Default is 300 DPI. (Inherited from BaseRecognizer.)
RecognitionAreas | Collection of page areas intended for text recognition.
RegistrationKey | Gets or sets the key part of the registration information. (Inherited from BaseRecognizer.)
RegistrationName | Gets or sets the name part of the registration information. (Inherited from BaseRecognizer.)
TrimLeadingSpaces | Gets or sets whether to trim redundant leading spaces. Default is false. Works only if KeepTextFormatting is true.
UnwrapParagraphs | Gets or sets whether to unwrap paragraph text. Default is false. Works only if KeepTextFormatting is true.
Version | Gets the version of the component. (Inherited from BaseRecognizer.)
WhiteList | A set of characters allowed to be recognized in the scanned document. Only characters from this list appear in the resulting text. This helps improve uncertain recognition.
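
Several of these properties are typically set together before recognition. The sketch below shows one plausible configuration. The property names come from the table above, but their types are assumptions (strings for `OCRLanguage`, `OCRLanguageDataFolder`, `WhiteList`, and `PageSeparator`; a numeric DPI value for `PDFRenderingResolution`), and the folder path and language code are placeholders.

```csharp
using ByteScout.TextRecognition;

class PropertySketch
{
    // Applies an illustrative configuration to an existing recognizer.
    static void Configure(TextRecognizer recognizer)
    {
        // Placeholder path and language code; both assumed to be strings.
        recognizer.OCRLanguageDataFolder = @"C:\ocrdata";
        recognizer.OCRLanguage = "eng";

        // 300 DPI is the documented default; the numeric type is an assumption.
        recognizer.PDFRenderingResolution = 300;
        recognizer.AutoDetectPageRotation = true;

        // TrimLeadingSpaces and UnwrapParagraphs are documented to work
        // only when KeepTextFormatting is true.
        recognizer.KeepTextFormatting = true;
        recognizer.TrimLeadingSpaces = true;
        recognizer.UnwrapParagraphs = true;

        // Restrict output to digits, e.g. for a numeric field (assumed string type).
        recognizer.WhiteList = "0123456789";

        // Separate pages with a form feed instead of the default "\r\n".
        recognizer.PageSeparator = "\f";
    }
}
```

Setting `KeepTextFormatting` before the two dependent flags makes the dependency stated in the table explicit in the code.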
Methods
Name | Description
---|---
CheckOCRComponents | (Inherited from BaseRecognizer.)
Clear | Releases the loaded document and allocated resources. (Inherited from BaseRecognizer.)
Dispose | Releases managed resources of the component. (Inherited from BaseRecognizer.)
Equals | (Inherited from Object.)
Finalize | (Inherited from Object.)
GetHashCode | (Inherited from Object.)
GetOCRObjects | Performs the recognition and returns the list of recognized text objects at the specified level of discretization.
GetOCRObjectsAsJSON | Performs the recognition and returns the list of recognized text objects at the specified level of discretization as a JSON string.
GetOCRObjectsAsXML | Performs the recognition and returns the list of recognized text objects at the specified level of discretization as an XML string.
GetPageCount | Returns the number of pages in the loaded document. (Inherited from BaseRecognizer.)
GetPageHeight | Returns the document page height in pixels. (Inherited from BaseRecognizer.)
GetPageSize | Returns the document page dimensions in pixels. (Inherited from BaseRecognizer.)
GetPageWidth | Returns the document page width in pixels. (Inherited from BaseRecognizer.)
GetPreprocessedPageBitmap | Returns a preview image of the document page with preprocessing filters applied.
GetText | Reads text from the specified document page range.
GetType | (Inherited from Object.)
LoadDocument(Byte[]) | Loads a document from a byte array. (Inherited from BaseRecognizer.)
LoadDocument(Image) | Loads a document from an Image object. (Inherited from BaseRecognizer.)
LoadDocument(Int64) | Loads a document from a Win32 HBITMAP structure. (Inherited from BaseRecognizer.)
LoadDocument(Stream) | Loads a document from a stream. (Inherited from BaseRecognizer.)
LoadDocument(String) | Loads a document from a file. (Inherited from BaseRecognizer.)
LoadDocument(ScreenshotMaker) | Loads a screenshot from the main display. Use SetScreenshotArea(Int32, Int32, Int32, Int32) to select the portion of the screen to capture. (Inherited from BaseRecognizer.)
MemberwiseClone | (Inherited from Object.)
OnPasswordRequired | (Inherited from BaseRecognizer.)
SaveOCRObjectsAsJSON | Performs the recognition and saves the list of recognized text objects at the specified level of discretization to a JSON file.
SaveOCRObjectsAsXML | Performs the recognition and saves the list of recognized text objects at the specified level of discretization to an XML file.
SavePreprocessedPageBitmap | Saves a bitmap of the document page with preprocessing filters applied. The image is saved in PNG format.
SaveText(Stream, Int32, Int32, Encoding) | Saves text from the specified page range to a Stream.
SaveText(String, Int32, Int32, Encoding) | Saves text from the specified page range to a file.
ToString | (Inherited from Object.)
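
A typical load, recognize, and save sequence built from these methods might look like the sketch below. It relies on assumptions that the table does not confirm: that `GetPageCount` returns an `int`, that the two `Int32` arguments of `SaveText` are an inclusive start and end page index, and that page indices are zero-based. The file names are placeholders.

```csharp
using System;
using System.Text;
using ByteScout.TextRecognition;

class MethodSketch
{
    static void Main()
    {
        TextRecognizer recognizer = new TextRecognizer();
        try
        {
            // LoadDocument(String): load a scanned PDF or image from a file.
            recognizer.LoadDocument("scan.pdf"); // placeholder file name

            if (!recognizer.IsDocumentLoaded)
            {
                Console.WriteLine("Document could not be loaded.");
                return;
            }

            int pageCount = recognizer.GetPageCount();
            Console.WriteLine("Pages: " + pageCount);

            // SaveText(String, Int32, Int32, Encoding).
            // Assumption: zero-based, inclusive start and end page indices.
            recognizer.SaveText("scan.txt", 0, pageCount - 1, Encoding.UTF8);

            // Release the loaded document so the instance can be reused.
            recognizer.Clear();
        }
        finally
        {
            recognizer.Dispose();
        }
    }
}
```

If the page indices turn out to be one-based in the installed version, only the two `Int32` arguments need to change.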
Events
Name | Description
---|---
PasswordRequired | Occurs when a password is required to open a PDF document. (Inherited from BaseRecognizer.)
See Also