
DocumentParser Class

Represents a document parser that uses regex-based templates to find and extract required data from documents in PDF or raster image formats.
Inheritance Hierarchy
System.Object
  ByteScout.DocumentParser.DocumentParser

Namespace: ByteScout.DocumentParser
Assembly: ByteScout.DocumentParser (in ByteScout.DocumentParser.dll) Version: 6.4.1.617-master
Syntax
public class DocumentParser : IDisposable

The DocumentParser type exposes the following members.

Constructors
Name / Description
Public method DocumentParser
Initializes a new instance of the DocumentParser class.
Public method DocumentParser(String, String)
Initializes a new instance of the DocumentParser class.
Properties
Name / Description
Public property GenerateTimestamp
Sets whether to generate a timestamp in the parsing results (JSON or XML output format).
Public property IgnorePDFPermissions
Instructs the SDK to ignore permissions in the PDF document and not to generate ParserPermissionsException when the document's permissions do not allow the desired action.

Default is false.

IMPORTANT: THIS OPTION SHOULD NOT BE ENABLED, OUT OF RESPECT FOR THE OWNERS OF PDF DOCUMENTS. IF YOU SET IT TO TRUE TO IGNORE PERMISSIONS WHICH ARE SET IN A PDF DOCUMENT, THEN YOU ARE SOLELY LIABLE FOR THIS ACTION AND FOR ANY COPYRIGHT OR OTHER VIOLATIONS AT YOUR OWN RISK. BYTESCOUT IS NOT LIABLE FOR ANY DAMAGES, LOSSES, COPYRIGHT INFRINGEMENTS OR ANY OTHER CONSEQUENCES CAUSED BY IGNORING PERMISSIONS OF A PDF DOCUMENT. BY CHANGING THIS OPTION YOU CONFIRM THAT YOU UNDERSTAND ALL THAT IS WRITTEN ABOVE AND ARE DOING IT AT YOUR OWN RISK.

Public property LicenseInfo
Gets license information.
Public property OCRDetectPageRotation
Detects the rotation of scanned pages.
Public property OCRImagePreprocessingFilters
Collection of image preprocessing filters.
Public property OCRLanguage
The default language for Optical Character Recognition (OCR). It can be overridden by the template option "ocrLanguage". The valid values are:
  • "eng" - English (default)
  • "deu" - German
  • "fra" - French
  • "spa" - Spanish
  • "nld" - Dutch

Download more languages at https://github.com/bytescout/ocrdata.

Public property OCRLanguageDataFolder
Folder that contains OCR language data files.
Public property OCRMaximizeCPUUtilization
Gets or sets whether to maximize OCR performance using Intel OpenMP (if available), which can accelerate recognition by approximately 30%. Default is false.
Public property OCRMode
Recognizes text from embedded images using Optical Character Recognition (OCR).

This option requires the appropriate language files in the OCRLanguageDataFolder folder. The SDK ships with language files for English, French, German and Spanish. You can download more at https://github.com/bytescout/ocrdata.

Public property OCRResolution
Resolution of Optical Character Recognition (OCR). Default is 300 DPI.
Public property RegistrationKey
Gets or sets the key part of the registration information.
Public property RegistrationName
Gets or sets the name part of the registration information.
Public property Version
Gets the component version number.
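
The OCR-related properties above might be combined as in the following minimal sketch. It assumes that OCRLanguage and OCRLanguageDataFolder are settable string properties and that OCRMaximizeCPUUtilization is a settable Boolean, which the descriptions suggest but this page does not state. The folder path and the "demo" constructor arguments are placeholders, not values taken from the SDK.

```csharp
using System;
using ByteScout.DocumentParser;

class OcrSettingsSketch
{
    static void Main()
    {
        // Assumption: the (String, String) constructor takes registration
        // name and key; "demo" is a placeholder for both.
        using (var parser = new DocumentParser("demo", "demo"))
        {
            // Assumption: string property; "deu" is one of the documented values.
            parser.OCRLanguage = "deu";

            // Hypothetical folder holding the downloaded language data files.
            parser.OCRLanguageDataFolder = @"C:\ocrdata";

            // Assumption: Boolean property; the documented default is false.
            parser.OCRMaximizeCPUUtilization = true;

            Console.WriteLine("OCR language: " + parser.OCRLanguage);
        }
    }
}
```

A template can still override the language per document through its "ocrLanguage" option, so the property only sets the fallback.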
Methods
Name / Description
Public method AddTemplate(String)
Loads a template from a YAML or JSON file and adds it to the internal template list. The file can contain a single template or an array of templates. Multiple YAML templates can also be separated by a "---" line instead of being placed in an array.
Public method AddTemplate(Template)
Public method AddTemplateFromString(String)
Loads a template from a YAML or JSON string and adds it to the internal template list. The string can contain a single template or an array of templates. Multiple YAML templates can also be separated by a "---" line instead of being placed in an array.
Public method AddTemplateFromString(String, String)
Loads a template from a YAML or JSON string and adds it to the internal template list. The string can contain a single template or an array of templates. Multiple YAML templates can also be separated by a "---" line instead of being placed in an array.
Public method AddTemplates
Loads templates from the specified folder (and its subfolders) into the internal template list. Template files must have a ".yml" or ".json" extension.
Public method ClearTemplates
Removes all loaded templates.
Public method Dispose
Releases managed resources of the component.
Public method Equals (Inherited from Object.)
Public method, Static member ExportResultsToCSV
Exports parsing results to CSV format.
Public method, Static member ExportResultsToJSON
Exports parsing results to JSON format.
Public method, Static member ExportResultsToXML
Exports parsing results to XML format.
Public method, Static member ExportResultsToYAML
Exports parsing results to YAML format.
Protected method Finalize (Inherited from Object.)
Public method GetCashedDocumentText
Returns the text of the document as parsed with the most recent template. Available only in the FULL version; the TRIAL version returns an empty string.
Public method GetDocumentText(Stream, Int32)
Extracts text from a page or the entire document. You can use it for composing and testing templates.
Public method GetDocumentText(String, Int32)
Extracts text from a page or the entire document. You can use it for composing and testing templates.
Public method GetDocumentText(Stream, Stream, Int32)
Extracts text from a page or the entire document. You can use it for composing and testing templates.
Public method GetDocumentText(String, String, Int32)
Extracts text from a page or the entire document. You can use it for composing and testing templates.
Public method GetHashCode (Inherited from Object.)
Public method GetPageCount(Stream)
Returns the number of pages in a PDF or TIFF document.
Public method GetPageCount(String)
Returns the number of pages in a PDF or TIFF document.
Public method GetType (Inherited from Object.)
Protected method Log
Protected method MemberwiseClone (Inherited from Object.)
Public method ParseDocument(Stream)
Processes a document using the loaded templates.
Public method ParseDocument(String)
Processes a document using the loaded templates.
Public method ParseDocument(Stream, OutputFormat, CSVOptions)
Parses a document.
Public method ParseDocument(String, OutputFormat, CSVOptions)
Parses a document.
Public method ParseDocument(Stream, Stream, OutputFormat, CSVOptions)
Processes a document.
Public method ParseDocument(String, String, OutputFormat, CSVOptions)
Processes a document.
Public method ParseDocumentPageRange(Stream, IList<Int32>)
Processes part of the document using the loaded templates.
Public method ParseDocumentPageRange(Stream, String)
Processes part of the document using the loaded templates.
Public method ParseDocumentPageRange(String, IList<Int32>)
Processes part of the document using the loaded templates.
Public method ParseDocumentPageRange(String, String)
Processes part of the document using the loaded templates.
Public method ParseDocumentPageRange(Stream, Int32, Int32)
Processes part of the document using the loaded templates.
Public method ParseDocumentPageRange(String, Int32, Int32)
Processes part of the document using the loaded templates.
Public method ParseDocumentPageRange(Stream, IList<Int32>, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(Stream, String, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(String, IList<Int32>, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(String, String, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(Stream, IList<Int32>, Stream, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(Stream, Int32, Int32, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(Stream, String, Stream, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(String, IList<Int32>, String, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(String, Int32, Int32, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(String, String, String, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(Stream, Int32, Int32, Stream, OutputFormat, CSVOptions)
Processes part of the document.
Public method ParseDocumentPageRange(String, Int32, Int32, String, OutputFormat, CSVOptions)
Processes part of the document.
Public method ToString (Inherited from Object.)
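
A typical call sequence using the methods listed above might look like the following sketch. It relies only on overloads named on this page, but several details are assumptions: that the (String, String) constructor takes a registration name and key, that the String argument of ParseDocument and GetPageCount is a file path, and that both methods return a printable value. The file names are hypothetical.

```csharp
using System;
using ByteScout.DocumentParser;

class ParseDocumentSketch
{
    static void Main()
    {
        // DocumentParser implements IDisposable, so a using block releases it.
        // Assumption: the arguments are registration name and key ("demo" placeholders).
        using (var parser = new DocumentParser("demo", "demo"))
        {
            // Hypothetical template file; YAML and JSON are the documented formats.
            parser.AddTemplate("invoice-template.yml");

            // Hypothetical input document (PDF or TIFF per the description above).
            var pageCount = parser.GetPageCount("invoice.pdf");
            Console.WriteLine("Pages: " + pageCount);

            // Assumption: the result is a printable value such as a JSON string.
            var result = parser.ParseDocument("invoice.pdf");
            Console.WriteLine(result);

            // Remove the loaded templates before reusing the parser with others.
            parser.ClearTemplates();
        }
    }
}
```

The overloads that take OutputFormat and CSVOptions, and the ParseDocumentPageRange variants, would presumably slot into the same sequence in place of the ParseDocument(String) call.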
Events
Name / Description
Public event ParsingLog
This event is used to deliver parsing warnings and errors.
Public event PasswordRequired
Occurs when a password is required to open a PDF document.