DocumentParser Class

Represents a document parser that uses regex-based templates to find and extract required data from documents in PDF or raster image formats.

Inheritance Hierarchy

System.Object
ByteScout.DocumentParser.DocumentParser

Namespace: ByteScout.DocumentParser
Assembly: ByteScout.DocumentParser (in ByteScout.DocumentParser.dll) Version: 6.4.1.617-master

Syntax

C#

public class DocumentParser : IDisposable

VB

Public Class DocumentParser
	Implements IDisposable

C++

public ref class DocumentParser : IDisposable

F#

type DocumentParser =
    class
        interface IDisposable
    end

The DocumentParser type exposes the following members.

Constructors

	Name	Description
	DocumentParser	Initializes a new instance of the DocumentParser class.
	DocumentParser(String, String)	Initializes a new instance of the DocumentParser class.

Properties

	Name	Description
	GenerateTimestamp	Sets whether to generate a timestamp in the parsing results (JSON or XML output format).
	IgnorePDFPermissions	Specifies whether the SDK should ignore the permissions set in the PDF document and not throw ParserPermissionsException when the permission for the requested action is not granted. Default is false. IMPORTANT: THIS OPTION SHOULD NOT BE ENABLED, OUT OF RESPECT FOR THE OWNERS OF PDF DOCUMENTS. IF YOU SET IT TO TRUE TO IGNORE PERMISSIONS THAT ARE SET IN A PDF DOCUMENT, THEN YOU ARE SOLELY LIABLE FOR THIS ACTION AND FOR ANY COPYRIGHT OR OTHER VIOLATIONS, AT YOUR OWN RISK. BYTESCOUT IS NOT LIABLE FOR ANY DAMAGES, LOSSES, COPYRIGHT INFRINGEMENTS OR ANY OTHER CONSEQUENCES CAUSED BY IGNORING THE PERMISSIONS OF A PDF DOCUMENT. BY CHANGING THIS OPTION YOU CONFIRM THAT YOU UNDERSTAND EVERYTHING WRITTEN ABOVE AND ARE DOING SO AT YOUR OWN RISK.
	LicenseInfo	Gets license information.
	OCRDetectPageRotation	Specifies whether to detect the rotation of scanned pages.
	OCRImagePreprocessingFilters	Collection of image preprocessing filters.
	OCRLanguage	The default language for Optical Character Recognition (OCR). It can be overridden by the template option "ocrLanguage". The valid values are: "eng" - English (default), "deu" - German, "fra" - French, "spa" - Spanish, "nld" - Dutch. Download more languages at https://github.com/bytescout/ocrdata.
	OCRLanguageDataFolder	Folder that contains OCR language data files.
	OCRMaximizeCPUUtilization	Gets or sets whether to maximize OCR performance using Intel OpenMP (if available), which speeds up recognition by approximately 30%. Default is false.
	OCRMode	Controls recognition of text from embedded images using Optical Character Recognition (OCR). This option requires the appropriate language files in the OCRLanguageDataFolder folder. The SDK is shipped with language files for English, French, German and Spanish. You can download more at https://github.com/bytescout/ocrdata.
	OCRResolution	Resolution of Optical Character Recognition (OCR). Default is 300 DPI.
	RegistrationKey	Gets or sets the key number part of registration information.
	RegistrationName	Gets or sets the name part of the registration information.
	Version	Gets component version number.

Methods

	Name	Description
	AddTemplate(String)	Loads a template from a YAML or JSON file and adds it to the internal template list. The file can contain a single template or several templates as an array. Multiple YAML templates can also be separated by a "---" line instead of using an array.
	AddTemplate(Template)
	AddTemplateFromString(String)	Loads a template from a YAML or JSON string and adds it to the internal template list. The string can contain a single template or several templates as an array. Multiple YAML templates can also be separated by a "---" line instead of using an array.
	AddTemplateFromString(String, String)	Loads a template from a YAML or JSON string and adds it to the internal template list. The string can contain a single template or several templates as an array. Multiple YAML templates can also be separated by a "---" line instead of using an array.
	AddTemplates	Loads templates from the specified folder (and its subfolders) into the internal template list. Template files must have a ".yml" or ".json" extension.
	ClearTemplates	Removes all loaded templates.
	Dispose	Releases managed resources of the component.
	Equals	(Inherited from Object.)
	ExportResultsToCSV	Exports parsing results to CSV format.
	ExportResultsToJSON	Exports parsing results to JSON format.
	ExportResultsToXML	Exports parsing results to XML format.
	ExportResultsToYAML	Exports parsing results to YAML format.
	Finalize	(Inherited from Object.)
	GetCashedDocumentText	Returns the text of the document parsed with the most recent template. Available only in the FULL version; the TRIAL version returns an empty string.
	GetDocumentText(Stream, Int32)	Extracts text from a page or the entire document. You can use it for composing and testing templates.
	GetDocumentText(String, Int32)	Extracts text from a page or the entire document. You can use it for composing and testing templates.
	GetDocumentText(Stream, Stream, Int32)	Extracts text from a page or the entire document. You can use it for composing and testing templates.
	GetDocumentText(String, String, Int32)	Extracts text from a page or the entire document. You can use it for composing and testing templates.
	GetHashCode	(Inherited from Object.)
	GetPageCount(Stream)	Returns the number of pages in a PDF or TIFF document.
	GetPageCount(String)	Returns the number of pages in a PDF or TIFF document.
	GetType	(Inherited from Object.)
	Log
	MemberwiseClone	(Inherited from Object.)
	ParseDocument(Stream)	Processes a document using the loaded templates.
	ParseDocument(String)	Processes a document using the loaded templates.
	ParseDocument(Stream, OutputFormat, CSVOptions)	Parses a document.
	ParseDocument(String, OutputFormat, CSVOptions)	Parses a document.
	ParseDocument(Stream, Stream, OutputFormat, CSVOptions)	Processes a document.
	ParseDocument(String, String, OutputFormat, CSVOptions)	Processes a document.
	ParseDocumentPageRange(Stream, IList<Int32>)	Processes part of the document using the loaded templates.
	ParseDocumentPageRange(Stream, String)	Processes part of the document using the loaded templates.
	ParseDocumentPageRange(String, IList<Int32>)	Processes part of the document using the loaded templates.
	ParseDocumentPageRange(String, String)	Processes part of the document using the loaded templates.
	ParseDocumentPageRange(Stream, Int32, Int32)	Processes part of the document using the loaded templates.
	ParseDocumentPageRange(String, Int32, Int32)	Processes part of the document using the loaded templates.
	ParseDocumentPageRange(Stream, IList<Int32>, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(Stream, String, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(String, IList<Int32>, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(String, String, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(Stream, IList<Int32>, Stream, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(Stream, Int32, Int32, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(Stream, String, Stream, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(String, IList<Int32>, String, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(String, Int32, Int32, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(String, String, String, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(Stream, Int32, Int32, Stream, OutputFormat, CSVOptions)	Processes part of the document.
	ParseDocumentPageRange(String, Int32, Int32, String, OutputFormat, CSVOptions)	Processes part of the document.
	ToString	(Inherited from Object.)
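
Example

The members listed above can be combined into a short parsing session. The following is a minimal sketch, not an official sample. It assumes that the two-string constructor takes the registration name and key, that the second string argument of ParseDocument(String, String, OutputFormat, CSVOptions) is the output file path, that the CSVOptions parameter is optional, and that the OutputFormat enumeration has a JSON member. The file names are hypothetical placeholders, and the code needs a reference to ByteScout.DocumentParser.dll to compile.

```csharp
using System;
using ByteScout.DocumentParser;

class Program
{
    static void Main()
    {
        // "demo"/"demo" are placeholder registration values (assumed order: name, key).
        // DocumentParser implements IDisposable, so "using" releases it at the end.
        using (var parser = new DocumentParser("demo", "demo"))
        {
            // Optional: default OCR language for scanned input ("eng" is the default).
            parser.OCRLanguage = "deu";

            // Load a template file; "invoice-template.yml" is a hypothetical path.
            parser.AddTemplate("invoice-template.yml");

            // "invoice.pdf" is a hypothetical input document.
            Console.WriteLine("Pages: " + parser.GetPageCount("invoice.pdf"));

            // Assumptions: the second argument is the output file, CSVOptions
            // is optional, and OutputFormat.JSON exists.
            parser.ParseDocument("invoice.pdf", "result.json", OutputFormat.JSON);
        }
    }
}
```

If the CSVOptions parameter turns out to be required, pass an explicit options object; its construction is not described on this page.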

Events

	Name	Description
	ParsingLog	This event is used to deliver parsing warnings and errors.
	PasswordRequired	Occurs when a password is required to open a PDF document.

Reference

ByteScout.DocumentParser Namespace