Classifier Class

Represents a class that uses a list of rules to classify a document (PDF or image).
Inheritance Hierarchy
System.Object
  ByteScout.DocumentParser.Classifier

Namespace: ByteScout.DocumentParser
Assembly: ByteScout.DocumentParser (in ByteScout.DocumentParser.dll) Version: 6.4.1.617-master
Syntax
public class Classifier : IDisposable

The Classifier type exposes the following members.

Constructors
Name / Description
Public method: Classifier()
Initializes a new instance of the Classifier class.
Public method: Classifier(String, String)
Initializes a new instance of the Classifier class.
Properties
Name / Description
Public property: IgnorePDFPermissions
Specifies whether the SDK should ignore the permissions set in the PDF document and not throw ParserPermissionsException when the permission for the requested action is not granted.

Default is false.

IMPORTANT: THIS OPTION SHOULD NOT BE ENABLED, OUT OF RESPECT FOR THE OWNERS OF PDF DOCUMENTS. IF YOU SET IT TO TRUE TO IGNORE THE PERMISSIONS SET IN A PDF DOCUMENT, YOU ARE SOLELY LIABLE FOR THIS ACTION AND FOR ANY COPYRIGHT OR OTHER VIOLATIONS, AT YOUR OWN RISK. BYTESCOUT IS NOT LIABLE FOR ANY DAMAGES, LOSSES, COPYRIGHT INFRINGEMENTS OR ANY OTHER CONSEQUENCES CAUSED BY IGNORING THE PERMISSIONS OF A PDF DOCUMENT. BY CHANGING THIS OPTION YOU CONFIRM THAT YOU UNDERSTAND ALL OF THE ABOVE AND ARE DOING SO AT YOUR OWN RISK.

Public property: OCRDetectPageRotation
Detects the rotation of scanned pages.
Public property: OCRLanguage
The default language for Optical Character Recognition (OCR). It can be overridden by the template option "ocrLanguage". The valid values are:
  • "eng" - English (default)
  • "deu" - German
  • "fra" - French
  • "spa" - Spanish
  • "nld" - Dutch

Download more languages at https://github.com/bytescout/ocrdata.

Public property: OCRLanguageDataFolder
Folder containing OCR language data files.
Public property: OCRMaximizeCPUUtilization
Gets or sets whether to maximize OCR performance using Intel OpenMP (if available), which accelerates recognition by approximately 30%. Default is false.
Public property: OCRMode
Recognizes text from embedded images using Optical Character Recognition (OCR).

This option requires the appropriate language files in the OCRLanguageDataFolder folder. The SDK ships with language files for English, French, German and Spanish. You can download more at https://github.com/bytescout/ocrdata.

Public property: OCRResolution
Resolution of Optical Character Recognition (OCR). Default is 300 DPI.
Public property: RegistrationKey
Gets or sets the key part of the registration information.
Public property: RegistrationName
Gets or sets the name part of the registration information.
Methods
Name / Description
Public method: AddRule
Adds a rule for detecting a document by its content.
Public method: AddRulesFromSpreadsheet(Stream, Boolean)
Adds rules and key phrases from a spreadsheet file stream (XLS, XLSX, CSV, ODS).
Public method: AddRulesFromSpreadsheet(String, Boolean)
Adds rules and key phrases from a spreadsheet file (XLS, XLSX, CSV, ODS).
Public method: ClassifyDocument(Stream)
Classifies the document using the loaded rules. Use the AddRule(String, ClassifierRuleLogic, ICollection<String>, Boolean) and AddRulesFromSpreadsheet(Stream, Boolean) methods to add rules.
Public method: ClassifyDocument(String)
Classifies the document using the loaded rules. Use the AddRule(String, ClassifierRuleLogic, ICollection<String>, Boolean) and AddRulesFromSpreadsheet(String, Boolean) methods to add rules.
Public method: Dispose
Releases managed resources of the component.
Public method: Equals (Inherited from Object.)
Protected method: Finalize (Inherited from Object.)
Public method: GetHashCode (Inherited from Object.)
Public method: GetType (Inherited from Object.)
Protected method: MemberwiseClone (Inherited from Object.)
Public method: Reset
Resets the Classifier.
Public method: ToString (Inherited from Object.)
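
Example

The members above might be combined roughly as follows. This is a minimal sketch, not an official sample: the file names "rules.csv" and "sample.pdf" and the "demo" registration values are hypothetical placeholders, the meaning of the Boolean argument of AddRulesFromSpreadsheet is not described on this page (false is passed here purely as an assumption), and the return type of ClassifyDocument is not stated here, so the sketch assumes it returns a value and holds it in a var.

```csharp
using System;
using ByteScout.DocumentParser;

class ClassifierSketch
{
    static void Main()
    {
        // Classifier implements IDisposable, so a using block calls Dispose for us.
        using (Classifier classifier = new Classifier())
        {
            // Placeholder registration values (assumed to be strings).
            classifier.RegistrationName = "demo";
            classifier.RegistrationKey = "demo";

            // "eng" is the documented default; set explicitly for clarity.
            classifier.OCRLanguage = "eng";

            // Hypothetical rules file. The Boolean argument's meaning is not
            // documented on this page; false is an assumption.
            classifier.AddRulesFromSpreadsheet("rules.csv", false);

            // Assumption: ClassifyDocument returns a value describing the
            // detected class; its exact type is not stated on this page.
            var result = classifier.ClassifyDocument("sample.pdf");
            Console.WriteLine(result);
        }
    }
}
```

Rules are loaded before ClassifyDocument is called because, as described above, classification uses whatever rules have been added so far.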
Events
Name / Description
Public event: PasswordRequired
Occurs when a password is required to open a PDF document.