ITextExtractor Interface
Defines the PDF to Text extractor interface.
Namespace: Bytescout.PDFExtractor
Assembly: Bytescout.PDFExtractor (in Bytescout.PDFExtractor.dll) Version: 13.4.0.4760-master
Syntax
The ITextExtractor type exposes the following members.
Properties
Name | Description | |
---|---|---|
FoundText | Contains the search result of the Find(Int32, String, Boolean) or FindNext methods. | |
FuzzySearch | Sets whether to use the "fuzzy" text search algorithm, which finds "approximately equal" strings. For example, the search string "fox" will also find "fix" and "fax". This can help compensate for some common OCR errors, such as "paralle1" or "paralle|". | |
FuzzySearchPermissibleErrors | Sets the string equality approximation for the fuzzy search algorithm. Put simply, this is the number of permissible errors in the search string. Values of 1 or 2 work well, 3 is marginal, and 4 gives poor matches. Default is 1. | |
PageSeparator | Sets the page separator character or string. Default is '\f' (Form Feed). | |
RegexSearch | Sets whether to search the text using regular expressions. | |
WordMatchingMode | Sets the word matching mode (used in text search and automatic removal of hyphens). This option is ignored when regular expressions are enabled (when RegexSearch is true). With regular expressions, use the '\b' metacharacter to specify word boundaries. | |
WordMatchingPunctuationMarks | Sets the punctuation marks used by word matching. These marks are considered part of a word. The defaults are: ."'“” |
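
To make the idea of "permissible errors" in the FuzzySearchPermissibleErrors row more concrete, the following is a minimal, self-contained sketch of approximate matching by edit distance. It only illustrates the general concept: the reference above does not say which algorithm the extractor uses internally, and the names `FuzzyDemo`, `EditDistance`, and `IsFuzzyMatch` are hypothetical helpers, not part of Bytescout.PDFExtractor.

```csharp
using System;

// Hypothetical illustration only; not the library's implementation.
public static class FuzzyDemo
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions that turn a into b.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // A candidate matches when it differs from the search string
    // by at most permissibleErrors edits.
    public static bool IsFuzzyMatch(string search, string candidate, int permissibleErrors)
    {
        return EditDistance(search, candidate) <= permissibleErrors;
    }

    public static void Main()
    {
        Console.WriteLine(IsFuzzyMatch("fox", "fix", 1));            // True
        Console.WriteLine(IsFuzzyMatch("parallel", "paralle1", 1));  // True
        Console.WriteLine(IsFuzzyMatch("fox", "dog", 1));            // False
    }
}
```

Under this model, raising the error budget widens the set of accepted strings quickly for short search terms, which is consistent with the advice above to keep the value at 1 or 2.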
Methods
Name | Description | |
---|---|---|
Find(Int32, String, Boolean) | Searches the document page for specified text. | |
Find(Int32, String, RegexOptions) | Searches the document page for specified text in Regex mode with specified options. | |
FindAll | Searches for all occurrences of specified text in specified document page or in entire document. | |
FindAllToJSON | Searches for all occurrences of specified text in specified document page or in entire document and returns result as JSON string. | |
FindNext | Continues the text search started by one of the Find() methods. | |
GetPageTextAsVariant | Returns page text as an array of bytes. This is a COM/ActiveX-compatible version of the method SavePageTextToStream(Int32, Stream) for in-memory processing of PDF documents or images. | |
GetText | Extracts text from whole document. | |
GetText(IList<Int32>) | Extracts text from specified pages. | |
GetText(String) | Extracts text from specified page ranges. | |
GetText(Int32, Int32) | Extracts text from specified page range. | |
GetTextAsVariant | Returns document text as an array of bytes. This is a COM/ActiveX-compatible version of the method SaveTextToStream(Stream) for in-memory processing of PDF documents or images. | |
GetTextAsVariant(String) | Returns document text as an array of bytes. This is a COM/ActiveX-compatible version of the method SaveTextToStream(String, Stream) for in-memory processing of PDF documents or images. | |
GetTextAsVariant(Int32, Int32) | Returns document text as an array of bytes. This is a COM/ActiveX-compatible version of the method SaveTextToStream(Int32, Int32, Stream) for in-memory processing of PDF documents or images. | |
GetTextFromPage | Extracts text from specified document page. | |
SavePageTextToFile(Int32, String) | Saves page text to file. | |
SavePageTextToFile(Int32, String, Encoding) | Saves page text to file in specified encoding. | |
SavePageTextToStream(Int32, Stream) | Saves page text to stream. | |
SavePageTextToStream(Int32, Stream, Encoding) | Saves page text to stream in specified encoding. | |
SaveTextToFile(String) | Saves document text to file. | |
SaveTextToFile(IList<Int32>, String) | Saves text from specified pages to file. | |
SaveTextToFile(String, String) | Saves text from specified page ranges to file. | |
SaveTextToFile(String, Encoding) | Saves document text to file in specified encoding. | |
SaveTextToFile(IList<Int32>, String, Encoding) | Saves text from specified pages to file in specified encoding. | |
SaveTextToFile(Int32, Int32, String) | Saves text from specified page range to file. | |
SaveTextToFile(String, String, Encoding) | Saves text from specified page ranges to file in specified encoding. | |
SaveTextToFile(Int32, Int32, String, Encoding) | Saves text from specified page range to file in specified encoding. | |
SaveTextToStream(Stream) | Saves document text to stream. | |
SaveTextToStream(IList<Int32>, Stream) | Saves text from specified pages to stream. | |
SaveTextToStream(Stream, Encoding) | Saves document text to stream in specified encoding. | |
SaveTextToStream(String, Stream) | Saves text from specified page ranges to stream. | |
SaveTextToStream(IList<Int32>, Stream, Encoding) | Saves text from specified pages to stream in specified encoding. | |
SaveTextToStream(Int32, Int32, Stream) | Saves text from specified page range to stream. | |
SaveTextToStream(String, Stream, Encoding) | Saves text from specified page ranges to stream in specified encoding. | |
SaveTextToStream(Int32, Int32, Stream, Encoding) | Saves text from specified page range to stream in specified encoding. |
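
When whole-document text is extracted in one call, a common follow-up step is to split it back into pages. The sketch below assumes the extracted string contains the PageSeparator value between consecutive pages, with the default '\f' (Form Feed) described in the Properties table. The sample string stands in for real extractor output, and `PageSplitDemo` and `SplitPages` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical illustration only; the input string stands in for
// text produced by one of the whole-document extraction methods.
public static class PageSplitDemo
{
    // Splits document text on the page separator. The default "\f"
    // mirrors the documented PageSeparator default (Form Feed).
    public static string[] SplitPages(string documentText, string pageSeparator = "\f")
    {
        return documentText.Split(new[] { pageSeparator }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string documentText = "First page text\fSecond page text\fThird page text";
        string[] pages = SplitPages(documentText);

        for (int i = 0; i < pages.Length; i++)
        {
            Console.WriteLine("Page " + (i + 1) + ": " + pages[i]);
        }
    }
}
```

If PageSeparator is set to a custom string, the same value would be passed as the second argument so that the split matches what was written during extraction.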
See Also