
ITextExtractor Interface

Defines the PDF-to-text extractor interface.

Namespace: Bytescout.PDFExtractor
Assembly: Bytescout.PDFExtractor (in Bytescout.PDFExtractor.dll) Version: 13.4.0.4760-master
Syntax
public interface ITextExtractor

The ITextExtractor type exposes the following members.

Properties
FoundText: Contains the search result of the Find(Int32, String, Boolean) or FindNext method.
FuzzySearch: Sets whether to use a "fuzzy" text search algorithm, which also finds approximately equal strings. For example, the search string "fox" will also find "fix" and "fax". This can help compensate for common OCR errors, such as "paralle1" or "paralle|" in place of "parallel".
FuzzySearchPermissibleErrors: Sets how approximate a match the fuzzy search algorithm accepts, expressed as the number of permissible errors in the search string. A value of 1 or 2 works well, 3 is questionable, and 4 produces poor matches. Default is 1.
PageSeparator: Sets the page separator character or string. Default is '\f' (form feed).
RegexSearch: Sets whether to search the text using regular expressions.
WordMatchingMode: Sets the word matching mode, used in text search and in the automatic removal of hyphens. This option is ignored when regular expressions are enabled (when RegexSearch is true); with regular expressions, use the '\b' metacharacter to mark word boundaries.
WordMatchingPunctuationMarks: Sets the punctuation marks used by word matching. These marks are treated as part of a word. Default is ."'“” (period, straight double and single quotes, and curly double quotes).
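FuzzySearchPermissibleErrors counts "errors in the search string", but this page does not say how an error is measured. The following sketch assumes the common Levenshtein definition purely to make the numbers concrete. Under that definition, one inserted, deleted, or substituted character is one error. The class and method names are hypothetical, and the SDK's actual fuzzy algorithm may differ.

```csharp
using System;

// Illustration only: classic Levenshtein edit distance, used to show what a
// "number of permissible errors" can mean. This is an assumption about the idea,
// not the PDF Extractor SDK's actual fuzzy-search algorithm.
public static class FuzzyMatchSketch
{
    // Minimum number of single-character insertions, deletions or substitutions
    // needed to turn 'a' into 'b' (two-row dynamic programming).
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // "" -> first j chars of b: j insertions

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // first i chars of a -> "": i deletions
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
            // The row just filled becomes the "previous" row for the next character of 'a'.
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    // Hypothetical helper: a candidate "approximately equals" the search string when it is
    // within 'permissibleErrors' edits, analogous to FuzzySearchPermissibleErrors (default 1).
    public static bool IsFuzzyMatch(string search, string candidate, int permissibleErrors = 1)
        => EditDistance(search, candidate) <= permissibleErrors;
}
```

Under this model, "fox" is one substitution away from both "fix" and "fax", so the default of 1 accepts them. "dog" is two substitutions away, so it needs a value of 2. The model also shows why higher values degrade: any two three-letter words are at most three edits apart, so a value of 3 would let "fox" match every three-letter word.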
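WordMatchingMode is ignored once RegexSearch is true, so whole-word matching has to be written into the pattern itself. The following sketch uses plain .NET System.Text.RegularExpressions to show the effect of '\b'. It assumes that the SDK's regex search follows standard .NET semantics, which the RegexOptions parameter of Find(Int32, String, RegexOptions) suggests but this page does not state. WholeWordPattern and CountMatches are hypothetical helpers, not SDK members.

```csharp
using System.Text.RegularExpressions;

// Sketch with plain .NET regular expressions, assuming (not confirmed here) that the SDK's
// regex search follows standard .NET semantics. With RegexSearch enabled, WordMatchingMode
// is ignored, so whole-word matching must be expressed in the pattern with \b.
public static class WholeWordRegexSketch
{
    // Hypothetical helper: turn a literal search term into a whole-word pattern.
    // Regex.Escape neutralizes metacharacters such as '.' or '(' in the term.
    public static string WholeWordPattern(string literal)
        => @"\b" + Regex.Escape(literal) + @"\b";

    // Hypothetical helper: counts matches of 'pattern' in 'text'. 'options' plays the role
    // of the RegexOptions argument of Find(Int32, String, RegexOptions).
    public static int CountMatches(string text, string pattern, RegexOptions options = RegexOptions.None)
        => Regex.Matches(text, pattern, options).Count;
}
```

For example, the bare pattern "cat" matches three times in "Cat catalog concat cat.". The whole-word pattern matches only the final "cat", plus the leading "Cat" when RegexOptions.IgnoreCase is passed. One caveat: '\b' marks a transition between word and non-word characters. It therefore does not bracket terms that begin or end with punctuation, such as "C++", the way you might expect.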
Methods
Find(Int32, String, Boolean): Searches the specified document page for the specified text.
Find(Int32, String, RegexOptions): Searches the specified document page for the specified text in regex mode, using the specified options.
FindAll: Searches for all occurrences of the specified text on the specified document page or in the entire document.
FindAllToJSON: Searches for all occurrences of the specified text on the specified document page or in the entire document, and returns the result as a JSON string.
FindNext: Continues a text search started by one of the Find() methods.
GetPageTextAsVariant: Returns page text as an array of bytes. This is the COM/ActiveX-compatible version of SavePageTextToStream(Int32, Stream), for in-memory processing of PDF documents or images.
GetText: Extracts text from the whole document.
GetText(IList<Int32>): Extracts text from the specified pages.
GetText(String): Extracts text from the specified page ranges.
GetText(Int32, Int32): Extracts text from the specified page range.
GetTextAsVariant: Returns document text as an array of bytes. This is the COM/ActiveX-compatible version of SaveTextToStream(Stream), for in-memory processing of PDF documents or images.
GetTextAsVariant(String): Returns text from the specified page ranges as an array of bytes. This is the COM/ActiveX-compatible version of SaveTextToStream(String, Stream), for in-memory processing of PDF documents or images.
GetTextAsVariant(Int32, Int32): Returns text from the specified page range as an array of bytes. This is the COM/ActiveX-compatible version of SaveTextToStream(Int32, Int32, Stream), for in-memory processing of PDF documents or images.
GetTextFromPage: Extracts text from the specified document page.
SavePageTextToFile(Int32, String): Saves page text to a file.
SavePageTextToFile(Int32, String, Encoding): Saves page text to a file in the specified encoding.
SavePageTextToStream(Int32, Stream): Saves page text to a stream.
SavePageTextToStream(Int32, Stream, Encoding): Saves page text to a stream in the specified encoding.
SaveTextToFile(String): Saves document text to a file.
SaveTextToFile(IList<Int32>, String): Saves text from the specified pages to a file.
SaveTextToFile(String, String): Saves text from the specified page ranges to a file.
SaveTextToFile(String, Encoding): Saves document text to a file in the specified encoding.
SaveTextToFile(IList<Int32>, String, Encoding): Saves text from the specified pages to a file in the specified encoding.
SaveTextToFile(Int32, Int32, String): Saves text from the specified page range to a file.
SaveTextToFile(String, String, Encoding): Saves text from the specified page ranges to a file in the specified encoding.
SaveTextToFile(Int32, Int32, String, Encoding): Saves text from the specified page range to a file in the specified encoding.
SaveTextToStream(Stream): Saves document text to a stream.
SaveTextToStream(IList<Int32>, Stream): Saves text from the specified pages to a stream.
SaveTextToStream(Stream, Encoding): Saves document text to a stream in the specified encoding.
SaveTextToStream(String, Stream): Saves text from the specified page ranges to a stream.
SaveTextToStream(IList<Int32>, Stream, Encoding): Saves text from the specified pages to a stream in the specified encoding.
SaveTextToStream(Int32, Int32, Stream): Saves text from the specified page range to a stream.
SaveTextToStream(String, Stream, Encoding): Saves text from the specified page ranges to a stream in the specified encoding.
SaveTextToStream(Int32, Int32, Stream, Encoding): Saves text from the specified page range to a stream in the specified encoding.
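The stream overloads, and their byte-array GetTextAsVariant counterparts, allow fully in-memory processing, but the bytes still have to be decoded with the right encoding. The following sketch captures whatever a save call writes into a MemoryStream and decodes it with a StreamReader.

- CaptureText and Decode are hypothetical names.
- The write delegate stands in for a call such as s => extractor.SaveTextToStream(s, Encoding.UTF8) on a hypothetical extractor instance.
- The encoding used by GetTextAsVariant is not stated on this page and is an assumption you should confirm.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch: capture what a "save text to stream" call writes, entirely in memory, and decode it.
// 'write' stands in for a call such as s => extractor.SaveTextToStream(s, Encoding.UTF8),
// where 'extractor' is a hypothetical ITextExtractor instance.
public static class TextCaptureSketch
{
    public static string CaptureText(Action<Stream> write, Encoding encoding)
    {
        using var buffer = new MemoryStream();
        write(buffer);
        // ToArray copies the entire buffer regardless of the current Position,
        // and still works if the writer closed the stream.
        return Decode(buffer.ToArray(), encoding);
    }

    // Decodes a byte array (for example, one returned by GetTextAsVariant) using 'encoding'.
    // A byte order mark, if present, is detected and stripped rather than left as U+FEFF.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using var reader = new StreamReader(new MemoryStream(bytes), encoding,
                                            detectEncodingFromByteOrderMarks: true);
        return reader.ReadToEnd();
    }
}
```

This version reads through a StreamReader instead of calling encoding.GetString directly. If the writer emits a byte order mark, the StreamReader strips it, so it does not show up as a stray character at the start of the text.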
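The PageSeparator property suggests that whole-document text, as returned by GetText(), has pages joined by the separator. This page does not spell that out, so treat it as an assumption. Under that assumption, the following sketch splits such a string back into per-page strings, using the documented default separator '\f' or any string you set. SplitPages is a hypothetical helper.

```csharp
using System;

// Sketch: split whole-document text (for example, the string returned by GetText())
// back into per-page strings. Assumes pages are joined by PageSeparator, default "\f".
public static class PageSplitSketch
{
    public const string DefaultPageSeparator = "\f"; // form feed, the documented default

    public static string[] SplitPages(string documentText, string separator = DefaultPageSeparator)
    {
        // StringSplitOptions.None keeps empty entries, so a blank page still occupies
        // its own slot and array indices stay aligned with page order.
        return documentText.Split(new[] { separator }, StringSplitOptions.None);
    }
}
```

Two further assumptions matter here. First, the separator must never occur inside page text. Second, this page does not say whether a separator also follows the last page; if it does, the array ends with an extra empty entry. When either is a concern, calling GetTextFromPage once per page avoids splitting altogether.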