
TextExtractor Methods

The TextExtractor type exposes the following members.

Methods
Name / Description
Public method AddFilter(String, Boolean, Boolean)
Adds a filter to remove text from extracted data.
(Inherited from BaseTextExtractor.)
Public method AddFilter(String, Int32, Boolean)
Adds a filter to exclude text objects with specified attributes.
(Inherited from BaseTextExtractor.)
Public method AddFilter(String, Int32, Color, Boolean)
Adds a filter to exclude text objects with specified attributes.
(Inherited from BaseTextExtractor.)
Public method AddFilter(String, String, Boolean, Boolean)
Adds a filter to replace text in extracted data.
(Inherited from BaseTextExtractor.)
Public method AddFilter(String, Int32, Int32, Int32, Int32, Boolean)
Adds a filter to exclude text objects with specified attributes.
(Inherited from BaseTextExtractor.)
Public method CreateProfile(String, Boolean, Boolean, Boolean)
Creates a JSON profile with all extractor properties and their current values.
(Inherited from BaseExtractor.)
Public method CreateProfile(String, String, Boolean, Boolean, Boolean)
Creates a JSON profile with all extractor properties and their current values.
(Inherited from BaseExtractor.)
Public method Dispose
Releases the unmanaged resources used by the instance and optionally releases the managed resources.
(Inherited from BaseExtractor.)
Public method DisposePage
Disposes the page object. Use this method carefully to destroy a page object that will not be used further. Useful for freeing allocated memory when processing huge PDF documents.
(Inherited from BaseTextExtractor.)
Public method Equals (Inherited from Object.)
Protected method Finalize (Inherited from Object.)
Public method Find(Int32, String, Boolean)
Searches the document page for the specified text.
Public method Find(Int32, String, RegexOptions)
Searches the document page for the specified text in Regex mode with the specified options.
Public method FindAll
Searches for all occurrences of the specified text on the specified document page or in the entire document.
Public method FindAllToJSON
Searches for all occurrences of the specified text on the specified document page or in the entire document and returns the result as a JSON string.
Public method FindNext
Continues the text search started by one of the Find() methods.
Protected method FireParsingError (Inherited from BaseExtractor.)
Protected method FireProgressChanged (Inherited from BaseExtractor.)
Public method GetHashCode (Inherited from Object.)
Public method GetPageCount
Returns document page count.
(Inherited from BaseExtractor.)
Public method GetPageRect_Height
Gets the specified page height.
(Inherited from BaseExtractor.)
Public method GetPageRect_Left
Gets the specified page left coordinate.
(Inherited from BaseExtractor.)
Public method GetPageRect_Top
Gets the specified page top coordinate.
(Inherited from BaseExtractor.)
Public method GetPageRect_Width
Gets the specified page width.
(Inherited from BaseExtractor.)
Public method GetPageRectangle(Int32)
Gets the page rectangle in PDF Points (1 Point = 1/72 in.).
(Inherited from BaseExtractor.)
Public method GetPageRectangle(Int32, Boolean)
Gets the page rectangle in PDF Points (1 Point = 1/72 in.).
(Inherited from BaseExtractor.)
Public method GetPageRotationAngle
Returns the rotation angle of specified page.
(Inherited from BaseExtractor.)
Public method GetPageTextAsVariant
Returns page text as array of bytes. This is COM/ActiveX-compatible version of the method SavePageTextToStream(Int32, Stream) for in-memory processing of PDF documents or images.
Public method GetPreprocessedPagePreview
Returns preview image of document page with preprocessing filters applied.
(Inherited from BaseTextExtractor.)
Public method GetText
Extracts text from whole document.
Public method GetText(IList<Int32>)
Extracts text from specified pages.
Public method GetText(String)
Extracts text from specified page ranges.
Public method GetText(Int32, Int32)
Extracts text from specified page range.
Public method GetTextAsVariant
Returns document text as array of bytes. This is COM/ActiveX-compatible version of the method SaveTextToStream(Stream) for in-memory processing of PDF documents or images.
Public method GetTextAsVariant(String)
Returns document text as array of bytes. This is COM/ActiveX-compatible version of the method SaveTextToStream(String, Stream) for in-memory processing of PDF documents or images.
Public method GetTextAsVariant(Int32, Int32)
Returns document text as array of bytes. This is COM/ActiveX-compatible version of the method SaveTextToStream(Int32, Int32, Stream) for in-memory processing of PDF documents or images.
Public method GetTextFromPage
Extracts text from specified document page.
Public method GetType (Inherited from Object.)
Public method IsEncrypted
Gets the document encrypted state.
(Inherited from BaseExtractor.)
Public method IsOCRRecommendedForPage
Detects whether OCR is recommended for the specified page. OCR (Optical Character Recognition) is recommended when a page has no text objects but has an image that might contain text.
(Inherited from BaseTextExtractor.)
Public method LoadAndApplyProfiles
Loads profiles from a JSON string and automatically applies them. Note that profiles containing detection keywords will be deferred until the extraction.
(Inherited from BaseExtractor.)
Public method LoadDocumentFromFile
Loads a PDF document from the specified file.
(Inherited from BaseExtractor.)
Public method LoadDocumentFromStream
Loads a PDF document from the provided stream.
(Inherited from BaseExtractor.)
Public method LoadDocumentFromVariant
Loads a PDF document from a byte array presented as an array of Variant or Byte objects ('Variant()' or 'Byte()'). This is the COM/ActiveX-compatible version of the method LoadDocumentFromStream(Stream) for in-memory processing of PDF files.
(Inherited from BaseExtractor.)
Public method LoadProfiles
Loads profiles from a JSON file.
(Inherited from BaseExtractor.)
Public method LoadProfilesFromString
Loads profiles from a JSON string.
(Inherited from BaseExtractor.)
Protected method MemberwiseClone (Inherited from Object.)
Protected method PerformTextAnalysis (Inherited from BaseTextExtractor.)
Public method Reset
Resets the instance and disposes internal resources. Also automatically invoked by Dispose.
(Overrides BaseTextExtractor.Reset.)
Protected method ResetBaseExtractionData (Inherited from BaseTextExtractor.)
Public method ResetExtractionArea
Resets the extraction area to the full page.
(Inherited from BaseExtractor.)
Public method ResetFilters
Resets text filters.
(Inherited from BaseTextExtractor.)
Public method SavePageTextToFile(Int32, String)
Saves page text to file.
Public method SavePageTextToFile(Int32, String, Encoding)
Saves page text to file in specified encoding.
Public method SavePageTextToStream(Int32, Stream)
Saves page text to stream.
Public method SavePageTextToStream(Int32, Stream, Encoding)
Saves page text to stream in specified encoding.
Public method SavePreprocessedPagePreview
Saves preview image of document page with preprocessing filters applied. Image is saved in PNG format.
(Inherited from BaseTextExtractor.)
Public method SaveTextToFile(String)
Saves document text to file.
Public method SaveTextToFile(IList<Int32>, String)
Saves text from specified pages to file.
Public method SaveTextToFile(String, String)
Saves text from specified page ranges to file.
Public method SaveTextToFile(String, Encoding)
Saves document text to file in specified encoding.
Public method SaveTextToFile(IList<Int32>, String, Encoding)
Saves text from specified pages to file in specified encoding.
Public method SaveTextToFile(Int32, Int32, String)
Saves text from specified page range to file.
Public method SaveTextToFile(String, String, Encoding)
Saves text from specified page ranges to file in specified encoding.
Public method SaveTextToFile(Int32, Int32, String, Encoding)
Saves text from specified page range to file in specified encoding.
Public method SaveTextToStream(Stream)
Saves document text to stream.
Public method SaveTextToStream(IList<Int32>, Stream)
Saves text from specified pages to stream.
Public method SaveTextToStream(Stream, Encoding)
Saves document text to stream in specified encoding.
Public method SaveTextToStream(String, Stream)
Saves text from specified page ranges to stream.
Public method SaveTextToStream(IList<Int32>, Stream, Encoding)
Saves text from specified pages to stream in specified encoding.
Public method SaveTextToStream(Int32, Int32, Stream)
Saves text from specified page range to stream.
Public method SaveTextToStream(String, Stream, Encoding)
Saves text from specified page ranges to stream in specified encoding.
Public method SaveTextToStream(Int32, Int32, Stream, Encoding)
Saves text from specified page range to stream in specified encoding.
Public method SetCustomExtractionColumns
Helper method to set the CustomExtractionColumns property when using the extractor through COM from VC++, VB, VBA, VBScript, or Delphi.
(Inherited from BaseTextExtractor.)
Public method SetExtractionArea(RectangleF)
Sets the extraction area by rectangle.
(Inherited from BaseExtractor.)
Public method SetExtractionArea(Double, Double, Double, Double)
Sets the extraction area by coordinates and dimensions.
(Inherited from BaseExtractor.)
Public method SetExtractionArea(Single, Single, Single, Single)
Sets the extraction area by coordinates and dimensions.
(Inherited from BaseExtractor.)
Public method ToString (Inherited from Object.)
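
The GetPageRectangle entries in the table above report page geometry in PDF Points (1 Point = 1/72 in.). As a minimal, language-neutral sketch of that unit arithmetic only, the conversion to inches, millimetres, and pixels could look like the following. The helper names are hypothetical and are not part of the TextExtractor API; the sketch does not call the SDK at all.

```python
# Unit arithmetic for values reported in PDF Points.
# These helpers are illustrative assumptions, not SDK members.

POINTS_PER_INCH = 72.0  # from the reference: 1 Point = 1/72 in.
MM_PER_INCH = 25.4      # exact, by definition of the inch


def points_to_inches(points: float) -> float:
    """Convert a length in PDF Points to inches."""
    return points / POINTS_PER_INCH


def points_to_mm(points: float) -> float:
    """Convert a length in PDF Points to millimetres."""
    return points_to_inches(points) * MM_PER_INCH


def points_to_pixels(points: float, dpi: float) -> int:
    """Convert a length in PDF Points to whole pixels at the given resolution."""
    return round(points_to_inches(points) * dpi)


if __name__ == "__main__":
    # A US Letter page measures 612 x 792 points.
    width_pt, height_pt = 612.0, 792.0
    print(points_to_inches(width_pt), points_to_inches(height_pt))
    print(points_to_pixels(width_pt, 300), points_to_pixels(height_pt, 300))
```

The same division by 72 applies to any value a method documents as being in PDF Points, for example when sizing a rendered preview at a chosen DPI.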