XMLExtractor Methods

The XMLExtractor type exposes the following members.

Methods

	Name	Description
	AddFilter(String, Boolean, Boolean)	Adds a filter to remove text from extracted data. (Inherited from BaseTextExtractor.)
	AddFilter(String, Int32, Boolean)	Adds filter to exclude text objects with specified attributes. (Inherited from BaseTextExtractor.)
	AddFilter(String, Int32, Color, Boolean)	Adds filter to exclude text objects with specified attributes. (Inherited from BaseTextExtractor.)
	AddFilter(String, String, Boolean, Boolean)	Adds a filter to replace text in extracted data. (Inherited from BaseTextExtractor.)
	AddFilter(String, Int32, Int32, Int32, Int32, Boolean)	Adds filter to exclude text objects with specified attributes. (Inherited from BaseTextExtractor.)
	CreateProfile(String, Boolean, Boolean, Boolean)	Creates a JSON profile with all extractor properties set to their current values. (Inherited from BaseExtractor.)
	CreateProfile(String, String, Boolean, Boolean, Boolean)	Creates a JSON profile with all extractor properties set to their current values. (Inherited from BaseExtractor.)
	Dispose	Releases the unmanaged resources used by the instance and optionally releases the managed resources. (Inherited from BaseExtractor.)
	DisposePage	Disposes the page object. Use this method carefully, and only to destroy a page object that will not be used again. Useful for freeing allocated memory when processing huge PDF documents. (Inherited from BaseTextExtractor.)
	Equals	(Inherited from Object.)
	Finalize	(Inherited from Object.)
	FireParsingError	(Inherited from BaseExtractor.)
	FireProgressChanged	(Inherited from BaseExtractor.)
	GetHashCode	(Inherited from Object.)
	GetPageCount	Returns document page count. (Inherited from BaseExtractor.)
	GetPageRect_Height	Gets the specified page height. (Inherited from BaseExtractor.)
	GetPageRect_Left	Gets the specified page left coordinate. (Inherited from BaseExtractor.)
	GetPageRect_Top	Gets the specified page top coordinate. (Inherited from BaseExtractor.)
	GetPageRect_Width	Gets the specified page width. (Inherited from BaseExtractor.)
	GetPageRectangle(Int32)	Gets the page rectangle in PDF Points (1 Point = 1/72 in.). (Inherited from BaseExtractor.)
	GetPageRectangle(Int32, Boolean)	Gets the page rectangle in PDF Points (1 Point = 1/72 in.). (Inherited from BaseExtractor.)
	GetPageRotationAngle	Returns the rotation angle of specified page. (Inherited from BaseExtractor.)
	GetPageXMLAsVariant	Returns extracted XML data as array of bytes. This is COM/ActiveX-compatible version of the method SavePageXMLToStream(Int32, Stream) for in-memory processing of PDF documents or images.
	GetPreprocessedPagePreview	Returns preview image of document page with preprocessing filters applied. (Inherited from BaseTextExtractor.)
	GetType	(Inherited from Object.)
	GetXML	Extracts XML data from the entire document as string.
	GetXML(IList<Int32>)	Extracts XML data from specified pages.
	GetXML(String)	Extracts XML data from specified page ranges.
	GetXML(Int32, Int32)	Extracts XML data from specified page range.
	GetXMLAsVariant	Returns extracted XML data as array of bytes. This is COM/ActiveX-compatible version of the method SaveXMLToStream(Stream) for in-memory processing of PDF documents or images.
	GetXMLAsVariant(String)	Returns extracted XML data as array of bytes. This is COM/ActiveX-compatible version of the method SaveXMLToStream(String, Stream) for in-memory processing of PDF documents or images.
	GetXMLAsVariant(Int32, Int32)	Returns extracted XML data as array of bytes. This is COM/ActiveX-compatible version of the method SaveXMLToStream(Int32, Int32, Stream) for in-memory processing of PDF documents or images.
	GetXMLDocument	Extracts XML data from the entire document as XmlDocument.
	GetXMLDocument(IList<Int32>)	Extracts XML data from specified pages as XmlDocument.
	GetXMLDocument(String)	Extracts XML data from specified page ranges as XmlDocument.
	GetXMLDocument(Int32, Int32)	Extracts XML data from specified page range as XmlDocument.
	GetXMLDocumentFromPage	Extracts XML data from specified document page as XmlDocument.
	GetXMLFromPage	Extracts XML data from specified document page as string.
	IsEncrypted	Gets the document encrypted state. (Inherited from BaseExtractor.)
	IsOCRRecommendedForPage	Detects whether OCR is recommended for the specified page. OCR (Optical Character Recognition) is recommended when a page has no text objects but has an image that might contain text. (Inherited from BaseTextExtractor.)
	LoadAndApplyProfiles	Loads profiles from JSON string and automatically applies them. Note that profiles containing detection keywords will be deferred until the extraction. (Inherited from BaseExtractor.)
	LoadDocumentFromFile	Loads PDF document from specified file. (Inherited from BaseExtractor.)
	LoadDocumentFromStream	Loads PDF document from provided stream. (Inherited from BaseExtractor.)
	LoadDocumentFromVariant	Loads PDF document from byte array presented as array of Variant or Byte objects ('Variant()' or 'Byte()'). This is COM/ActiveX-compatible version of the method LoadDocumentFromStream(Stream) for in-memory processing of PDF files. (Inherited from BaseExtractor.)
	LoadProfiles	Loads profiles from JSON file. (Inherited from BaseExtractor.)
	LoadProfilesFromString	Loads profiles from JSON string. (Inherited from BaseExtractor.)
	MemberwiseClone	(Inherited from Object.)
	PerformTextAnalysis	(Inherited from BaseTextExtractor.)
	Reset	(Overrides BaseTextExtractor.Reset.)
	ResetBaseExtractionData	(Inherited from BaseTextExtractor.)
	ResetExtractionArea	Resets the extraction area to the full page. (Inherited from BaseExtractor.)
	ResetFilters	Resets text filters. (Inherited from BaseTextExtractor.)
	SavePageXMLToFile	Saves page XML data to file.
	SavePageXMLToStream	Saves page XML data to stream.
	SavePreprocessedPagePreview	Saves preview image of document page with preprocessing filters applied. Image is saved in PNG format. (Inherited from BaseTextExtractor.)
	SaveXMLToFile(String)	Saves XML data from the entire document to file.
	SaveXMLToFile(IList<Int32>, String)	Saves XML data from specified pages to file.
	SaveXMLToFile(String, String)	Saves XML data from specified page ranges to file.
	SaveXMLToFile(Int32, Int32, String)	Saves XML data from specified page range to file.
	SaveXMLToStream(Stream)	Saves XML data to stream.
	SaveXMLToStream(IList<Int32>, Stream)	Saves XML data from specified pages to stream.
	SaveXMLToStream(String, Stream)	Saves XML data from specified page ranges to stream.
	SaveXMLToStream(Int32, Int32, Stream)	Saves XML data from specified page range to stream.
	SetCustomExtractionColumns	Helper method to set the CustomExtractionColumns property when using the extractor through COM from VC++, VB, VBA, VBScript, or Delphi. (Inherited from BaseTextExtractor.)
	SetExtractionArea(RectangleF)	Sets the extraction area by rectangle. (Inherited from BaseExtractor.)
	SetExtractionArea(Double, Double, Double, Double)	Sets the extraction area by coordinates and dimensions. (Inherited from BaseExtractor.)
	SetExtractionArea(Single, Single, Single, Single)	Sets the extraction area by coordinates and dimensions. (Inherited from BaseExtractor.)
	ToString	(Inherited from Object.)
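
GetPageRectangle is documented above as returning the page rectangle in PDF Points, with 1 Point = 1/72 in. As a minimal sketch of that arithmetic only, the Python helper below converts point values to inches and millimetres. It does not call the SDK: the function names and the 612 x 792 sample dimensions are illustrative assumptions, not part of the Bytescout API.

```python
# Unit conversion for values expressed in PDF points (1 point = 1/72 inch).
# All names here are hypothetical helpers, not SDK members.

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_inches(points: float) -> float:
    """Convert a length in PDF points to inches."""
    return points / POINTS_PER_INCH


def points_to_mm(points: float) -> float:
    """Convert a length in PDF points to millimetres."""
    return points_to_inches(points) * MM_PER_INCH


def page_size_inches(width_pt: float, height_pt: float) -> tuple:
    """Return (width, height) in inches for a page measured in points."""
    return (points_to_inches(width_pt), points_to_inches(height_pt))


if __name__ == "__main__":
    # Assumed sample values: a 612 x 792 point page (US Letter).
    print(page_size_inches(612, 792))  # (8.5, 11.0)
```

The same conversion applies to the values returned by GetPageRect_Width and GetPageRect_Height if, as with GetPageRectangle, they are reported in points; the table does not state their unit, so treat that as an assumption to verify.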

Reference

XMLExtractor Class

Bytescout.PDFExtractor Namespace