History (changes log)

ByteScout PDF Extractor SDK history of changes.

- - bug fixed
+ - new feature
= - changed
! - critical
------------------------- (April 10, 2023)
+ Added support for WEBP image format in 'RasterRenderer' and 'HTMLExtractor'
+ Added Variant methods to extractors
- Improved font rendering
- Fixed crash on text object where contentLength
= Performance improvements
= Other minor fixes and improvements.
------------------------- (September 27, 2022)
+ DocumentSplitter: added support for "**" split range that splits document into pairs of pages.
+ Added methods to all extractors that support Variant datatype for input and output. They allow in-memory processing when using the SDK as a COM/ActiveX object from Delphi, VC++, VBScript, etc.
- Fixed text search for RTL languages.
- Input photo images are now rotated according to EXIF information.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
------------------------- (June 7, 2022)
= 'DocumentRotator' now can automatically fix rotation of PDF files using OCR.
= Improved line removal algorithm.
= Improved loading of embedded fonts. 
= Performance improvements.
- Rotated text objects were combined with unrotated ones. Fixed now.
- Fixed parsing of names of file attachments.
- 'SearchablePDFMaker': fixed coordinates of transparent text in the output document when the input is an image.
= Suppressed junk console message.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
------------------------- (January 24, 2022)
+ DocumentMerger: Added property 'MergedDocumentTitle' that overrides the title of the merged document.
+ XLSExtractor: Added property 'CustomColumnWidths' that specifies exact column widths in the generated Excel spreadsheet.
= JSONExtractor: The mode 'OutputStructure.Full' is renamed to 'OutputStructure.LegacyFixed' and made maximally compatible in field names with the mode 'OutputStructure.Legacy'.
+ Added support for UniKS-UCS2-H text encoding.
+ InfoExtractor: Added method 'GetFormFields()' returning information about form fields in a PDF document.
= Improved COM/ActiveX interfaces for in-memory processing without file operations.
+ Extractors and SearchablePDFMaker: Added property 'OCRDisableAutoSegmentation' to solve OCR engine's segmentation issues.
= .NET Core min required version is 2.1 now (was 2.0).
- Line grouping was not affected by 'ConsiderFontSizes' and 'ConsiderFontColors' properties. Fixed now.
- Fixed disposing issue in 'SearchablePDFMaker'.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
------------------------- (October 4, 2021)
+ New column detection mode 'ColumnDetectionMode.ContentGroupsAI' that works better on tables without borders and on pages with multiple tables.
= Greatly improved tables detection in 'TableDetector2'.
= Improved filtering of shadow-like text ('ExtractShadowLikeText' option).
= Improved the 'LineGroupingMode.JoinOrphanedRows'.
= 'DocumentMerger': Improved merging of PDF forms. Now it can link fields with matching names or rename them to avoid unwanted linking. See the property 'RenameMatchingFieldsDuringMerge'.
= 'JSONExtractor' and 'XMLExtractor' now output the page size for each page.
= All extractor classes now support extraction of page ranges.
+ Added properties 'DetectUnderlineTextStyle' and 'DetectStrikeoutTextStyle' to 'CSVExtractor' and 'XLSExtractor'. They help to prevent underlined text affecting the line grouping in table cells.
= Improved background color detection for the option 'ConsiderBackgroundColors'.
+ Added property 'NormalizeText' to all extractors. It replaces Unicode spaces and hyphens in the extracted text with normal ' ' and '-' characters.
- 'Remover2': fixed handling of PDF page rotation.
- 'Remover2': pages are now made unsearchable only if they were edited.
+ 'XMLExtractor': Added property 'IndentedXML' to control indentation.
+ 'JSONExtractor': Added property 'IndentedJSON' to control indentation.
- 'Stamper': fixed stamping of rotated pages.
+ Added new OCR mode - 'OCRMode.AutoRepairFonts'. It automatically tries to detect PDF documents with corrupted text and forces OCR font repair for them. Works only for English texts.
+ Added property 'PageSeparator' to CSV and XLS extractors.
= 'XLSExtractor': improved negative numbers detection.
- 'TextExtractor.FindAll()' method was ignoring the case sensitivity option. Fixed now.
+ Added property 'OCRDetectLines' that helps to detect table structure in scanned documents.
+ 'JSONExtractor' and 'XMLExtractor' now output the number of pages in the result and the number of pages for which OCR was performed.
+ Added property 'OCRPageCount' to extractors. It contains the number of pages for which OCR was performed during the last extraction.
+ 'JSONExtractor': Added property 'OutputStructure' that selects the structure of the output JSON.
+ 'JSONExtractor': Added property 'OutputTransformation' that applies a JSONPath expression to the output JSON.
= Performance improvements.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
------------------------- (May 18, 2021)
+ Added property 'TextExtractor.FuzzySearch' that enables the 'fuzzy' text search algorithm. It finds 
  'approximately equal' strings.
+ Added 'DocumentSplitter2' class that splits document by found text.
+ Added 'CSVExtractor.NormalizeCSV' property. It makes CSV data produced from different document pages to contain 
  the same number of columns.
+ Added property 'JSONExtractor.OutputStructure' that allows to change the structure of the generated JSON 
  to one of predefined variants for easier postprocessing.
+ Added property 'JSONExtractor.OutputTransformation' that allows to apply JSONPath expression to the generated JSON.
+ Added property 'OCRPageCount' to extractor classes that contains number of pages for which OCR was performed.
+ 'JSONExtractor' and 'XMLExtractor' now add to the generated JSON and XML result the number of processed pages 
  and the number of pages for which OCR was performed.
+ Added property 'OCRDetectLines' to extractor classes that improves column detection in scanned documents.
+ Added property 'ConsiderBackgroundColors' to extractor classes that enables detection of background color 
  under text objects. It may help to improve row and column detection in tables without borders but with 
  color stripes.
+ Added properties 'DocumentMerger.GenerateBookmarks' and 'DocumentMerger.BookmarkTitles' to enable automatic 
  generation of bookmarks pointing to the merged parts.
= Improved PDF optimization in 'DocumentSplitter'.
= 'DocumentMerger' now uses the first input document as the base for the merged document. This keeps the document 
  information properties and outlines.
= DocumentMerger: added support for profiles.
= MultimediaExtractor: added support for more media types.
- 'TextExtractor.FindAll()' method was ignoring the case sensitivity option.
- Fixed issue with junk empty temporary files generated during OCR.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
------------------------- (February 8, 2021)
+ Added public 'BaseExtractor.ExtractionArea' property (in addition to 'SetExtractionArea()' method) for more intuitive use.
= Added the new property 'ColumnDetectionByTextAlignment' to extractors that affects the detection of table columns without separating lines between.
+ Added support for simplified profiles.
+ DocumentOptimizer: Added the property 'OptimizationOptions.GrayscaleImages' that converts all color images to grayscale.
+ UnsearchablePDFMaker: Added the new property 'KeepSkippedPages' that keeps pages excluded from the processing in the output document.
+ UnsearchablePDFMaker: Added the new property 'Grayscale' that converts all processed pages to grayscale.
+ Added the property 'BaseTextExtractor.TextAnalysisCorruptedTextThreshold' to fine-tune the text analysis.
= Member names in profiles are case-insensitive now.
= Improved filtering of invisible objects.
= Improved detection of bold fonts.
= Improved OCR rotation detection.
= Added missing OCR mode 'OCRMode.TextFromVectorsAndRepairedFonts'.
= RTL fonts detection is now enabled by default.
= JSON extractor now generates clean JSON (without the @ and # characters for attributes).
= Improved support for external Chinese fonts.
= Improved positioning of rotated PDF objects.
= Damaged CCITT and JBIG2 images are now skipped during rendering to avoid crashes.
= SearchablePDFMaker: improved OCR when 'DiscardExistingDocumentText' is enabled.
= 'SearchablePDFMaker.GetPageOCRCells()' now detects text color.
= OCR in all extractors now detects text color if the 'ConsiderFontColors' property is enabled.
= 'LineGroupingMode.JoinOrphanedRows' now separates rows of different color if 'ConsiderFontColors' property is enabled.
- InfoExtractor: Fixed a crash if the input document is an image.
- Fixed OCR crash on rotated text.
- 'IsOCRRecommendedForPage()' now skips text objects outside the page crop box.
= Improved parsing of PDF documents.
= Other minor fixes and improvements.
------------------------- (October 26, 2020)
+ DocumentSplitter: Added support for regions with inverted page numbers.
  For example, "!1" means "the last page", "!1-!3" or "!3-" means "last three pages".
+ DocumentSplitter: Added support for "*" split range that means "split every single page".
+ Added 'InfoExtractor.Metadata' property that gets XMP metadata from the document.
= Improved joining of multi-line cells in tables without borders ('LineGroupingMode.JoinOrphanedRows' mode).
= Improved detection of OCR language file versions.
= Improved .NET Core 2.0 compatibility.
= Improved unwrapping of multi-line cell text.
- Fixed issue when invisible vector drawings were causing unwanted separation of text objects.
- Fixed extraction from area when running OCR against image file (not PDF!).
= Improved parsing of PDF documents.
- Other minor fixes and improvements.
------------------------- (June 20, 2020)
+ 'MultimediaExtractor' now supports extraction of 3D-animation objects.
- 'TextExtractor.Find()' now keeps original font names in found object information.
= Improved column detection in 'ColumnDetectionMode.Borders' mode.
- 'SearchablePDFMaker' did not process vector-only pages. Fixed now.
= Improved regex text search in 'TextExtractor'.
+ Added 'DetectUnderlineTextStyle' and 'DetectStrikeoutTextStyle' properties to 'JSONExtractor' and 'XMLExtractor'.
+ Added 'OCRWhiteList' and 'OCRBlackList' properties to extractors.
+ Added 'Invert' OCR preprocessing filter.
+ Added 'Scale' OCR preprocessing filter.
= Improved joining of multi-line cells in tables without borders ('LineGroupingMode.JoinOrphanedRows' mode).
= Improved performance of 'ImageExtractor'.
+ Added page rectangles to 'InfoExtractor'.
= Improved 'OCRAnalyzer'.
= Improved automatic deletion of duplicated text objects during the extraction.
- Fixed extraction issues in .NET Core version.
= Improved parsing of PDF documents.
- Other minor fixes and improvements.
------------------------- (March 19, 2020)
+ Added 'OCROverallConfidence' property to all extractors.
+ SearchablePDFMaker: Added 'KeepOriginalRotation' property.
- SearchablePDFMaker: fixed crash on mixed English-Arabic text recognition.
+ PDF Multitool: Added "Developer Tools" sub-menu to the context menu.
= Improved parsing of PDF documents.
- Other minor fixes and improvements.
------------------------- (February 11, 2020)
+ Added support for new revision of PDF encryption (ISO 32000-2:2017 compliance).
+ Added 'LicenseInfo' property providing detailed information about your license.
+ Added 'Grayscale' filter to OCRImagePreprocessingFilters.
= Dramatically improved column extraction for multiple tables on a page. Works only in 'ColumnDetectionMode.Borders' mode for tables with borders between columns and rows.
= Greatly improved 'ColumnDetectionMode.BorderedTables'. As in the table detection, it now uses optical recognition to detect bordered tables and their columns on scanned documents.
= Improved 'InfoExtractor' to return the encrypted and password-protected states without asking for a password or throwing an exception.
= Added document permissions information to 'InfoExtractor'.
= DocumentSplitter: added zero-padding to page numbers in generated file names.
= Improved extraction of duplicated text (shadow-like effect).
= Improved 'MultimediaExtractor'.
- Fixed text search issues on some documents.
- Fixed bug that damaged extracted text only during multi-thread processing.
- Fixed crash on subsequent extractions with different OCR modes.
- Fixed .NET Core compatibility issue.
= Improved parsing of PDF documents.
- Other minor fixes and improvements.
------------------------- (December 4, 2019)
+ Remover2: Added 'MaskColor' property that changes the color of the masking rectangle.
- Remover & Remover2: Fixed incomplete removal of the text in some cases.
- XMLExtractor and XFDFExtractor: fixed missing control types.
- Fixed parsing of combobox items that consist of value+label pairs.
= Improved handling of Arabic fonts and charsets.
= Improved handling of CJK fonts and charsets.
= Improved parsing of PDF documents.
- Other minor fixes and improvements.
------------------------- (November 1, 2019)
= Improved extraction of embedded images.
= Improved table columns detection.
- Remover2: fixed crash on sequential Add*() method calls.
- PDF Multitool: fixed crash on multimedia extraction.
= Improved parsing and processing of PDF documents.
- Other minor fixes and improvements.
------------------------- (October 1, 2019)
+ Added methods to remove vector objects to 'Remover' and 'Remover2' classes.
+ Added experimental 'TableDetector2' class demonstrating new table detection method.
= Improved replacement of not embedded PDF fonts.
= Improved splitting of text objects when using CustomExtractionColumns.
- Fixed text search on some documents.
- Added 'CreateProfile()' method to all extractors that creates profile from current object.
+ PDF Multitool: Added tools to remove text, image, and vector objects.
- PDF Multitool: Fixed "Save Vectors" option in XML extraction.
= Improved parsing and processing of PDF documents.
- Other minor fixes and improvements.
------------------------- (September 2, 2019)
+ DocumentMerger: Added "MergeFolder()" method that merges all PDF files in a folder.
= Improved extraction by CustomExtractionColumns.
= Remover: Improved appearance of partially removed text objects.
= Renderer and Viewer: Improved rendering of small fonts with stroke.
+ PDF Multitool: Added Full Screen mode.
+ PDF Multitool: Added "Night Mode".
- PDF Multitool: Fixed selection reset on switching a tool.
= Improved parsing and processing of PDF documents.
- Other minor fixes and improvements.
------------------------- (August 6, 2019)
+ Added extracted text analysis. See "EnableTextAnalysis" property.
= Improved columns detection.
+ Implemented replacement filters that replace extracted text before analysis of table structure. 
  See "AddFilter()" method.
+ Added "SensitiveDataDetector" class that detects sensitive data in PDF documents.
+ Added new "Remover2" class: improved version of "Remover" with better interface.
+ PDF Multitool: Added "Save vector objects" option to XML and JSON converters.
= PDF Multitool: Improved "Detect Tables" dialog.
= PDF Multitool: Improved conversion to HTML format.
+ PDF Multitool: Added the "Sensitive Data Suite" set of tools for detecting and removing 
  sensitive data in PDF documents.
= PDF Multitool: Reduced memory consumption on extraction from very large documents.
- Other minor fixes and improvements.
------------------------- (July 2, 2019)
+ Added property 'OCRMaximizeCPUUtilization' that improves OCR performance
  at the cost of maximized CPU utilization.
= Improved OCR rotation detection.
- Fixed OCR crash on systems with CPU without AVX and AVX2 extensions.
- Fixed OCR crash when working under limited system accounts.
= Improved the detection of the visibility of text objects when they are hidden 
  by an overlying opaque vector object.
= Improved extraction from cropped PDF pages.
- Fixed 'OutOfMemoryException' on tiling patterns with very large step or bounding box.
= Improved extraction of embedded images.
= Improved extraction of multimedia files.
- Fixed decoding of UTF-8 encoded text objects.
= Improved Japanese fonts decoding.
- Fixed 'LineGroupingMode.JoinOrphanedRows' mode for multiple single-cell lines.
= PDF Multitool: Replaced legacy 'FolderBrowserDialog' with modern 'FolderSelectDialog' everywhere.
= PDF Multitool: Added Ctrl-Shift-O hot key to open recent document.
- Other minor fixes and improvements.
------------------------- (May 28, 2019)
= Improved OCR engine stability when working in strict environments.
= Improved columns separation by 'CustomExtractionColumns'.
+ Added parameter to 'TextExtractor.Find()' method for specifying RegexOptions.
+ Added support for streams to 'DocumentSplitter' and 'DocumentMerger'
+ Added property 'TableDetector.EnhanceTableBorders' affecting the table detection in 'Bordered Tables' mode.
= Improved parsing and processing of PDF documents.
= PDF Multitool: Visited pages are now displayed much faster.
= PDF Multitool: Improved keyboard navigation.
= PDF Multitool: Improved CSV preview.
+ PDF Multitool: All tools now show elapsed time in the status bar.
= PDF Multitool: Changed default OCR grade to 'Best'.
- Other minor fixes and improvements.
------------------------- (April 4, 2019)
- Fixed detection of rotation of scanned documents ('OCRDetectPageRotation' property in extractors).
+ PDF Multitool: Added "Detect rotation" to OCR options of basic extractors.
= Improved parsing and processing of PDF documents.
- Other minor fixes and improvements.
------------------------- (March 21, 2019)
+ Greatly improved OCR quality and performance.
+ PDF Multitool: New option to select OCR grade.
- PDF Multitool: Fixed behavior of "Remove" button in "Merge documents" tool.
= PDF Multitool: Reduced excessive painting in selection mode.
= Improved parsing and rendering of PDF documents.
- Other minor fixes and improvements.
------------------------- (March 12, 2019)
+ Added TextExtractor.FindAll() and TextExtractor.FindAllToJSON() methods.
+ Added 'AnnotationExtractor' class.
= Improved handling of embedded PDF fonts.
= Improved parsing of PDF documents.
+ PDF Multitool can now be set as default PDF viewer application in Windows.
+ PDF Multitool: Added the ability to preview the conversion.
+ PDF Multitool: Reworked converters' options dialogs. Removed weird options, added actual ones.
= PDF Multitool: Now Ctrl-PageUp and Ctrl-PageDown keys switch pages even if PDFViewerControl is not focused.
= PDF Multitool: Improved handling of PDF extraction permissions.
- Fixed unwanted byte order mark (BOM) when writing extracted text to MemoryStream.
- Fixed line grouping in table cells.
- Fixed crash in XMLExtractor when input document is image.
= Improved parsing of XFA forms.
= Improved Deskew image preprocessing filter.
+ Added 'ShrinkMultipleSpaces' property improving column detection if text in a table contains multiple spaces between words.
- Fixed column detection in rotated pages.
+ Improved support of Microsoft Excel formats.
- Other minor fixes and improvements.
------------------------- (January 31, 2019)
+ Added OCRCorrections property to all extractors that implement OCR.
+ Added .NET Core compatible assemblies.
= Improved support of Korean fonts.
= Improved parsing of PDF documents.
= Improved columns detection.
+ XMLExtractor, JSONExtractor: Added 'SaveVectors' property.
= OCRExtension: Suppressed unwanted console messages.
- Removed C++ runtime dependencies.
- Fixed merging of PDF forms containing fields with the same name.
- Other minor fixes and improvements.
------------------------- (October 22, 2018)
= Changed font rendering engine to improve text rendering 
  and to circumvent Windows GDI font processing issues.
= Improved extraction of embedded media files.
= Improved detection of columns when extracting tabular data.
= PDF Renderer SDK: Property 'RenderingOptions.PreferSystemFonts' made obsolete 
  due to change of font rendering engine.
= XLSExtractor: improved Excel format support.
= Embedded default fonts to fall back to if a font is missing in Windows.
= Improved support of cropped PDF documents.
= Improved extraction of text from rotated pages.
+ PDF Multitool: Added "OCR Analyzer" tool.
= Performance improvements.
- Other minor fixes and improvements.
------------------------- (July 18, 2018)
+ Added new line grouping mode 'LineGroupingMode.JoinOrphanedRows'.
+ Added new OCRAnalyzer class that can help to find the optimal combination of OCR image preprocessing 
  filters. See source code examples.
+ Added new LineDetector class that finds all vertical and horizontal lines in a document.
+ Added public methods GetPreprocessedPagePreview() and SavePreprocessedPagePreview() for previewing 
  the result of OCR image preprocessing filters.
= Greatly improved the line removing OCR image preprocessing filters.
- SearchablePDFMaker: fixed hanging on processing PDF documents with a large number of vector objects.
- Fixed bug in RotationAngle property when processing already rotated PDF documents.
- ImageExtractor now correctly handles the rotation of embedded images.
+ PDF Multitool: added new feature "Optimize PDF document".
- PDF Multitool: fixed resolution selection in "Make PDF unsearchable".
- PDF Multitool: fixed rotation angle selection in "Rotate Document".
- Other minor fixes and improvements.
------------------------- (April 11, 2018)
+ Added RotationAngle property to rotate document pages before the extraction.
= TextExtractor: Improved plain text columns alignment.
= XLSExtractor: Improved numbers detection.
= DocumentOptimizer: Greatly improved optimization effectiveness.
= Greatly improved Deskew algorithm for OCR of rotated scans.
= Remover: more accurate deletion of text objects.
- SearchablePDFMaker: Fixed processing of rotated scans.
- SearchablePDFMaker: Fixed resolution issues when the input is image.
- Other minor fixes and improvements.
------------------------- (January 29, 2018)
= Improved formatting of extracted plain text (TextExtractor). Now columns look better.
------------------------- (January 22, 2018)
- Fixed: OCR preprocessing filters were not applied if input document is image.
- PDF Multitool: Fixed image preprocessing filters in "Find Text" dialog.
+ TableDetector now provides detected cells information for ColumnDetectionMode.BorderedTables (see 'FoundTableCells' property).
+ XMLExtractor: Added annotations extraction.
= XMLExtractor: Object coordinates in XML are fractional now for better precision (were integer).
= Improved support of encrypted PDF documents.
- Other minor fixes and improvements.
------------------------- (November 8, 2017)
+ DocumentOptimizer: added automatic resampling of high resolution images.
+ Added 'ParsingError' event for handling parsing errors and interrupting or continuing the processing.
+ SearchablePDFMaker: Added DiscardExistingDocumentText property that allows overwriting previous OCR results.
+ Added AllowStandalonePunctuation property to tabular extractors (CSV, XML, JSON, XLS).
= Performance improvements.
- SearchablePDFMaker: Invisible text dimensions now match recognized text pieces.
- DocumentSplitter: Fixed 'outputFolder' parameter in SplitCOM() method.
- Made IBaseTextExtractor interface public.
- Other minor fixes and improvements.
------------------------- (August 1, 2017)
+ XMLExtractor, JSONExtractor, HTMLExtractor: Added KeepOriginalFontNames property.
+ TextComparer: Added GetChanges() method to get comparison results in form convenient 
  for programmatic analysis.
+ DocumentRotator: It is now possible to specify pages to rotate.
= TextExtractor.ExtractColumnByColumn property now affects Find() method.
- Fixed font names in SearchResult elements.
- Fixed Contrast preprocessing filter.
- Extraction: subscript and superscript text objects were merged with normal text. Fixed now.
= Other minor fixes and improvements.
------------------------- (June 1, 2017)
= Improved Japanese text extraction.
= Removed obsolete ClientProfile builds.
= Improved multimedia files extraction.
- Other minor fixes and improvements.
------------------------- (March 29, 2017)
+ New event ProgressChanged in all time-consuming classes. The event reports the progress 
  in percent and also allows interrupting the processing.
+ SearchablePDFMaker now supports single and multi-page images as the input 
  and produces a PDF document at the output.
= Performance improvements.
- Fixed crash when the input document is an image loaded from a stream.
- Other minor fixes and improvements.
------------------------- (March 06, 2017)
+ Added new Remover class that removes text from PDF documents.
+ InfoExtractor can now read custom document properties (see CustomProperties property).
+ XMLExtractor and JSONExtractor can now extract document images and save them to external files or embed them as Base64 strings.
= Text extraction: Unwrap property now affects the text in table cells.
= Text extraction: Improved lines grouping in table cells.
- AttachmentExtractor: Fixed extraction of attachments and portfolio created with Microsoft Outlook.
- DocumentSplitter: Fixed document optimization (OptimizeSplittedDocuments property).
= Performance improvements.
= Other minor improvements and bug fixes.
------------------------- (January 11, 2017)
- Fixed Unwrap option.
= Improved bordered tables detection.
= Improved attachments extraction.
+ Added support for profiles - quick way to apply multiple settings at once.
+ OCR: Implemented rotation detection of wrongly oriented scanned PDF pages.
+ SearchablePDFMaker can now automatically rotate wrongly oriented scanned PDF pages.
- Fixed exception in SearchablePDFMaker when loading document from stream.
- Fixed memory leaks in OCR.
+ TextExtractor and CSVExtractor: Added Save* method overloads that accept a character encoding.
= Improved media files extraction.
= Improved Vertical Line Remover OCR preprocessing filter.
= Other minor improvements and bug fixes.
------------------------- (October 25, 2016)
- Fixed OCR preprocessing filters in SearchablePDFMaker.
- Fixed OCR preprocessing filters in PDF Multitool demo app.
+ Added Gamma Correction preprocessing filter.
+ Added Horizontal Lines Remover preprocessing filter.
= Improved Dilate preprocessing filter.
------------------------- (October 21, 2016)
+ Added OCR preprocessing filters to improve the recognition quality on low-quality scanned documents.
+ Added new DocumentOptimizer class able to recompress all document images with JPEG or CCITT compression.
+ Added text removal filters.
+ All extraction classes (TextExtractor, XMLExtractor, etc.) can now load image files and extract 
  text from them using OCR.
+ PDF Multitool demo app can now load image files and extract text from them using OCR.
- Fixed extraction of text in Korean charset (KSCms-UHC-H / Code Page 949).
= Improved text extraction from specified rectangular area.
- Improved extraction of invisible text.
- Fixed transparent color representation in XML extraction.
= Other minor improvements and bug fixes.
------------------------- (August 19, 2016)
+ Added filtering of extracted content by font name, font size and color.
! Updated OCR engine to the latest version. Update language files from "tessdata" folder.
= Improved text extraction.
= Improved lines grouping in tabular data.
= Improved performance.
= Improved XFA forms extraction.
= Improved TableDetector.
- Fixed PDF parsing issues.
- Fixed JBIG images decoding.
- ImageExtractor: fixed per-page image extraction.
- MultimediaExtractor: fixed extraction of embedded MPEG audio.
- TextExtractor: fixed non-working RemoveHyphenation property.
= Other minor improvements and bug fixes.
------------------------- (May 26, 2016)
+ Added new JSONExtractor class.
+ Added overload of DocumentSplitter.Split() method that accepts the output folder for generated files.
- Fixed multi-threading bug in DocumentSplitter.
- TableDetector now respects extraction area set by SetExtractionArea() method.
+ New properties in extraction classes:
  ExtractionColumns - contains coordinates of detected columns;
  CustomExtractionColumns - overrides the column detection.
- GetPageRect* methods did not take the page rotation into account.
- Fixed a bug in the installer that caused some files from a previous installation to interfere with updates.
= Reworked the registration checking. Now the library will not throw an exception, 
  but will work in demo mode if RegistrationName and RegistrationKey are missing or incorrect.
+ PDF Multitool: Added recent document list to "Open PDF Document" button.
+ PDF Multitool: Selection can be resized now.
+ PDF Multitool: Added Extract JSON feature.
= PDF Multitool: Improved Table Detector UI.
= PDF Multitool: Greatly improved font rendering quality.
+ PDF Multitool: Added debug option "Show Detected Extraction Columns" to the context menu to display 
  the detected columns on the current page. Becomes visible only after running any extraction against 
  the current displayed page.
- PDF Multitool: Fixed font rendering issue on 32-bit Windows.
= Other minor improvements and bug fixes.
------------------------- (March 23, 2016)
+ Added TextComparer utility class (available in .NET 4.0 assemblies only) that compares 
  text in two PDF documents and generates a report.
= Improved support of ICC color profiles.
= Improved handling of embedded fonts.
= Improved AttachmentExtractor.
- Fixed XMLExtractor.SaveXMLToStream() method.
- Fixed extracted text duplication when using OCRCacheMode.WholePage option.
= Other bug fixes and improvements.
------------------------- (January 20, 2016)
PDF To Text, PDF To CSV, PDF To XML functions improved
New Extract Video, Extract Audio examples
CSV and XML extractors improved support for tables with empty columns inside
new MultimediaExtractor to extract video and audio from PDF 
new property PageDataCaching 
new "MemoryCareProcessingOfHugeFiles" example 
fixed null exception when trying to dispose already disposed pages 
XLSExtractor: improves fonts support
SkipInvisibleText now skips clipped text (which is not visible)
text output rendering improved
XFDF Extractor: added support for checkboxes
Images output improved to support more sub-formats
Unicode text handling improved

6.11.2193 (August 3, 2015)
Batch Processing samples updated to show the use of Reset() method
C++ source code sample added for Pages Extraction
DocumentMerger adds Merge2(inputfile1, inputfile2, outputfile) method to merge 2 files
XLS Extractor minor bug-fixes
PDF Multitool now can enable/disable text, image, vector layers, adds advanced settings for text extraction
XML, CSV, Table extraction improves support for tables with empty cells inside columns

6.10.2136 (June 16, 2015)
improved PDF to Text extraction
.ExtractShadowLikeText property improved: better filtering for shadow-like text
improved stability and PDF text support

6.00.2071 (May 14, 2015)
PDF to XML, PDF To CSV, PDF To Text functionality improved
PDF To XLS command line sample added (based on vbscript)
PDF To HTML SDK adds new .DetectHyperLinks property (TRUE by default) to enable/disable automated links detection in the text
New SearchablePDFMaker (available for PRO licenses) to convert PDF into searchable PDF files
new properties in extractor: ConsiderFontNames, ConsiderFontSizes, ConsiderFontColors, ConsiderVerticalBorders in CFG files
header columns detection (when AutoAlighHeaderToColumns = true) improved
.DetectLinesInsteadOfParagraphs replaced with new .LineGroupingMode to control how lines are merged into paragraphs
IMPORTANT: PDF To XML fixes a long-standing issue with incorrect Y coordinate for text objects (it pointed to the bottom left instead of the top left)
.TableXMinIntersectionRequiredInPercents and .TableYMinIntersectionRequiredInPercents properties added
C++ source code sample added
XML Extractor fixes missing empty columns in PreserveFormatting=true mode
Minor fixes in colors in some PDF files
support for multiple OCR languages added
PDF Multitool GUI: adds Copy to Clipboard button to TXT, CSV, XML and raster renderer dialogs
XLSExtractor: adds PageToWorksheet property to enable/disable generation of separate worksheets per page.
new .TextEncodingCodePage property
PDFViewerControl: adds ValidateContextMenu allowing user to add custom items to context menu
PDF Viewer control: adds properties ShowTextObjects, ShowImageObjects, ShowVectorObjects.
XMLExtractor now adds "OCRConfidence" attribute for recognized text 
PDF/A checking functionality (in beta)
improving controls and text checking and alignment according to the original layout. The issue was caused by the shift of Y coordinates in controls while parsing: that was incorrect. The correct way is to shif...
XML Extractor updated: now produces <CONTROL> tag for checkboxes and text fields
now uses the temp directory instead of the current directory.
checkboxes, radioboxes, editboxes, comboboxes are better supported
now allows partial trust callers.
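
The Y coordinate fix noted in this release can be illustrated with a small coordinate conversion. This is only a sketch under one reading of the entry: it assumes the text box is known in PDF user space (origin at the bottom left of the page, Y growing upward) and that the desired output is the Y of the box's top edge measured from the top of the page. The function name and parameters are hypothetical and are not part of the SDK.

```python
def to_top_left_y(bottom_y: float, height: float, page_height: float) -> float:
    """Convert the bottom edge of a text box in PDF user space
    (origin bottom-left, Y up) to the Y of its top edge in a
    top-left-origin system (Y down). Hypothetical helper, not SDK API."""
    top_edge_in_pdf_space = bottom_y + height
    return page_height - top_edge_in_pdf_space


# Example: a 12 pt tall text box whose bottom edge sits at Y=700
# on a US Letter page (792 pt tall).
y_top = to_top_left_y(bottom_y=700.0, height=12.0, page_height=792.0)
```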

5.20.1781 (January 27, 2015)
PDF to XML, PDF to CSV, PDF to Text functionality improved
OCRMode now provides 9 modes
.DetectLineInsteadOfParagraph now works much better. Set it to False to capture multiline text in table cells!
PDF controls support improved
FDF and XFDF data extraction added
Table detection improved to support multiline text in cells and tables with absent rows
beta version of PDF/A validator added
minor fixes and improvements

5.10.1747 (November 25, 2014)
PDF to XML, PDF to CSV, PDF to Text functions improved
now supports text extraction from text controls
XML extractor now adds font style, size, name, text coordinates into <text> tags
ASP.NET sample for OCR usage added
new property OCRLanguageDataFolder to specify the location of "tessdata" folder
improved support of PDF files
improves support for rotated text
updated source code samples
updated documentation
minor improvements and fixes

5.00.1626 (August 14, 2014)
OCR (text from images) functionality added: now you may extract text from embedded images and repair damaged text
issue fixed with CSV and XML extractor missing last columns with some settings
improved support for damaged PDF files
multiline text search with word matching modes is now supported
can now search for text with hyphens and across different lines: see new source code sample Find Text With Hyphens
new property .RTLTextAutoDetectionEnabled (false by default) to auto detect RTL languages
PDF Viewer GUI demo improved
minor improvements and fixes

4.00.1487 (May 30, 2014)
improved pdf to text, pdf to csv, pdf to xml
issue with extraction area fixed
Improved Unicode handling
new .ContentType to check if PDF is PDF, Portfolio or XFAForm 
new properties: Unwrap, ExtractionAreaUsageMode 
new AttachmentInfo class to obtain details about attachment
new XFA Form XML extraction support (see XFAFormExtractor and XFAFormToXML samples)
new ZuGFeRD PDF support added
Multithreading performance improved
Licensing updated: Now Licensing is per developer
new "match whole word" parameter to TextExtractor.Find()
improved XLS and XLSX output

3.40.1349 (March 10, 2014)
improved stability of the text extraction
issue with the very last text line missing in some PDF files fixed
tables with empty cells are handled better now
issue with incorrect extraction of overlapped text objects fixed
issue with missing spaces between words in some files fixed
issue with incorrect X coordinate returned while searching with extraction area defined fixed
minor bug-fixes and improvements

3.30.1240 (November 27, 2013)
improved support for PDF files in old formats
image flipping issue in some PDF files fixed 
improved text rendering in PDF files
minor bug-fixes

3.20.1209 (October 31, 2013)
table detection was not returning proper coordinates for 2nd and further tables, fixed
minor source code samples updates
DocumentSplitter now works with multipage TIF files 
minor bug-fixes

3.20.1200 (October 28, 2013)
minor rotated text issues fixed
table detection was not returning proper coordinates, fixed
minor bug-fixes

3.20.1179 (October 22, 2013)
pdf to text and pdf data extraction improved
new .AutoAlignColumnsToHeader (true by default) property to automatically align cells to the header column or not (switching this setting will help if you are getting some shifted cells)
new DocumentRotator class to rotate pages in PDF documents
new ExtractRawImages property in Images Extractor to define if we are extracting raw images or images with rotation and transformation applied 
improved support of PDF files with rotated objects and pages
new source code sample showing how to extract page found by a keyword "Find Keyword And Extract Page"
Images Extractor: SetExtractionArea() method added to define a rectangle area to extract images from 
improved Splitting Pages example
improved pages extraction from PDF
new RemoveUnusedResources method to remove unused resources from PDF to reduce file size
minor bug-fixes and improvements

3.20.1100 (August 22, 2013)
new method: DocumentSplitter.Split(sourcefile, splitPages) to extract multiple ranges of pages from the same PDF file
minor bug-fixes in pdf to text engine

3.20.1093 (August 5, 2013)
pdf to text minor functionality fixes
x64 installer improvements
minor fixes for error messages
PDFDocument.Dispose() no longer disposes the source PDF stream if the stream was supplied by the user (the user should dispose of it)
improved PDF format support
minor bug-fixes

3.20.1075 (July 11, 2013)
improved PDF To CSV, PDF To XLS, PDF To XML extraction
improved PDF reading speed and stability
minor bug-fixes

3.10.1051 (June 29, 2013)
improved table extraction support
improved pdf files support

3.10.1038 (June 26, 2013)
improved text extraction support
issues fixed related to incorrect extraction area coordinates for some PDF files with scanned images 
speed improvements
improved support for various PDF files

3.10.942 (May 30, 2013)
improved pdf text extraction support
minor bug-fixes and improvements

3.10.899 (May 14, 2013)
improved pdf to text conversion
improved PDF reading support
more Visual Basic .NET, C# and VBScript source code samples added
documentation updated

3.00.864 (April 11, 2013)
improved PDF extraction support
improved PDF handling
pdf splitting and merging: new property DocumentSplitter.OptimizeSplittedDocuments to optimize PDF files after splitting (may decrease file size when needed)
improved PDF fonts handling
demo utility updated
source code samples updated to run on any .NET framework by default
minor bug-fixes

3.00.825 (March 12, 2013)
improved pdf to text, pdf to csv
demo utility PDF Viewer reworked and updated for better UI experience
minor improvements and fixes in PDF support
improved PDF stability while working with PDF files with high density vector graphics inside
improved support for indexed color palettes
improved embedded fonts rendering
better support for Unicode fonts
new .Version property to read exact version of the dll
minor updates and improvements

2.50.708 (November 11, 2012)
PDF data extraction speed improved
Windows 8 support improved
PDF images and colors support improved
PDF to csv, PDF to xml, PDF to xls/xlsx now skips first leading rows if they are empty
pdf text search now works better and provides more intelligent support for regular expressions
ActiveX support and installation improved and now provides single batches to run on Windows x86/x64 for Windows XP to 8 Pro 
new property: .ExtractShadowLikeText to enable/disable extraction of shadowed text (where it is used as effect to create visual shadows)
minor bug-fixes and improvements

2.40.650 (November 1, 2012)
improved support for Unicode text extraction 
improved support for PDF/A pdf files 
issue with white stripes appearing when multiple images are combined fixed
data extraction internal optimizations
improved support for 8 bit images inside PDF
vector drawings improved to provide better support for multiple small objects 
Color representation in images with indexed colors fixed
Type2 fonts support improved
Improved support for embedded fonts in PDF produced by Ghostscript engine
CCITT image compression related issues fixed
LZW compressed PDF support improved
improved support for shading objects
improved PDF fonts support 
improved support for PDF with 4 bit images

2.30.594 (September 18, 2012)
PDF data extraction improved
memory and speed optimizations
fixing issue with empty data while extracting data from some PDF files
improved images extraction support (more image encoding variations are supported)
minor updates in examples
minor bug-fixes

2.30.568 (June 21, 2012)
pdf to text conversion quality improved
multithreading usage stability has been improved
hanging issue on some PDF fixed
PDF Extractor SDK: updated sample for StructuredExtractor (previously known as TableExtractor interface)
minor fixes and improvements

(May 4, 2012)
improved stability
demo utility improved
important security fixes

2.20.525 (April 14, 2012)
improved speed (up to x2 faster on some documents)
Tables detection improved
updated PDF Viewer utility
improved support for structured text extraction (CSV and XML data extraction)
minor bug-fixes

2.20.458 (February 2, 2012)
minor fixes in TableDetector class (.TableDetectionMinNumberOfColumns and .TableDetectionMinNumberOfRows were working incorrectly)
improved text extraction for PDF files generated from text files
improved support for PDF files produced by Adobe Acrobat
PDF Viewer: CSV, XML and Text extractor forms updated to show .PreserveFormattingOnTextExtraction option
minor fixes in .NET 4.0 assemblies
Renderer SDK adds /Visual Basic/PDF To BMP using streams/ sample
improved support for PDF with forms objects
improved leading spaces format detection in text extraction
.SetExtractionArea() added to define area on a page to work with in PDF Renderer SDK
improved fonts information reading support in PDF files
new .PageSeparator property in TextExtractor allowing to define a separator string for pages if you need one
fixing issue with indexed colorspaces in PDF
improved PDF format support

2.20.415 (December 21, 2011)
PDF Extractor SDK: minor update for PDF to XLS sample
rendering: improved fonts support
text extraction with formatting improved
new source code sample to show how to save extracted text to a stream
performance optimized and pdf processing speed improved
improved support for PDF format

2.20.396 (November 30, 2011)
fixing issues with CSV, XML and XLS extraction on long tables
PDF Viewer now provides ability to turn on/off text formatting support on extraction
PDF support improved
minor bug-fixes

2.20.392 (November 25, 2011)
NEW table detection implemented, see new Bytescout.PDFExtractor.TableDetector interface and source code samples in /Find Table And Extract As CSV/ sub-folder in examples
NEW regular expressions support for text search in TextExtractor (see .RegexSearch property)
Text search functionality improved
minor bug-fixes

2.10.303 (October 4, 2011)
NEW: DocumentMerger and DocumentSplitter interfaces and classes to merge and split PDF documents
improved support for PDF documents
PDF processing speed increased
minor bug-fixes

2.10.276 (August 26, 2011)
NEW: AttachmentExtractor interface to extract file attachments and embedded files from PDF (see /Examples/Extract Attachments/ for sample source code)
NEW: XLSExtractor interface to extract tables from PDF as XLS and XLSX Excel files (including font formatting)
improved text extraction functionality
improved output image quality
improved support of Unicode text
improved support of damaged PDF files (not hanging on damaged files anymore)

2.00.228 (12 July 2011)
CSVExtractor: SeparationSymbol and QuotationSymbol properties were added
TrimValues property for CSVExtractor and XMLExtractor: turned on by default to trim detected cell values automatically
Default properties for CSV extraction improved
changed the incorrect default space ratio in the text extractor to 0.4; the previous value of 1.2 caused some words to be joined into a single one
TextExtractor.detectNewColumnBySpacesRatio renamed to .SpaceRatioBetweenWords property
PDFViewer now shows options dialog to adjust SpaceRatioBetweenWords if needed
minor bug-fixes
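
The effect of the space ratio change can be sketched in a few lines. This is not the SDK's algorithm. It is a minimal illustration that assumes the ratio is compared against the gap between adjacent glyphs, measured in multiples of the width of a space character. The glyph tuples, `space_width` and `group_words` are hypothetical names used only for this sketch.

```python
from typing import List, Optional, Tuple

Glyph = Tuple[str, float, float]  # (character, left x, width)


def group_words(glyphs: List[Glyph], space_width: float, ratio: float) -> List[str]:
    """Start a new word whenever the gap to the previous glyph
    exceeds ratio * space_width (assumed interpretation)."""
    words: List[str] = []
    current = ""
    prev_right: Optional[float] = None
    for ch, x, width in glyphs:
        if prev_right is not None and (x - prev_right) > ratio * space_width:
            words.append(current)
            current = ""
        current += ch
        prev_right = x + width
    if current:
        words.append(current)
    return words


# "ab" and "cd" are separated by a gap of half a space width.
glyphs = [("a", 0.0, 5.0), ("b", 5.0, 5.0), ("c", 12.0, 5.0), ("d", 17.0, 5.0)]
joined = group_words(glyphs, space_width=4.0, ratio=1.2)  # gap too small to split
split = group_words(glyphs, space_width=4.0, ratio=0.4)   # gap large enough to split
```

Under this assumption, a ratio of 1.2 treats the half-space gap as part of one word, while 0.4 separates it, which matches the behaviour described in the entry above.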

2.00.217 (21 June 2011)
CSV and XML extraction speed greatly improved
CSVExtractor and XMLExtractor classes add new .DetectNewColumnBySpacesRatio property: use this property to control space between detected columns of text
XML and CSV Extractor adds .SkipCellsWithEmptyValues property (true by default to skip cells with empty values)
PDF Viewer now shows extraction options dialog for XML and CSV export functions
PDF To CSV to XLS source code sample added
PDF To CSV\Delphi\ source code sample added
minor bug-fixes and improvements

2.00.206 (6 June 2011)
support for .NET 3.5, .NET 4.00 added
Delphi source code sample has been added
minor bug-fixes and improvements

2.00.186 (May 16, 2011)
pdf processing speed increased up to 10 times
minor bug-fixes and improvements

1.10.168 (May 6, 2011)
support for password protected PDF documents improved (was not working properly in previous release)
minor bug-fixes and improvements

1.10.160 (12 April 2011)
XML comments are available now to show hints for methods, classes and properties in Visual Studio
New property: .ExtractColumnByColumn (false by default), set to True to extract text column by column instead of line by line
PDF Viewer freeware utility updated to feature "Extract Text (line by line)" and "Extract Text (column by column)" buttons
improved support for single paged PDF documents produced by Acrobat Distiller software
clipping issues were fixed 
fixed hanging on some broken PDF documents 
improved text decoding support
minor bug-fixes

1.10.150 (10 March 2011)
* PDF files support improved
+ now handles PDF files from Google Docs without errors
* minor bug-fixes

1.10.144 (26 February 2011)
+ now works with secured documents (provide password if needed in .Password property)
+ minor bug-fixes and improvements
+ updated GUI demo application

1.10.121 (11 February 2011)
+ PDF to CSV extractor added
+ PDF to XML extractor added
+ support for invisible text extraction added
+ minor bug-fixes and improvements

1.00.30 (9 November 2010)
+ new version