
Read Text From Noisy Image - VB.NET

PDF Extractor SDK sample in VB.NET demonstrating ‘Read Text From Noisy Image’

Imports Bytescout.PDFExtractor

Module Program

    Sub Main()

        Try
            Using extractor As New TextExtractor()

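                ' Load noisy image document
                ' (the file name below is a placeholder, not taken from the original sample; substitute your own file)
                extractor.LoadDocumentFromFile("your-noisy-image-file")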

                ' Set the font repairing OCR mode 
                extractor.OCRMode = OCRMode.TextFromImagesAndVectorsAndRepairedFonts

                ' Set the location of OCR language data files
                extractor.OCRLanguageDataFolder = "c:\Program Files\Bytescout PDF Extractor SDK\ocrdata_best\"

                ' Set OCR language
                extractor.OCRLanguage = "eng" ' "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish, etc., according to the files in the OCR language data folder
                ' Find more language files at

                ' Set PDF document rendering resolution
                extractor.OCRResolution = 300

                ' You can also apply various preprocessing filters
                ' to improve the recognition on low-quality scans.

                Console.WriteLine("Please wait while PDF Extractor SDK is processing noisy image to read data...")

                ' Automatically deskew skewed scans

                ' Remove vertical or horizontal lines (sometimes helps to avoid the OCR engine's page segmentation errors)
                ' extractor.OCRImagePreprocessingFilters.AddVerticalLinesRemover()
                ' extractor.OCRImagePreprocessingFilters.AddHorizontalLinesRemover()

                ' Repair broken letters

                ' Remove noise

                ' Apply Gamma Correction

                ' Add Contrast
                ' extractor.OCRImagePreprocessingFilters.AddContrast(20)

                ' (!) You can use the new OCRAnalyser class to find an optimal set of image preprocessing
                ' filters for your specific document.
                ' See the "OCR Analyser" example.

                ' Read all text
                Dim allText = extractor.GetText()

                Console.WriteLine("Extracted Text: ")
                Console.WriteLine(allText)

            End Using

        Catch ex As Exception
            Console.WriteLine("Exception: " & ex.Message)
        End Try

        Console.WriteLine("Press any key to exit...")
        Console.ReadKey()

    End Sub

End Module
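
The sample above depends on the OCR language matching a data file in the language data folder. As a minimal sketch of a pre-flight check, the helper below reports whether such a file is present before OCR starts. The module name OcrDataCheck and the function HasLanguageData are hypothetical and not part of PDF Extractor SDK. The "<language>.traineddata" file naming is an assumption (the Tesseract convention), so verify it against the contents of your own data folder.

```vb
Imports System.IO

Module OcrDataCheck

    ' Hypothetical helper, not part of PDF Extractor SDK.
    ' Returns True when dataFolder contains a data file for the given language code.
    ' ASSUMPTION: data files are named "<language>.traineddata" (Tesseract convention).
    Function HasLanguageData(dataFolder As String, language As String) As Boolean
        ' Reject missing arguments instead of building a meaningless path
        If String.IsNullOrWhiteSpace(dataFolder) OrElse String.IsNullOrWhiteSpace(language) Then
            Return False
        End If

        ' Path.Combine handles folders given with or without a trailing separator
        Return File.Exists(Path.Combine(dataFolder, language & ".traineddata"))
    End Function

End Module
```

A possible use is to call HasLanguageData with the same folder and language strings that are assigned to OCRLanguageDataFolder and OCRLanguage, and print a clear message when it returns False rather than waiting for the OCR step to fail.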

Copyright © 2016 - 2023 ByteScout