
Read Text From Noisy Image - VB.NET

PDF Extractor SDK sample in VB.NET demonstrating ‘Read Text From Noisy Image’

Program.vb
Imports Bytescout.PDFExtractor

Module Program

    Sub Main()

        Try

            Using extractor As New TextExtractor()

                ' Load noisy image document
                extractor.LoadDocumentFromFile("sample.png")

                ' Set the OCR mode: recognize text from images, vector graphics, and repaired fonts
                extractor.OCRMode = OCRMode.TextFromImagesAndVectorsAndRepairedFonts

                ' Set the location of OCR language data files
                extractor.OCRLanguageDataFolder = "c:\Program Files\Bytescout PDF Extractor SDK\ocrdata_best\"

                ' Set OCR language
                extractor.OCRLanguage = "eng" ' "eng" for English, "deu" for German, "fra" for French, "spa" for Spanish, etc., according to the files in the "ocrdata" folder
                ' Find more language files at https://github.com/bytescout/ocrdata

                ' Set the rendering resolution (DPI) used for OCR
                extractor.OCRResolution = 300

                ' You can also apply various preprocessing filters
                ' to improve recognition of low-quality scans.

                Console.WriteLine("Please wait while PDF Extractor SDK processes the noisy image...")

                ' Automatically deskew skewed scans
                extractor.OCRImagePreprocessingFilters.AddDeskew()

                ' Remove vertical or horizontal lines (sometimes helps to avoid OCR engine page segmentation errors)
                ' extractor.OCRImagePreprocessingFilters.AddVerticalLinesRemover()
                ' extractor.OCRImagePreprocessingFilters.AddHorizontalLinesRemover()

                ' Repair broken letters
                extractor.OCRImagePreprocessingFilters.AddDilate()

                ' Remove noise
                extractor.OCRImagePreprocessingFilters.AddMedian()

                ' Apply gamma correction
                extractor.OCRImagePreprocessingFilters.AddGammaCorrection()

                ' Adjust contrast
                ' extractor.OCRImagePreprocessingFilters.AddContrast(20)

                ' (!) You can use the new OCRAnalyser class to find an optimal set of image preprocessing
                ' filters for your specific document.
                ' See the "OCR Analyser" example.

                ' Read all text
                Dim allText = extractor.GetText()

                Console.Clear()
                Console.WriteLine("Extracted Text: ")
                Console.WriteLine(allText)

            End Using

        Catch ex As Exception
            Console.Clear()
            Console.WriteLine("Exception: " & ex.Message)
        End Try

        ' Console.ReadLine() waits for Enter, so prompt for Enter rather than "any key"
        Console.WriteLine("Press Enter to exit...")
        Console.ReadLine()

    End Sub

End Module
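The AddMedian() call above is described only as "Remove noise". As a rough illustration of the general technique (not the SDK's own implementation), the following sketch applies a 3x3 median filter to a grayscale image held in a 2D Byte array. The module and function names (MedianFilterSketch, Median3x3) are hypothetical. Each pixel is replaced by the median of its 3x3 neighbourhood, so an isolated dark speck surrounded by white background is outvoted and disappears, while larger shapes such as letter strokes largely survive.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical illustration of a 3x3 median filter; NOT the PDF Extractor SDK's implementation.
Module MedianFilterSketch

    ' Returns a new image where each pixel is the median of its 3x3 neighbourhood.
    ' Pixels outside the image are handled by clamping coordinates to the nearest edge.
    Public Function Median3x3(image As Byte(,)) As Byte(,)
        Dim height As Integer = image.GetLength(0)
        Dim width As Integer = image.GetLength(1)
        Dim result(height - 1, width - 1) As Byte
        Dim window As New List(Of Byte)(9)

        For y As Integer = 0 To height - 1
            For x As Integer = 0 To width - 1
                window.Clear()
                ' Collect the pixel and its 8 neighbours (clamped at the borders)
                For dy As Integer = -1 To 1
                    For dx As Integer = -1 To 1
                        Dim ny As Integer = Math.Min(Math.Max(y + dy, 0), height - 1)
                        Dim nx As Integer = Math.Min(Math.Max(x + dx, 0), width - 1)
                        window.Add(image(ny, nx))
                    Next
                Next
                ' The middle of the 9 sorted values is the median
                window.Sort()
                result(y, x) = window(4)
            Next
        Next

        Return result
    End Function

End Module
```

A median is used here rather than an average because an average would only smear a speck into a lighter blotch, whereas the median discards the outlier entirely.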
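The AddDilate() filter is described in the sample as repairing broken letters. The sketch below shows generic 3x3 binary dilation, assuming a black-and-white image where True marks an ink pixel: any pixel adjacent to ink becomes ink, so a one-pixel gap in a stroke closes. The names (DilationSketch, Dilate3x3) are hypothetical, and this sample does not state which pixels the SDK treats as foreground or what neighbourhood it uses, so treat this only as an illustration of the idea.

```vbnet
Imports System

' Hypothetical illustration of 3x3 binary dilation; NOT the PDF Extractor SDK's implementation.
' Assumption: True = ink (text) pixel, False = background.
Module DilationSketch

    ' A pixel becomes ink if it or any of its 8 neighbours is ink.
    Public Function Dilate3x3(ink As Boolean(,)) As Boolean(,)
        Dim height As Integer = ink.GetLength(0)
        Dim width As Integer = ink.GetLength(1)
        Dim result(height - 1, width - 1) As Boolean

        For y As Integer = 0 To height - 1
            For x As Integer = 0 To width - 1
                Dim found As Boolean = False
                For dy As Integer = -1 To 1
                    For dx As Integer = -1 To 1
                        Dim ny As Integer = y + dy
                        Dim nx As Integer = x + dx
                        ' Neighbours outside the image are treated as background
                        If ny >= 0 AndAlso ny < height AndAlso nx >= 0 AndAlso nx < width AndAlso ink(ny, nx) Then
                            found = True
                        End If
                    Next
                Next
                result(y, x) = found
            Next
        Next

        Return result
    End Function

End Module
```

The trade-off is that dilation also thickens every stroke, which is why it is usually applied sparingly on already-thin or broken text.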
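AddGammaCorrection() is called without arguments, so this sample does not show which gamma value the SDK applies. For reference, the sketch below shows the standard power-law mapping out = 255 * (in / 255) ^ gamma, precomputed as a 256-entry lookup table. The gamma parameter and all names (GammaSketch, BuildGammaTable, ApplyGamma) are assumptions for illustration only. With this convention, gamma greater than 1 darkens mid-tones (for example, faint gray text), gamma less than 1 brightens them, and pure black and pure white are unchanged.

```vbnet
Imports System

' Hypothetical illustration of gamma correction; NOT the PDF Extractor SDK's implementation.
Module GammaSketch

    ' Builds a lookup table mapping each intensity v (0..255) to 255 * (v / 255) ^ gamma.
    Public Function BuildGammaTable(gamma As Double) As Byte()
        If gamma <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(gamma))
        Dim table(255) As Byte
        For v As Integer = 0 To 255
            Dim corrected As Double = 255.0 * Math.Pow(v / 255.0, gamma)
            table(v) = CByte(Math.Round(corrected))
        Next
        Return table
    End Function

    ' Applies the gamma lookup table to every pixel of a grayscale image.
    Public Function ApplyGamma(image As Byte(,), gamma As Double) As Byte(,)
        Dim table As Byte() = BuildGammaTable(gamma)
        Dim height As Integer = image.GetLength(0)
        Dim width As Integer = image.GetLength(1)
        Dim result(height - 1, width - 1) As Byte
        For y As Integer = 0 To height - 1
            For x As Integer = 0 To width - 1
                result(y, x) = table(image(y, x))
            Next
        Next
        Return result
    End Function

End Module
```

Using a lookup table means the power function is evaluated only 256 times, regardless of image size.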
