EVO Classic PDF to Text Converter for .NET

EVO Classic PDF to Text Converter for .NET allows you to extract text from PDF documents and to search for text in PDF documents. Integration with existing .NET applications is extremely easy and no installation is necessary to run the converter. The downloaded archive contains the .NET assembly and ready-to-use samples for text extraction and text search. The full C# source code for the sample application is available in the Samples folder. The conversion result is a .NET String object that you can further manipulate.

The Classic library is compatible with .NET Framework, .NET Core and .NET Standard 2.0 on Windows platforms.

For applications that need to run on both Windows and Linux, you can use EvoPdf Next PDF to Text for .NET, which extracts text from PDF documents either in the original layout or optimized for reading, and searches for text in PDF documents, returning the exact position of each match.

The full EvoPdf Next Library for .NET can be used on Windows, Linux, Azure and Docker platforms to create, edit and merge PDF documents, convert HTML to PDF or images, convert Word, Excel, RTF and Markdown to PDF, extract text and images from PDF documents, search for text in PDF documents and convert PDF pages to images.

Extract text from PDF documents

Search text in PDF documents

Save the extracted text using various text encodings

Case sensitive and whole word options for text search

Support for password protected PDF documents

Extract the text or search only a range of PDF pages

Extract text preserving the original PDF layout

Extract text in PDF reading order or PDF internal order

Get the number of pages in a PDF document

Get the PDF document title, keywords, author and description

Does not require Adobe Reader or other third party tools

Support for .NET Framework 4.0 and later

Documentation and C# samples for all the features

Code Sample - Extract Text from PDF Documents

The code below is taken from the PDF to Text demo application included in the PDF to Text Converter download archive. It constructs an instance of the PdfToTextConverter class, extracts the text from a range of pages of a PDF document into a .NET String object, and saves the resulting text to a file on disk using the text encoding selected in the demo (for example UTF-8).

private void btnConvertToText_Click(object sender, EventArgs e)
{
    if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
    {
        MessageBox.Show("Please choose a PDF file to convert", "Choose PDF file", MessageBoxButtons.OK);
        return;
    }

    // the pdf file to convert
    string pdfFileName = pdfFileTextBox.Text.Trim();
            
    // start page number
    int startPageNumber = int.Parse(textBoxStartPage.Text.Trim());
    // end page number
    // when it is 0 the extraction will continue up to the end of document
    int endPageNumber = 0;
    if (textBoxEndPage.Text.Trim() != String.Empty)
        endPageNumber = int.Parse(textBoxEndPage.Text.Trim());

    // the output text layout
    TextLayout textLayout = SelectedTextLayout();

    // the output text encoding
    System.Text.Encoding textEncoding = SelectedTextEncoding();

    // page breaks
    bool markPageBreaks = cbMarkPageBreaks.Checked;

    string outputFileName = System.IO.Path.Combine(Application.StartupPath, @"DemoFiles\Output", 
            System.IO.Path.GetFileNameWithoutExtension(pdfFileName) + ".txt");

    // create the converter object and set the user options
    PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

    pdfToTextConverter.LicenseKey = "ujQlNSAgNSU1IzslNSYkOyQnOywsLCw1JQ==";

    pdfToTextConverter.Layout = textLayout;
    pdfToTextConverter.MarkPageBreaks = markPageBreaks;

    Cursor = Cursors.WaitCursor;
    try
    {
        // extract text from PDF
        string extractedText = pdfToTextConverter.ConvertToText(pdfFileName, startPageNumber, endPageNumber);

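        // make sure the output folder exists, then write the resulting string
        // into the output file using the selected encoding
        System.IO.Directory.CreateDirectory(System.IO.Path.GetDirectoryName(outputFileName));
        System.IO.File.WriteAllText(outputFileName, extractedText, textEncoding);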
    }
    catch (Exception ex)
    {
        MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
        return;
    }
    finally
    {
        Cursor = Cursors.Arrow;
    }


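    // open the output text file in the default viewer
    // UseShellExecute must be true on .NET Core and later, where it defaults to false
    try
    {
        System.Diagnostics.Process.Start(
            new System.Diagnostics.ProcessStartInfo(outputFileName) { UseShellExecute = true });
    }
    catch (Exception ex)
    {
        MessageBox.Show(String.Format("Cannot open the output text file '{0}'. {1}", outputFileName, ex.Message));
    }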
}
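
The sample above calls int.Parse directly on the page number text boxes, which throws if a box is blank or contains non-numeric text. The sketch below shows one possible way to validate that input before calling the converter. TryParsePageRange and the PageRangeInput class are hypothetical helpers, not part of the EVO library, and the rule that page numbers start at 1 is an assumption inferred from the samples. The value 0 for the end page keeps the meaning used in the sample: continue up to the end of the document.

```csharp
using System;

public static class PageRangeInput
{
    // Hypothetical helper, not part of the EVO library.
    // Parses the start and end page inputs into a page range.
    // A blank (or 0) end page yields endPage = 0, which the sample
    // treats as "up to the end of the document".
    public static bool TryParsePageRange(string startText, string endText,
        out int startPage, out int endPage, out string error)
    {
        endPage = 0;
        error = null;

        // Assumption: page numbers are 1-based
        if (!int.TryParse((startText ?? "").Trim(), out startPage) || startPage < 1)
        {
            error = "The start page must be a whole number greater than or equal to 1.";
            return false;
        }

        string trimmedEnd = (endText ?? "").Trim();
        if (trimmedEnd.Length == 0)
            return true; // blank end page: keep endPage = 0

        // An explicit 0 is accepted; any other value must not precede the start page
        if (!int.TryParse(trimmedEnd, out endPage) || (endPage != 0 && endPage < startPage))
        {
            error = "The end page must be 0, blank, or a whole number not smaller than the start page.";
            return false;
        }

        return true;
    }
}
```

In the click handler, the two int.Parse calls could then be replaced by a single call to this helper, showing the returned error message in a MessageBox and returning early when it reports failure.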

Code Sample - Search Text in PDF Documents

The code below is taken from the Find Text demo application included in the PDF to Text Converter download archive. It constructs an instance of the PdfToTextConverter class, searches a PDF document for a given text, and then highlights each match by drawing a yellow rectangle over it in a copy of the document saved with the _Highlighted suffix.

private void btnFindText_Click(object sender, EventArgs e)
{
    if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
    {
        MessageBox.Show("Please choose a PDF file to search", "Choose PDF file", MessageBoxButtons.OK);
        return;
    }

    if (textToFindTextBox.Text.Trim().Equals(String.Empty))
    {
        MessageBox.Show("Please enter the text to find", "Text to Find", MessageBoxButtons.OK);
        return;
    }

    // the pdf file to search 
    string pdfFileName = pdfFileTextBox.Text.Trim();

    // start page number
    int startPageNumber = int.Parse(textBoxStartPage.Text.Trim());
    // end page number
    // when it is 0 the search will continue up to the end of document
    int endPageNumber = 0;
    if (textBoxEndPage.Text.Trim() != String.Empty)
        endPageNumber = int.Parse(textBoxEndPage.Text.Trim());

    Cursor = Cursors.WaitCursor;
    string outputFileName = System.IO.Path.Combine(Application.StartupPath, @"DemoFiles\Output",
            System.IO.Path.GetFileNameWithoutExtension(pdfFileName) + "_Highlighted.pdf");
    Document pdfDocument = null;
    try
    {
        // create the PDF to Text converter
        PdfToTextConverter pdfToTextConverter = new PdfToTextConverter();

        pdfToTextConverter.LicenseKey = "ujQlNSAgNSU1IzslNSYkOyQnOywsLCw1JQ==";

        // search text in PDF
        FindTextLocation[] findTextLocations = pdfToTextConverter.FindText(pdfFileName, textToFindTextBox.Text,
                    startPageNumber, endPageNumber, cbCaseSensitive.Checked, cbWholeWord.Checked);

        // open the searched PDF document with the PDF library to add the highlights
        pdfDocument = new Document(pdfFileName);

        // highlight the found text in PDF
        foreach (FindTextLocation findTextLocation in findTextLocations)
        {
            RectangleElement highlightRectangle = new RectangleElement(findTextLocation.X, findTextLocation.Y,
                findTextLocation.Width, findTextLocation.Height);
            highlightRectangle.BackColor = Color.Yellow;
            highlightRectangle.Opacity = 50;

            pdfDocument.Pages[findTextLocation.PageNumber - 1].AddElement(highlightRectangle);
        }

        // Save the modified PDF document in a memory buffer
        byte[] outPdfBuffer = pdfDocument.Save();

        // Write the memory buffer in a PDF file
        System.IO.File.WriteAllBytes(outputFileName, outPdfBuffer);
    }

    catch (Exception ex)
    {
        // The search failed
        MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
        return;
    }
    finally
    {
        // Close the PDF document
        if (pdfDocument != null)
            pdfDocument.Close();

        Cursor = Cursors.Arrow;
    }

    // Open the modified PDF document in default PDF viewer
    try
    {
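        // UseShellExecute must be true on .NET Core and later, where it defaults to false
        System.Diagnostics.Process.Start(
            new System.Diagnostics.ProcessStartInfo(outputFileName) { UseShellExecute = true });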
    }
    catch (Exception ex)
    {
        MessageBox.Show(String.Format("Cannot open highlighted PDF file '{0}'. {1}", outputFileName, ex.Message));
    }
}
