In this article, we continue our exploration of the iText library. This time, we take a closer look at working with existing files and at merging multiple PDFs into a single document.

In a previous article on the iText library, we delved into the creation of new PDF files in a piece titled ‘Introduction to PDF Manipulation With iText (Formerly iTextSharp).’ We also explored the concept of adding headers and footers in ‘Adding Header and Footer to a PDF Using the iText Library.’

Now armed with this knowledge, we are ready to embark on the task of merging multiple PDFs into one cohesive file.

To download the source code for this article, you can visit our GitHub repository.

Let’s begin.

Merging Multiple PDFs

When it comes to merging documents with the iText library, the process is made easy thanks to the PdfMerger object.

Here we are using the iText library version 7, but recently a new version 8 was released. If you want to use it, you need to install the additional package:

dotnet add package itext.bouncy-castle-adapter

With this package installed, all the examples from our article work with iText 8 without an issue. So, if you want, you can use the newest iText library.

PdfMerger Object in the iText Library

The PdfMerger object, as indicated by its name, is designed specifically for merging multiple PDF documents into a single, unified file. To accomplish this, it offers a dedicated method called Merge(). This method allows us to seamlessly integrate the content of source PDFs into the current one.

To use this method effectively, we typically follow a short sequence of steps:

  • Open PDF document for writing
  • Attach the opened document to PdfMerger object
  • For each source document
    • Open the source document
    • Use Merge() to merge it into the destination

Here’s a practical example:

public static void SimpleMerge(string[] pdfFiles, string mergedPdfFileName)
{
    using var writer = new PdfWriter(mergedPdfFileName);
    using var mergedPdfDocument = new PdfDocument(writer);

    var pdfMerger = new PdfMerger(mergedPdfDocument);
    foreach (var file in pdfFiles)
    {
        using var reader = new PdfReader(file);
        using var srcPdfDocument = new PdfDocument(reader);
        pdfMerger.Merge(srcPdfDocument, 1, srcPdfDocument.GetNumberOfPages());
    }
}

This method contains the essential code for consolidating multiple PDF documents into a single file. It follows the steps listed above one by one.

Merging Multiple PDFs With the Merge() Method of the PdfMerger Object

As demonstrated in the previous example, the Merge() method plays a pivotal role in the PDF merging process facilitated by the PdfMerger object. The method itself has two overloads:

PdfMerger Merge(PdfDocument from, int fromPage, int toPage);

PdfMerger Merge(PdfDocument from, IList<int> pages);

The first parameter in both overloads identifies the document whose pages we intend to copy, commonly referred to as the source document.

The second and third parameters of the first overload (fromPage and toPage) specify the range of pages to copy, with both ends included. Page numbers start at 1, which is why our example passes 1 and GetNumberOfPages() to copy a whole document.

The second overload allows us to define a collection of page numbers to merge. This option even allows us to rearrange pages by specifying them in a non-sequential order.
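As a minimal sketch of the second overload, the following helper copies three pages from a source file in a rearranged order. The class and method names (PageListMergeExample, CopyReordered) are made up for illustration, and the snippet assumes the source document has at least three pages:

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class PageListMergeExample
{
    // Hypothetical helper: copies pages 3, 1 and 2 of the source, in that order.
    // Assumes the source document has at least three pages.
    public static void CopyReordered(string sourcePdf, string targetPdf)
    {
        using var writer = new PdfWriter(targetPdf);
        using var target = new PdfDocument(writer);

        using var reader = new PdfReader(sourcePdf);
        using var source = new PdfDocument(reader);

        // Page numbers are 1-based; the order of the list is the order we ask for.
        IList<int> pages = new List<int> { 3, 1, 2 };

        var merger = new PdfMerger(target);
        merger.Merge(source, pages);
    }
}
```

A call such as PageListMergeExample.CopyReordered("source.pdf", "reordered.pdf") would then write the selected pages to the new file (both file names are placeholders).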

Closing the Source File

When working with files, we encounter the issue of resource management. It’s important to note that by default, the Merge() method copies selected pages into the destination document but doesn’t automatically close the source document.

If we want the merger to close each source document for us, we can enable that by calling SetCloseSourceDocuments():

pdfMerger.SetCloseSourceDocuments(true);
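To see how this flag changes the merging loop, here is a small sketch. It assumes, as in recent iText 7 releases, that the merger closes a source document right after copying its pages. The names CloseSourcesExample and MergeAndCloseSources are hypothetical:

```csharp
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class CloseSourcesExample
{
    // Hypothetical helper: merges all files and lets PdfMerger close each source.
    public static void MergeAndCloseSources(string[] pdfFiles, string mergedPdfFileName)
    {
        using var writer = new PdfWriter(mergedPdfFileName);
        using var mergedPdfDocument = new PdfDocument(writer);

        var pdfMerger = new PdfMerger(mergedPdfDocument);
        pdfMerger.SetCloseSourceDocuments(true);

        foreach (var file in pdfFiles)
        {
            // No using declaration here: closing is delegated to the merger
            // (assumption: it happens once the pages have been copied).
            var srcPdfDocument = new PdfDocument(new PdfReader(file));
            pdfMerger.Merge(srcPdfDocument, 1, srcPdfDocument.GetNumberOfPages());
        }
    }
}
```

One trade-off of this variant is that a source document may stay open if Merge() throws before the merger gets to close it. The using declarations from our first example do not have that problem.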

Merging Multiple PDFs by Using PdfMerger Object in the iText Library

Armed with this knowledge, we can develop a more robust method for PDF merging. This method not only merges PDFs but also performs essential checks on input parameters:

public static class Merger
{
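    private static void CheckParameters(string[] pdfFiles, string mergedPdfFileName)
    {
        if (string.IsNullOrWhiteSpace(mergedPdfFileName))
            throw new ArgumentOutOfRangeException(nameof(mergedPdfFileName));

        // Guard against a null array before we enumerate it.
        if (pdfFiles is null)
            throw new ArgumentNullException(nameof(pdfFiles));

        // An empty list leaves nothing to merge.
        if (pdfFiles.Length == 0)
            throw new ArgumentOutOfRangeException(nameof(pdfFiles));

        // Every source file must exist on disk.
        if (pdfFiles.Any(file => !File.Exists(file)))
            throw new ArgumentOutOfRangeException(nameof(pdfFiles));
    }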

    public static void Merge(string[] documents, string mergedDocument)
    {
        CheckParameters(documents, mergedDocument);

        using var writer = new PdfWriter(mergedDocument);
        using var mergedPdfDocument = new PdfDocument(writer);

        var pdfMerger = new PdfMerger(mergedPdfDocument);
        foreach (var file in documents)
        {
            using var reader = new PdfReader(file);
            using var srcPdfDocument = new PdfDocument(reader);
            pdfMerger.Merge(srcPdfDocument, 1, srcPdfDocument.GetNumberOfPages());
        }
    }
}

In this improved method, we start by validating the parameters. A helper method checks that the output file name is not blank, that there is at least one source file, and that every source file exists. These checks make our PDF merging code more robust.

Test Merging Documents Using the iText Library

Now that we’ve developed the necessary utilities, it’s time to put our new method to the test. In a previous article titled ‘Adding Header and Footer to a PDF Using the iText Library,’ we introduced a BigDocument class designed to generate test PDF documents.

Leveraging this class, we can effortlessly produce multiple test documents and merge them into a single, comprehensive PDF document.

Creating a Document Using BigDocument Class

To create a document using the BigDocument class, we can utilize the methods discussed in the earlier article:

public static string CreateDocument(string pdfFileName, PageSize pageSize)
{
    using var writer = new PdfWriter(pdfFileName);
    using var pdfDocument = new PdfDocument(writer);
    using var document = new Document(pdfDocument, pageSize, immediateFlush: false);

    try
    {
        var onlyNameOfTheFile = Path.GetFileName(pdfFileName);

        AddContent(document, onlyNameOfTheFile);
        PageXofYFooter(pdfDocument, document, pdfFileName);
    }
    finally
    {
        document.Close();
    }

    return pdfFileName;
}

In this code, we configure the necessary objects to create a PDF document. PdfWriter handles writing data to the disk, PdfDocument facilitates low-level PDF manipulation, and Document offers high-level document manipulation capabilities.

Next, we add random content to the document and, as the last step, add the page footers.

Creating a Few Documents Using BigDocument Class

To prepare multiple documents, which we’ll later merge, we require a method that can create several documents, not just one:

public static IEnumerable<string> CreateFewDocuments(string folder, string documentPrefix,
    uint numberOfDocuments, PageSize? pageSize = null)
{
    var counter = 0;
    while (counter++ < numberOfDocuments)
    {
        var fileName = Path.Combine(folder, $"{documentPrefix}_{counter}.pdf");
        yield return CreateDocument(fileName, pageSize ?? GetRandomPageSize());
    }
}

This method accepts four parameters, the last of which is optional. The first parameter (folder) specifies the folder where the new documents will be stored. The second parameter (documentPrefix) sets a common prefix for all generated file names, which is followed by a sequential number. The third parameter (numberOfDocuments) determines how many documents we wish to create. The last parameter (pageSize) is optional and defines the page size for the newly generated documents. If we don’t specify it, the method uses a random page size for each document.

The core of the method generates a file name by appending a counter to the prefix and then calls the previously defined CreateDocument method to generate the document on the disk.
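One detail worth keeping in mind is that a method built on yield return is an iterator with deferred execution: its body does not run until the result is enumerated, and it runs again on every new enumeration. The sketch below illustrates this with a stand-in that only counts how often the “creation” step runs instead of writing PDFs. LazyIteratorDemo, CreateFewNames, and CreatedCount are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyIteratorDemo
{
    // Counts how many times the "expensive" step ran (a stand-in for CreateDocument).
    public static int CreatedCount;

    // Hypothetical stand-in that mirrors the shape of CreateFewDocuments().
    public static IEnumerable<string> CreateFewNames(string prefix, uint numberOfDocuments)
    {
        var counter = 0;
        while (counter++ < numberOfDocuments)
        {
            CreatedCount++;
            yield return $"{prefix}_{counter}.pdf";
        }
    }

    public static void Run()
    {
        var lazy = CreateFewNames("example", 3);
        Console.WriteLine(CreatedCount);   // the iterator body has not run yet

        var names = lazy.ToArray();        // runs the iterator body once
        Console.WriteLine(CreatedCount);
        Console.WriteLine(string.Join(", ", names));
    }
}
```

Materializing the result once, for example with ToArray(), creates the documents exactly once and gives us a collection we can safely enumerate several times.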

Joining Three A4 PDF Documents into One

To combine three A4-sized PDF documents into one, we can utilize our BigDocument class to create a merging method:

void MergeDocumentsOfTheSameSize(string documentsFolder)
{
    Console.WriteLine("Merge 3 PDF Documents with the Same Size\n\n");
    var documents = BigDocument.CreateFewDocuments(documentsFolder, "example", 3, PageSize.A4).ToArray();
    foreach (var document in documents)
    {
        Console.WriteLine($" * Document {Path.GetFileName(document)} created.");
        DisplayPDFFile(document);
    }

    Console.WriteLine("\n\nMerging documents ...\n");
    var mergedDocument = Path.Combine(documentsFolder, "merged_a4.pdf");
    Merger.Merge(documents, mergedDocument);

    Console.WriteLine($"\nDocuments merged into {Path.GetFileName(mergedDocument)}");
    DisplayPDFFile(mergedDocument);
}

This method takes a folder path (documentsFolder) where we store our files.

First, we create three A4-sized documents with random content and display their file names in the console.

Then, we create a new file named “merged_a4.pdf” in the same folder and merge all three documents into one using our Merger.Merge() method.

Note that the resulting document keeps the A4 page size throughout, as expected.

Joining PDF Documents with Different Page Sizes

When merging documents with identical page sizes, the resulting document naturally adopts that particular page size. For instance, merging five documents, each with A4-sized pages, yields a new PDF document with A4 page dimensions. However, what transpires when we attempt to merge documents with different sizes?

Let’s find out:

void MergeDocumentsOfDifferentPageSizes(string documentsFolder)
{
    Console.WriteLine("Merge 3 PDF Documents with Different Page Sizes\n\n");
    var documents = BigDocument.CreateFewDocuments(documentsFolder, "test", 3).ToArray();
    foreach (var document in documents)
    {
        Console.WriteLine($" * Document {Path.GetFileName(document)} created.");
        DisplayPDFFile(document);
    }

    Console.WriteLine("\n\nMerging documents ...\n");
    var mergedDocument = Path.Combine(documentsFolder, "merged_different.pdf");
    Merger.Merge(documents, mergedDocument);

    Console.WriteLine($"\nDocuments merged into {Path.GetFileName(mergedDocument)}");
    DisplayPDFFile(mergedDocument);
}

This method is nearly identical to the previous one, MergeDocumentsOfTheSameSize(). The only difference is that here we don’t pass a page size, so each generated document gets a random one. Everything else remains the same.

The method will produce a PDF document that appropriately retains the distinct page sizes of each contributing document. Therefore, it produces a PDF document composed of pages with varying dimensions. While users may not commonly encounter such documents, using the iText Library, producing them is entirely feasible.
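To check the page sizes of a merged file ourselves, we could print the dimensions of every page. This is a minimal sketch, and PageSizeReport is a hypothetical helper that is not part of the article’s source code:

```csharp
using System;
using iText.Kernel.Pdf;

public static class PageSizeReport
{
    // Hypothetical helper: prints the width and height (in points) of every page.
    public static void Print(string pdfFileName)
    {
        using var reader = new PdfReader(pdfFileName);
        using var pdfDocument = new PdfDocument(reader);

        for (var i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
        {
            var size = pdfDocument.GetPage(i).GetPageSize();
            Console.WriteLine($"Page {i}: {size.GetWidth()} x {size.GetHeight()} pt");
        }
    }
}
```

Running it against a merged file lists one line per page, which makes pages of different dimensions easy to spot.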

Resizing While Merging Multiple PDFs

When we merge documents with different page sizes, the iText library keeps the original size of every page, so the merged document contains pages of various sizes. Now, we may wonder, “What if we want all pages to be the same size, even if the source documents had different sizes?”

In this scenario, the PdfMerger object alone won’t do the job. It merges source documents just as we provide them. So, we’ll need a special method to resize the source documents to the desired page size before merging them.

We’ll introduce two objects we haven’t explored yet: the PdfCanvas object and PdfFormXObject object. In this article, we’re mainly focusing on merging documents, so we won’t delve into the details of these two objects. Don’t worry; we’ll cover them in our upcoming articles.

For now, let’s understand their basic purposes:

  • PdfFormXObject: This is like a reusable graphical element that can be used on multiple pages. It can contain things like lines, images, and text.
  • PdfCanvas: This represents a tool for drawing lines, rectangles, text, and other graphical elements on a PDF page.

The ResizeToA5() Method

To resize a document, we’ll read each page from the source document into a PdfFormXObject object. Then, we’ll scale this PdfFormXObject object to fit a page in the output document:

public static class Resizer 
{ 
    public static void ResizeToA5(string inputPdfDocument, string outputPdfDocument) 
    { 
        using var writer = new PdfWriter(outputPdfDocument);
        using var outputDocument = new PdfDocument(writer); 
        outputDocument.SetDefaultPageSize(PageSize.A5); 

        using var reader = new PdfReader(inputPdfDocument); 
        using var inputDocument = new PdfDocument(reader); 

        var (a5PageWidth, a5PageHeight) = (PageSize.A5.GetWidth(), PageSize.A5.GetHeight()); 
        for (var i = 1; i <= inputDocument.GetNumberOfPages(); i++) 
        { 
            var page = inputDocument.GetPage(i);
 
            var formXObject = page.CopyAsFormXObject(outputDocument); 
            var pdfCanvas = new PdfCanvas(outputDocument.AddNewPage()); 
            pdfCanvas.AddXObjectFittedIntoRectangle(formXObject, new Rectangle(0, 0, a5PageWidth, a5PageHeight)); 
        } 
    } 
}

We open the output PDF document in the first three lines and set the page size to A5. The following two lines are used to open the input document.

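The crucial code is inside the for loop. There, we copy each source page into a PdfFormXObject that belongs to the output document, add a new page to the output document, and draw the copied content fitted into a rectangle that covers the whole A5 page.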

It’s important to note that this method is specific to resizing to A5 and is not generic.
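One possible way to make the method generic is to pass the target page size as a parameter. The following is only a sketch of that idea, built from the same iText calls as ResizeToA5(). The GenericResizer class and its ResizeTo() method are hypothetical names:

```csharp
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas;

public static class GenericResizer
{
    // Hypothetical generalization of ResizeToA5(): the target size is a parameter.
    public static void ResizeTo(string inputPdfDocument, string outputPdfDocument, PageSize targetSize)
    {
        using var writer = new PdfWriter(outputPdfDocument);
        using var outputDocument = new PdfDocument(writer);

        using var reader = new PdfReader(inputPdfDocument);
        using var inputDocument = new PdfDocument(reader);

        var targetArea = new Rectangle(0, 0, targetSize.GetWidth(), targetSize.GetHeight());
        for (var i = 1; i <= inputDocument.GetNumberOfPages(); i++)
        {
            // Copy the source page as a form XObject owned by the output document.
            var formXObject = inputDocument.GetPage(i).CopyAsFormXObject(outputDocument);

            // Add a page of the requested size and draw the copied page fitted into it.
            var pdfCanvas = new PdfCanvas(outputDocument.AddNewPage(targetSize));
            pdfCanvas.AddXObjectFittedIntoRectangle(formXObject, targetArea);
        }
    }
}
```

With this in place, a call like GenericResizer.ResizeTo("input.pdf", "output_a4.pdf", PageSize.A4) would produce an A4 version (the file names are placeholders). How the content looks when the source and target aspect ratios differ depends on how AddXObjectFittedIntoRectangle() treats the rectangle, so it is worth checking the output for page sizes outside the A series.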

Splitting Documents with PdfMerger Object in the iText Library

Although the iText library has a dedicated PdfSplitter object, we can also use PdfMerger creatively to divide a document into odd and even pages.

Many individuals encounter challenges when printing double-sided PDF documents on printers that lack duplex capabilities. In such scenarios, the typical workaround involves printing only the odd pages initially, manually reinserting the paper, and subsequently printing the even pages.

Hence, having a method capable of splitting a document into odd and even pages proves beneficial. As we learned earlier, a PdfMerger object can accept a list of pages to copy from the source document into a merged document. We can utilize this feature for our purpose.

Instead of passing several source documents to the PdfMerger object, we provide a single document and instruct it to copy only the odd pages. Then, we use the same document again and copy only the even pages. We can achieve this with a small class:

public static class Splitter
{
    public static (string oddPages, string evenPages) Split(string sourcePdfFile)
    {
        using var reader = new PdfReader(sourcePdfFile);
        using var srcPdfDocument = new PdfDocument(reader);

        var sourceFilePath = Path.GetDirectoryName(sourcePdfFile)!;
        var oddPagesFileName = Path.Combine(sourceFilePath, "odd.pdf");
        ExtractPagesThatMatchCriteria(srcPdfDocument, oddPagesFileName, 
            pageNum => pageNum % 2 != 0);

        var evenPagesFileName = Path.Combine(sourceFilePath, "even.pdf");
        ExtractPagesThatMatchCriteria(srcPdfDocument, evenPagesFileName, 
            pageNum => pageNum % 2 == 0);

        return (oddPagesFileName, evenPagesFileName);
    }

    private static void ExtractPagesThatMatchCriteria(PdfDocument srcPdfDocument, 
        string resultPdfDocument, Func<int, bool> pageSelectionCriteria)
    {
        using var writer = new PdfWriter(resultPdfDocument);
        using var writerDocument = new PdfDocument(writer);

        var selectedPages = Enumerable
            .Range(1, srcPdfDocument.GetNumberOfPages())
            .Where(pageSelectionCriteria)
            .ToArray();

        var merger = new PdfMerger(writerDocument);
        merger.Merge(srcPdfDocument, selectedPages);
    }
}

Within this class, we find two significant methods. The primary workload is managed by the ExtractPagesThatMatchCriteria() method. This method’s purpose is to selectively merge pages that conform to specific criteria into a fresh document. This criterion is encapsulated within the Func delegate known as pageSelectionCriteria.

With this criterion, we can easily separate odd and even pages. In the public Split() method, we open the source document and then call our helper method twice, passing lambda expressions that select the odd and the even pages.

Finally, we return a tuple containing the names of the two recently generated PDF documents. This clever approach enables us to efficiently split a document into its odd and even components, providing a versatile solution for various printing scenarios.
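To look at the page selection on its own, without any PDF files involved, we can isolate the LINQ part in a tiny helper. This is just an illustrative sketch, and PageSelection.Select() is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageSelection
{
    // Returns the 1-based page numbers in [1, numberOfPages] that satisfy the criteria.
    public static IList<int> Select(int numberOfPages, Func<int, bool> pageSelectionCriteria)
    {
        return Enumerable
            .Range(1, numberOfPages)
            .Where(pageSelectionCriteria)
            .ToList();
    }
}
```

For a five-page document, Select(5, n => n % 2 != 0) yields pages 1, 3, and 5, while Select(5, n => n % 2 == 0) yields pages 2 and 4. For a one-page document, the even selection is empty. Our Split() method does not guard against that case, and iText may refuse to write a document without pages, so a check for an empty selection could be a worthwhile addition (an assumption to verify against the iText version in use).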

Conclusion

In summary, we’ve seen how the iText Library simplifies the process of merging multiple PDFs into one.

The iText Library offers a dedicated object known as PdfMerger, which simplifies the merging process considerably. This article offers an in-depth look at how to use the library, with some clear examples of the techniques involved.

Furthermore, our exploration has unveiled the creative possibilities presented by the PdfMerger object. Besides merging, it can also effectively split a document as needed. Armed with these insights, we’re now well-equipped to harness the full potential of the PdfMerger object for our document management tasks.
