Product and Development Musings

Efficiently Convert PDF to Multipage TIF

24. May 2010 08:28

One of our ImageMan.Net customers recently asked us to help him convert a multipage PDF file to a single multipage TIF.  His initial attempts were running into memory problems, caused both by the resolution and color depth of the source PDF and by the approach he was taking.

Inefficient Approach

The less optimized, but convenient, way to solve this problem would be to load the PDF file into an ImageMan.Net image collection (as a standalone ImageCollection object or part of the Viewer's Images property).  Then you can simply save that collection using one of the many overloaded Save methods.  ImageMan.Net handles appending each image in the collection to a single file.

For example, here's one way to simply save the entire collection to a file called image.tif:

[C#]

myImageCollection.Save("image.tif");

[VB.Net]

myImageCollection.Save("image.tif")

Or, if you are using a Viewer, you can save the collection with the Viewer's Save method, which loops through the Images property and creates a single file called image.tif:

[C#]

viewer1.Save("image.tif");

[VB.Net]

Viewer1.Save("image.tif")

Additionally, these Save methods have overloads that will allow you to specify the start page, the total number of pages and the image encoder to use.

The problem with this approach is that every page in the ImageCollection is held in memory at once, so the larger the collection, the more memory your program uses while running.  Our client was running out of memory while trying to process the PDF.
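To get a feel for the numbers, here is a rough back-of-the-envelope estimate.  It is only a sketch: it assumes each decoded page is held as an uncompressed bitmap, ignores row padding and per-image overhead, and uses a hypothetical 100-page US Letter document rendered at 300 DPI.  The PageMemoryEstimate class and its UncompressedBytes method are made up for this illustration and are not part of ImageMan.Net.

[C#]

```csharp
using System;

static class PageMemoryEstimate
{
    // Bytes needed to hold one uncompressed page, ignoring row padding and
    // any per-image overhead (both are simplifying assumptions of this sketch).
    public static long UncompressedBytes(double widthInches, double heightInches,
                                         int dpi, int bitsPerPixel)
    {
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);
        long bytesPerRow = (widthPx * bitsPerPixel + 7) / 8; // round up to whole bytes
        return bytesPerRow * heightPx;
    }

    static void Main()
    {
        // Hypothetical input: US Letter pages (8.5 x 11 inches) at 300 DPI.
        long colorPage = UncompressedBytes(8.5, 11.0, 300, 24);
        long bitonalPage = UncompressedBytes(8.5, 11.0, 300, 1);

        Console.WriteLine("24-bit page: {0} bytes", colorPage);            // 25245000
        Console.WriteLine("1-bit page: {0} bytes", bitonalPage);           // 1052700
        Console.WriteLine("100 24-bit pages: {0} bytes", colorPage * 100); // 2524500000
    }
}
```

Under these assumptions, holding all 100 color pages at once takes roughly 2.5 GB, which is more than a 32-bit Windows process can address by default, while a single page takes about 25 MB.  Lowering the resolution or color depth shrinks both figures.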

Efficient Approach

The more efficient, less memory-intensive (and lower-level) way to solve this problem is to use the PdfDecoder class to loop through the pages of the source PDF.  With the PdfDecoder you can also modify some of the source PDF's image details, including the resolution and color depth.  Then, for each page in the source PDF, you can append to a single output image file using the TifEncoder's Save method.

At the end of this article, I'll provide links to working sample apps that contain the code to perform the PDF to TIF conversion.  The sample apps cover the details of working with streams to create the decoder and encoder instances.  Here are the highlights:

To loop through the pages of the source PDF, you'll need an instance of the PdfDecoder class which will allow you to retrieve the number of pages (using the Pages property) and set the current page (using the Page property).  Then, you can use an instance of the TifEncoder class to append each page to an output file using the Save method:

[C#]

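// Pages is the total page count; Page selects the (zero-based) page to decode.
int totalPageCount = pdfDecoder.Pages;

for (int pageNumber = 0; pageNumber < totalPageCount; pageNumber++)
{
    pdfDecoder.Page = pageNumber;

    // Decode one page at a time so only a single page is held in memory.
    ImImage tempImage = pdfDecoder.Load(null);
    try
    {
        // Rewind the output stream, then append this page to the TIF.
        tifStream.Seek(0, SeekOrigin.Begin);
        tifEncoder.Save(tifStream, tempImage, null);
    }
    finally
    {
        // Release the page even if Save throws.
        tempImage.Dispose();
    }
}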

[VB.Net]

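' Pages is the total page count; Page selects the (zero-based) page to decode.
Dim totalPageCount As Integer = pdfDecoder.Pages

For pageNumber As Integer = 0 To totalPageCount - 1
    pdfDecoder.Page = pageNumber

    ' Decode one page at a time so only a single page is held in memory.
    Dim tempImage As ImImage = pdfDecoder.Load(Nothing)
    Try
        ' Rewind the output stream, then append this page to the TIF.
        tifStream.Seek(0, SeekOrigin.Begin)
        tifEncoder.Save(tifStream, tempImage, Nothing)
    Finally
        ' Release the page even if Save throws.
        tempImage.Dispose()
    End Try
Next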

The decoder and encoder classes let you perform low-level operations on images, which can make your image processing code much more memory-efficient.  For more details on these ImageMan.Net classes, please refer to the documentation.

The following sample apps are available online:
