Introduction

Generating PDFs from web pages is a common requirement for applications that deal with reports, invoices, and form submissions. This post shows how to do it from C# with Puppeteer Sharp, the .NET port of the Node.js Puppeteer library (the PdfOptions, LaunchOptions, and WaitUntilNavigation types used below come from it). We will cover the key setup and execution steps: locating Chrome, configuring the PDF and launch options, navigating to the page, and writing the PDF to disk.

Clone the repository from GitHub: PdfGenerator GitHub Repo.

Understanding the Code

The snippets below show how to configure Puppeteer for PDF generation, alongside a Selenium-based helper class (SeleniumAuthenticator) that locates the installed Chrome browser. Let’s break it down step by step.

Key Components

  1. Authentication:
    var authenticator = new SeleniumAuthenticator();
    

    The SeleniumAuthenticator is used to look up the path to the installed Chrome browser, so that Puppeteer launches that Chrome executable.

  2. PDF Options Configuration:
    PdfOptions = new PdfOptions
    {
        PrintBackground = true,
        Format = PaperFormat.A4,
        MarginOptions = new MarginOptions
            { Bottom = "0px", Left = "0px", Right = "0px", Top = "0px" },
        Scale = 1m
    };
    

    Here, the PDF options are configured to ensure that the PDF captures background graphics, adheres to A4 paper format, and has no margins. The scale is set to 1, maintaining the original size of the content.

  3. Launch Options:
    LaunchOptions = new LaunchOptions
    {
        Headless = true,
        ExecutablePath = authenticator.GetInstalledBrowserPath("chrome"),
        Timeout = Timeout,
        IgnoreHTTPSErrors = true
    };
    

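    The browser is launched in headless mode, meaning it runs without a visible UI, which suits servers and other automated environments. The path to Chrome comes from the authenticator. HTTPS certificate errors are ignored, which helps with self-signed certificates on internal hosts but also switches off a safety check, so enable it only for sites you trust.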

  4. Page Creation and PDF Generation:
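    // Await the page rather than blocking with GetAwaiter().GetResult();
    // the surrounding method is already async.
    using (var page = await Browser.NewPageAsync())
    {
        await page.SetCookieAsync(cookieParam);
        await page.SetUserAgentAsync(UserAgent);
        var response = await page.GoToAsync(item.Href, WaitUntilNavigation.DOMContentLoaded);
        // The status check and PDF export from step 5 belong here, while page is still open.
    }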
    

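    A new page is created for the PDF generation. Cookies are set for authentication, and a user agent is defined to mimic a regular browser. The page then navigates to the specified URL and waits for the DOMContentLoaded event, which fires once the HTML has been parsed. Images, stylesheets, and content rendered later by scripts may still be loading at that point.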

  5. Successful PDF Generation:
    if (response.Status == HttpStatusCode.OK)
    {
        await page.PdfAsync(docPhysicalPath, PdfOptions);
        var file = new FileInfo(docPhysicalPath);
        PropertySetUtilities.SetItemPropertyValue(item, ContentServerProperties.FileSizeProperty, file.Length);
        Log.WriteLine(LogLevel.Normal, ContentServerConstant.LogName, "Download success for Form '{0}': {1} in {2} ms | retryCount: '{3}'", item.Name, id, watch.ElapsedMilliseconds, retryCount);
        return true;
    }
    

    If the server responds with HTTP 200 (OK), the page is exported as a PDF to the specified path. The resulting file size is stored in the item’s properties for future reference, and a success message is logged with the elapsed time and retry count.
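Putting the five steps together, a minimal end-to-end sketch might look like the following. It assumes the Puppeteer Sharp NuGet package and C# 8 or later. The class name PdfSketch, the method SavePageAsPdfAsync, and its parameters are hypothetical, and the 30-second timeout is an assumed value. The user agent, the IgnoreHTTPSErrors setting, the file-size bookkeeping, and the logging from the original are left out to keep it short.

```csharp
using System.Net;
using System.Threading.Tasks;
using PuppeteerSharp;
using PuppeteerSharp.Media;

public static class PdfSketch
{
    // chromePath, url, outputPath and cookie are hypothetical inputs for illustration.
    // The cookie should carry a Url or Domain, because it is set before any navigation.
    public static async Task<bool> SavePageAsPdfAsync(
        string chromePath, string url, string outputPath, CookieParam cookie)
    {
        var launchOptions = new LaunchOptions
        {
            Headless = true,
            ExecutablePath = chromePath,
            Timeout = 30000 // milliseconds; assumed value, the original uses its own Timeout
        };

        var pdfOptions = new PdfOptions
        {
            PrintBackground = true,
            Format = PaperFormat.A4,
            MarginOptions = new MarginOptions
                { Top = "0px", Right = "0px", Bottom = "0px", Left = "0px" },
            Scale = 1m
        };

        // await using disposes the page and then the browser, even if navigation throws.
        await using var browser = await Puppeteer.LaunchAsync(launchOptions);
        await using var page = await browser.NewPageAsync();

        await page.SetCookieAsync(cookie);
        var response = await page.GoToAsync(url, WaitUntilNavigation.DOMContentLoaded);

        // Guard against a missing response as well as a non-200 status.
        if (response == null || response.Status != HttpStatusCode.OK)
        {
            return false;
        }

        await page.PdfAsync(outputPath, pdfOptions);
        return true;
    }
}
```

If the target page renders its content with scripts after the initial HTML has loaded, WaitUntilNavigation.Networkidle0 is a possible alternative to DOMContentLoaded, at the cost of slower navigation.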

Conclusion

Using Puppeteer for PDF generation in C# is a powerful way to automate document creation directly from web content. By setting up browser options effectively and managing cookies for authentication, you can ensure reliable and high-quality PDF outputs.
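The success log line in step 5 reports a retry count and an elapsed time, which suggests the download runs inside a retry loop that the post does not show. The following is only an assumed sketch of such a loop in plain C#. RetryHelper, RunAsync, and the parameters are hypothetical names, and treating every exception as a failed attempt is a simplification.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Runs 'attempt' until it returns true or 'maxRetries' retries have been used.
    // Returns whether it succeeded, how many retries were needed, and the elapsed time.
    public static async Task<(bool Success, int RetryCount, long ElapsedMs)> RunAsync(
        Func<Task<bool>> attempt, int maxRetries, TimeSpan delay)
    {
        var watch = Stopwatch.StartNew();
        var retryCount = 0;

        while (true)
        {
            bool ok;
            try
            {
                ok = await attempt();
            }
            catch (Exception)
            {
                // Simplification: any exception (for example a navigation timeout)
                // counts as a failed attempt.
                ok = false;
            }

            if (ok || retryCount >= maxRetries)
            {
                watch.Stop();
                return (ok, retryCount, watch.ElapsedMilliseconds);
            }

            retryCount++;
            await Task.Delay(delay);
        }
    }
}
```

A caller would pass the PDF download as the attempt delegate and log RetryCount and ElapsedMs from the returned tuple, in the spirit of the log line shown in step 5.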

Whether you’re generating reports, invoices, or forms, the same pattern applies: configure the launch and PDF options once, authenticate the page with cookies, check the response status, and only then write the PDF.
