Why use Puppeteer for PDF generation? Benefits, trade-offs, and alternatives
When Chromium-based rendering is the right choice, when it becomes an infrastructure burden, and which alternatives fit which PDF workloads.
Puppeteer is a strong choice for PDF generation when your document already exists as HTML/CSS and you need output that matches browser rendering — it handles modern layouts, web fonts, and SVG well. The trade-off is that every render depends on Chromium, which adds deployment size, cold-start latency, memory, and operational complexity. For high-volume or serverless workloads, a lighter library or a hosted PDF API is often a better fit.
When to use Puppeteer
- Use Puppeteer when you already have HTML/CSS and need browser-faithful rendering, and you control your own long-running server or container.
- Avoid it by default when you deploy to serverless (Vercel, Netlify, Lambda, Supabase Edge) or need high concurrency without operating a browser fleet.
- Main advantage: compatibility with modern web layouts — the same engine that renders your app renders your PDF.
- Main trade-off: you are now operating headless Chromium, reliably, at scale.
- Default recommendation: compare it against Playwright, PDFKit, pdf-lib, react-pdf, and hosted PDF APIs based on the document and the runtime — not only on how the output looks.
Disclosure: Paperbase is a hosted PDF API and one of the alternatives compared below. The trade-offs here are the ones we'd give a friend, including where Puppeteer is the right call.
What is Puppeteer PDF generation?
Puppeteer is a Node.js library that drives headless Chromium. For PDFs, the pattern is: launch a browser, load HTML or a URL into a page, and call page.pdf(). Because the renderer is Chrome, the output uses the same layout engine as the browser — CSS Grid, Flexbox, web fonts, and SVG all behave the way they do on screen. That fidelity is the whole appeal.
The mechanism is also the source of every downside on this page: each render carries a full browser with it.
Why do developers reach for it?
Three honest reasons. It is free and familiar — most developers already know the Chrome DevTools mental model. It gives real browser fidelity, so an existing HTML template renders without a rewrite. And it offers full control — headers, footers, margins, print media queries, and page ranges are all exposed. For a team running its own infrastructure with existing HTML documents, those three add up to a reasonable default.
When is Puppeteer the right choice?
Puppeteer earns its place when the constraints line up in its favor:
- You run your own VPS, container, or long-lived worker — you are not fighting a serverless function bundle limit.
- You need browser automation beyond rendering — navigating multiple pages, filling forms, waiting on client-side state, intercepting requests.
- You have existing, complex HTML/CSS templates you already maintain and don't want to reimplement in a document DSL.
- You want zero vendor dependency and have the engineering bandwidth to keep Chromium patched, monitored, and scaled.
If most of those are true, Puppeteer is a defensible, even preferable, choice. The rest of this article is about what happens when they aren't.
When should you avoid it?
Avoid Puppeteer as your first choice when you're shipping to a serverless platform, when the document is generated on demand at unpredictable concurrency, or when the person adding the feature — increasingly, a coding agent — has no appetite for operating a browser. In those cases the infrastructure cost dominates, and it shows up on day one of production, not later.
What are the infrastructure costs of running Chromium?
This is the loudest, most-documented pain, and it lands the moment local code hits production.
Bundle size. Chromium is roughly 100 MB. Vercel caps a serverless function at ~50 MB compressed (≈250 MB uncompressed), so the standard puppeteer package won't fit. The workaround is well-trodden: swap to puppeteer-core (no bundled browser) plus a serverless Chromium build like @sparticuz/chromium, fetched or bundled at runtime. Vercel maintains a knowledge-base walkthrough for exactly this — a fair signal that the unhappy path is the default path. Even the minimized build (@sparticuz/chromium-min, currently in the 143.x line) often has to be hosted on S3 or attached as a Lambda layer to stay under the limit.
Cold starts. When a function is idle and then invoked, the platform provisions a container, boots Node, and launches Chromium before the first render begins. On a Hobby-tier Vercel function the execution timeout can be shorter than that cold start. Community reports describe serverless Puppeteer as several times slower than a dev machine unless you pay for more powerful function CPUs.
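A common way to soften the cold-start tax, sketched here under assumptions, is to cache the launched browser in module scope. Warm invocations of the same function instance then reuse it, and only a cold instance pays for the launch. The names makeWarmCache and isAlive are hypothetical. launch stands in for your real puppeteer.launch(...) call, and isAlive should wrap whatever connection check your browser library provides.

```typescript
// A minimal sketch: reuse one launched browser across warm invocations.
// `Reusable`, `makeWarmCache`, and `isAlive` are hypothetical names; adapt
// `isAlive` to your browser library's own connection check.
interface Reusable {
  isAlive(): boolean;
}

function makeWarmCache<T extends Reusable>(launch: () => Promise<T>) {
  // Module-scope state: survives between requests only while the platform
  // keeps this function instance warm.
  let pending: Promise<T> | null = null;

  return async function getWarm(): Promise<T> {
    const current = pending;
    if (current) {
      try {
        const browser = await current;
        if (browser.isAlive()) return browser; // warm path: no new launch
      } catch {
        // The earlier launch failed; fall through and launch again.
      }
      // Another caller may have relaunched while we waited; share theirs.
      if (pending !== current) return getWarm();
    }
    // Cold path (or dead browser): store the promise before it settles so
    // concurrent callers in this instance wait on one launch, not several.
    pending = launch();
    return pending;
  };
}
```

In a route handler you would build getBrowser = makeWarmCache(...) once at module level, call it per request, and close only the page afterwards. Platforms recycle instances without notice, so treat reuse as an optimization rather than a guarantee.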
Failure strings you will meet. "Could not find Chrome", "libnss3.so: cannot open shared object file", and dev-vs-prod mismatches between local puppeteer and production puppeteer-core recur often enough to be documented on Vercel's own community forum. They are configuration problems, not bugs, but they are configuration problems every team rediscovers independently.
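Because these failures recur with recognizable messages, some teams add a small triage step that turns a raw launch error into the usual fix. The sketch below is hypothetical: the patterns match the two messages quoted above, and the hints summarize common advice rather than official Puppeteer or Vercel guidance.

```typescript
// A hypothetical triage helper: map recurring launch errors to the usual
// configuration fix. Hint text is a summary of common advice, not official docs.
type Diagnosis = { cause: string; hint: string };

const KNOWN_FAILURES: Array<{ pattern: RegExp; diagnosis: Diagnosis }> = [
  {
    pattern: /Could not find Chrome/i,
    diagnosis: {
      cause: "no browser binary at the path Puppeteer looked in",
      hint: "in serverless, use puppeteer-core and pass executablePath from a serverless Chromium build",
    },
  },
  {
    // Matches e.g. "libnss3.so: cannot open shared object file".
    pattern: /lib[\w.-]+\.so[\w.]*: cannot open shared object file/i,
    diagnosis: {
      cause: "the Chromium binary needs system libraries the host lacks",
      hint: "use a Chromium build made for the target (e.g. @sparticuz/chromium) or install the missing libraries in your image",
    },
  },
];

function diagnoseLaunchError(err: unknown): Diagnosis | null {
  const message = err instanceof Error ? err.message : String(err);
  for (const { pattern, diagnosis } of KNOWN_FAILURES) {
    if (pattern.test(message)) return diagnosis;
  }
  return null; // unknown failure: surface the raw error unchanged
}
```

Typical use is to wrap puppeteer.launch(...) in try/catch, log diagnoseLaunchError(err) next to the original error, and rethrow.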
A representative "happy path" already carries this much ceremony:
```ts
// app/api/pdf/route.ts — Puppeteer on Vercel (serverless)
import chromium from "@sparticuz/chromium";
import puppeteer from "puppeteer-core";

export async function POST(req: Request) {
  const { html } = await req.json();
  const browser = await puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  try {
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: "networkidle0" }); // wait for assets
    const pdf = await page.pdf({ format: "A4", printBackground: true });
    return new Response(pdf, { headers: { "Content-Type": "application/pdf" } });
  } finally {
    // Close even when rendering throws; otherwise the Chromium process leaks.
    await browser.close();
  }
}
```
That runs. It also hides the font, pagination, and asset issues below — which surface only once real documents flow through it.
Tested-on note (2026-06-30): reproduced on Next.js 15 (App Router) with puppeteer-core + @sparticuz/chromium, Node 20, deployed to Vercel.
Why do table headers and page breaks break in Puppeteer PDFs?
Because Chromium's print path doesn't honor everything the screen does. The most-cited single failure is long tables: a <thead> that should repeat on every continuation page simply doesn't. This is a long-standing, reproducible Puppeteer issue present across versions, not user error.
The widely shared partial fix is:
```css
thead { display: table-header-group; break-inside: avoid; }
tr { break-inside: avoid; }
```
It works often enough to be the standard advice — and inconsistently enough that developers only discover it after hours of debugging a report whose header vanished on page two. Related symptoms in the same family: page breaks landing inside a block that should stay together, and widows/orphans appearing with no protection. This cluster — repeating headers, break-inside, widows/orphans — is what we call pagination fidelity, and getting it right by default is harder than any single CSS snippet suggests.
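One way to stop rediscovering the fix per template is to generate tables from a helper that always inlines the same print CSS. This is a minimal sketch with hypothetical names (tableDocument, PRINT_CSS). The thead and tr rules are the widely shared fix quoted above; the orphans/widows rule on paragraphs is an extra, standard CSS rule added here. Chromium may still break some layouts regardless, so treat all of it as best-effort.

```typescript
// A minimal sketch: every generated table ships with the same print rules.
// Chromium may still ignore some of them in edge cases.
const PRINT_CSS = `
  thead { display: table-header-group; break-inside: avoid; }
  tr { break-inside: avoid; }
  p { orphans: 3; widows: 3; }
`;

// Escape cell text so user data cannot inject markup into the template.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function tableDocument(headers: string[], rows: string[][]): string {
  const head = headers.map((h) => `<th>${escapeHtml(h)}</th>`).join("");
  const body = rows
    .map((r) => `<tr>${r.map((c) => `<td>${escapeHtml(c)}</td>`).join("")}</tr>`)
    .join("\n");
  return `<!doctype html>
<html><head><meta charset="utf-8"><style>${PRINT_CSS}</style></head>
<body><table>
<thead><tr>${head}</tr></thead>
<tbody>
${body}
</tbody>
</table></body></html>`;
}
```

The result goes to page.setContent(html) and then page.pdf(...), as in the route handler. It is still worth checking page two of a long table in the actual output before trusting it.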
Why do fonts look different in production than in dev?
Because the serverless Chromium build ships almost nothing. @sparticuz/chromium includes Open Sans and little else, so any other typeface must be loaded explicitly on the server. Spacing and hinting also shift unless you launch with --font-render-hinting=none. The result is the classic report: pixel-correct on your laptop, silently substituted to a fallback font in production. This is font determinism — and it's a real, recurring failure rather than a misconfiguration you caused.
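A minimal sketch of loading fonts explicitly, assuming you ship your own WOFF2 files and read them as base64 at startup: the fonts are embedded as data: URLs in @font-face rules, so the HTML carries its own typefaces instead of depending on what the serverless image has installed. fontFaceCss and withFontHinting are hypothetical helper names; the only Chromium-specific piece is the --font-render-hinting=none flag named above.

```typescript
// A minimal sketch, assuming WOFF2 font files you ship and base64-encode yourself.
// `EmbeddedFont`, `fontFaceCss`, and `withFontHinting` are hypothetical names.
interface EmbeddedFont {
  family: string;
  base64Woff2: string;
  weight?: number;
  style?: "normal" | "italic";
}

// Emit @font-face rules whose src is a data: URL, so no network or system
// font lookup is needed at render time.
function fontFaceCss(fonts: EmbeddedFont[]): string {
  return fonts
    .map(
      (f) => `@font-face {
  font-family: "${f.family}";
  src: url("data:font/woff2;base64,${f.base64Woff2}") format("woff2");
  font-weight: ${f.weight ?? 400};
  font-style: ${f.style ?? "normal"};
}`
    )
    .join("\n");
}

// Add the hinting flag to the launch args, replacing any existing value.
function withFontHinting(args: readonly string[]): string[] {
  const rest = args.filter((a) => !a.startsWith("--font-render-hinting="));
  return [...rest, "--font-render-hinting=none"];
}
```

In a Puppeteer launch this would look like args: withFontHinting(chromium.args), with fontFaceCss(...) placed inside the document's style element. Waiting for document.fonts.ready inside page.evaluate before calling page.pdf() is another commonly used safeguard.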
Puppeteer vs Playwright for PDFs
Playwright is the close cousin: the same headless-browser model, a more modern automation API, and cross-browser support. For PDF generation specifically, it inherits Puppeteer's core trade-off almost exactly — you are still shipping and operating a browser binary, still hitting serverless bundle limits, still managing cold starts. Choose Playwright over Puppeteer for richer automation and multi-browser testing; the PDF-infrastructure calculus barely changes.
Puppeteer vs PDFKit, pdf-lib, and react-pdf
These are the programmatic alternatives — they draw the document instead of rendering a browser, so there's no Chromium to deploy.
- PDFKit / pdf-lib construct PDFs from primitives (text runs, vectors, existing pages). Tiny footprint, fully deterministic, no browser — but you build layout by hand, and there's no HTML/CSS.
- react-pdf (@react-pdf/renderer) lets you write layouts in JSX with a constrained component set (View, Text, Image, Page) and a CSS subset. Flexbox works; CSS Grid and pseudo-selectors don't. In one May 2026 production comparison, react-pdf rendered invoices on Vercel in under 400 ms and added roughly 2 MB to the bundle, versus ~50 MB of deployment footprint for the Chromium route, though the same comparison noted that neither tool wins universally. The catch: it's a document API with React syntax, not "HTML compiled to PDF," so existing templates need a rewrite, and charts/SVG (e.g. recharts) are a known weak spot.
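To make "construct PDFs from primitives" concrete, here is a hand-rolled sketch that writes a one-page PDF directly. It is not the PDFKit or pdf-lib API (both offer far richer primitives for fonts, images, and layout); it only shows the kind of output those libraries produce, and why it is deterministic: there is no browser and no timestamp, so the same input yields the same bytes. It assumes ASCII text and the standard Helvetica font.

```typescript
// A hand-rolled sketch: one A4 page, one line of Helvetica text.
// Not PDFKit or pdf-lib; assumes ASCII so string length equals byte length.
function minimalPdf(text: string): string {
  // Backslash and parentheses are special inside a PDF literal string.
  const safe = text.replace(/[\\()]/g, (c) => "\\" + c);
  const content = `BT /F1 24 Tf 72 770 Td (${safe}) Tj ET`;
  const objects = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] " +
      "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
    "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    `<< /Length ${content.length} >>\nstream\n${content}\nendstream`,
  ];

  let out = "%PDF-1.4\n";
  const offsets: number[] = [];
  for (let i = 0; i < objects.length; i++) {
    offsets.push(out.length); // byte offset where "N 0 obj" starts
    out += `${i + 1} 0 obj\n${objects[i]}\nendobj\n`;
  }

  // Cross-reference table: fixed-width 20-byte entries, object 0 is free.
  const xrefStart = out.length;
  out += `xref\n0 ${objects.length + 1}\n`;
  out += "0000000000 65535 f \n";
  for (const off of offsets) {
    out += `${String(off).padStart(10, "0")} 00000 n \n`;
  }
  out += `trailer\n<< /Size ${objects.length + 1} /Root 1 0 R >>\n`;
  out += `startxref\n${xrefStart}\n%%EOF\n`;
  return out;
}
```

Writing the string to disk as latin1 (for example with Node's writeFileSync(path, pdf, "latin1")) should produce a file a PDF viewer opens. Everything beyond one line of text, including wrapping, pagination, and embedded fonts, is the layout work you take on with this class of tool.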
The honest rubric:
| Criterion | Puppeteer / Playwright | PDFKit / pdf-lib | react-pdf | Hosted PDF API (Paperbase) |
|---|---|---|---|---|
| Rendering model | Headless browser | Programmatic drawing | React → PDF primitives | Hosted browser-class rendering |
| HTML & CSS fidelity | Full (it's Chrome) | None (build by hand) | CSS subset, no Grid | Full HTML/CSS + Markdown in |
| Infrastructure | You operate Chromium | None | None | None (managed) |
| Performance | Cold starts, ~100 MB binary | Fast, tiny | Under 400 ms, ~2 MB (one reported test) | Managed; no cold start in your app |
| Pagination fidelity | Manual CSS, thead bug | Manual | Manual | Repeating headers / breaks by default |
| Dev & agent experience | Familiar, ops-heavy | Low-level | React-native feel | SDK + structured warning codes, agent-repairable |
| Best fit | Own server + browser automation | Precise, minimal docs | React apps, simple layouts | Serverless / AI-built apps needing branded reports |
| Poor fit | Serverless, high concurrency | Rich HTML layouts | Grid-heavy / chart-heavy docs | Cases needing full browser automation |
Can Puppeteer run reliably on Vercel, Lambda, or Supabase Edge?
Reliably enough to demo, yes. Reliably enough to forget about, rarely — without ongoing effort. On Vercel/Lambda you can make it work with puppeteer-core + @sparticuz/chromium and generous memory/timeout settings, but you own the bundle-size dance and the cold-start tax indefinitely. On Supabase Edge Functions (Deno, tight limits) it isn't a practical place to run Chromium at all — the common pattern is to call an external renderer instead. If your deployment target is serverless and you want to stop thinking about it, that's the signal to look past Puppeteer.
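On Vercel with the Next.js App Router, the timeout half of those settings can be declared next to the route handler through route segment config. This is a sketch: 60 seconds is an arbitrary example, the ceiling depends on your plan, and memory is configured on the Vercel side rather than here.

```typescript
// app/api/pdf/route.ts (segment config only; sits beside the POST handler)
// Chromium needs a full Node.js environment, so pin the Node.js runtime;
// the Edge runtime cannot launch a browser binary.
export const runtime = "nodejs";

// Give slow cold starts room to finish. 60 s is an arbitrary example; the
// maximum allowed depends on your Vercel plan.
export const maxDuration = 60;
```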
Alternatives by use case
- You run your own server and need browser automation + PDFs: Puppeteer or Playwright. This is their home turf.
- You need minimal, precise, deterministic documents and don't need HTML: PDFKit or pdf-lib.
- You have a React app and simple, non-Grid layouts: react-pdf.
- You're on serverless, or an agent is adding PDF export to an AI-built app, and the document is a branded report or proposal: a hosted PDF API. This is the case Paperbase is built for — you send HTML or Markdown, get back a paginated, brand-themed PDF, and never deploy a browser. The rendering is browser-class, but pagination fidelity (repeating headers, sane breaks, font determinism) is handled by default, and errors come back as structured warning codes a coding agent can act on.
```ts
// The same job, hosted — no Chromium to deploy
import { Paperbase } from "paperbase";

// `report` is a Markdown string produced elsewhere in your app (not shown).
const pb = new Paperbase({ apiKey: process.env.PAPERBASE_API_KEY! }); // pb_live_...
const { url } = await pb.pdf.generate({
  input: { type: "markdown", content: report },
  template: "report",
  theme: { accent_color: "#ff4e8c" },
});
```
Can I use both?
Yes, and many teams should. A common split: keep Puppeteer on a dedicated worker for the jobs that need genuine browser automation, and route on-demand, serverless, or agent-generated document rendering to a hosted API so it never touches your function bundle. The decision is per-workload, not per-company.
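The per-workload split can be written down as a small routing function, which also makes the decision reviewable in code. This is a hypothetical sketch: the Workload fields and rules are a simplification of this article's rubric, not a standard, and real routing would likely weigh concurrency and document type as well.

```typescript
// A hypothetical router for the per-workload split; the fields and rules
// are simplified from this article's rubric and should be tuned.
type Renderer = "self-hosted-browser" | "programmatic-library" | "hosted-api";

interface Workload {
  needsBrowserAutomation: boolean; // navigation, forms, client-side state
  hasHtmlTemplate: boolean; // existing HTML/CSS (or Markdown) to render
  runtime: "long-lived-server" | "serverless";
}

function chooseRenderer(w: Workload): Renderer {
  // Genuine automation needs a real browser: keep it on a dedicated worker.
  if (w.needsBrowserAutomation) return "self-hosted-browser";
  // No HTML to preserve: draw the document (PDFKit, pdf-lib, react-pdf).
  if (!w.hasHtmlTemplate) return "programmatic-library";
  // HTML on serverless: keep Chromium out of the function bundle.
  if (w.runtime === "serverless") return "hosted-api";
  // HTML on your own long-lived server: Puppeteer's home turf.
  return "self-hosted-browser";
}
```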
FAQ
Is Puppeteer good for PDF generation?
Yes, when you already have HTML/CSS and run your own server. Its browser fidelity is excellent. The cost is operating headless Chromium: deployment size, cold starts, and pagination/font edge cases you fix yourself.
Why won't my Puppeteer table header repeat on every page?
It's a known Chromium print behavior. Add thead { display: table-header-group; break-inside: avoid; }. It helps but isn't fully reliable across cases — see puppeteer#10020.
Why does my PDF font change in production?
Serverless Chromium ships almost no fonts (Open Sans by default). Load your fonts explicitly server-side and launch with --font-render-hinting=none.
How big is the Chromium binary, and why does it matter?
About 100 MB, against Vercel's ~50 MB compressed function limit — which is why serverless deployments need puppeteer-core plus a slimmed Chromium build.
Puppeteer or Playwright for PDFs?
For PDFs specifically, the trade-offs are nearly identical. Prefer Playwright for richer, cross-browser automation; the infrastructure cost is the same.
Recommendation
Use Puppeteer when you control your own runtime and need browser-faithful rendering or browser automation — it's a solid, well-understood tool on that turf. Move off it when you're on serverless, when concurrency is unpredictable, or when an agent is wiring up PDF export and there's no one to babysit Chromium. In those cases, a programmatic library or a hosted PDF API removes the entire infrastructure surface.
Need browser-quality output without operating Chromium? Render the same document with Paperbase and compare the implementation side by side.