How to Convert HTML to PDF for Archival (When a URL Just Isn't Enough)

pavan

Jul 03, 2026

You've just read a genuinely useful article, a critical piece of documentation, or a research paper — and you know from experience the URL might not still work in six months. Site owners rotate content, take down old posts, redirect archives to sales pages, or simply go out of business.

The reliable long-term answer: convert the HTML page to a PDF. Here's how to do that properly.

Why "just bookmark it" fails long-term

URL rot is more common than most people realise. A 2021 study found that ~10% of academic citation URLs stop working within four years. General web content rots faster. If it's worth reading now, it's worth preserving.

PDF is the archival format of record. Every device on the planet opens it, the format has been stable for decades, and it'll still be readable in 2050. The Internet Archive works, but PDF gives you a copy you own that doesn't depend on anyone else's infrastructure.

The fastest way

Two paths depending on where you're starting:

From a browser tab: Ctrl/Cmd + P → Save as PDF. This works on every browser and preserves the page as it looks to you. It's the fastest option if the article is already loaded in front of you.
From a saved HTML file or raw HTML: use the HTML to PDF converter and either drop in the file or paste the markup. Useful for archiving pages you saved earlier but don't have open, or for batch-converting saved HTML files from your downloads folder.

Making the browser method behave

Ctrl/Cmd + P → Save as PDF is the classic browser approach, but the default settings often produce ugly output — ads, cookie banners, sidebars, and navigation menus all preserved into the PDF, wasting space and cluttering the archive.

Before saving, use browser "Reader Mode":

Safari: click the "AA" icon → Show Reader.
Firefox: click the reader icon in the address bar (only appears on articles).
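Edge: click the Immersive Reader icon in the address bar, or press F9 (only appears on supported pages).
Chrome: use an extension like "Reader Mode" or "Print Friendly".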

Reader Mode strips everything except the article text and inline images. Then Ctrl/Cmd + P → Save as PDF produces a clean, focused archive of the actual content.
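For pages you've already saved as HTML, a similar clean-up can be roughly approximated in code. The sketch below is not how any browser's Reader Mode works (those rely on more elaborate heuristics); it simply drops a hand-picked set of clutter elements using Python's standard-library parser. The CLUTTER set and the strip_clutter name are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical list of elements treated as clutter; tune it per site.
CLUTTER = {"script", "style", "nav", "aside", "form", "iframe"}

class ClutterStripper(HTMLParser):
    """Re-emit the HTML it is fed, minus anything inside a clutter element."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a clutter element

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a skipped region.
        if self.skip_depth == 0 and tag not in CLUTTER:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in CLUTTER:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")

def strip_clutter(html: str) -> str:
    """Return html with clutter elements and their contents removed."""
    parser = ClutterStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

This assumes reasonably well-formed markup. Comments and the doctype are dropped, and badly nested tags can confuse the depth counter, so check the result before converting it.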

Save the URL and date too

If you're archiving research or reference material, make sure the source URL and the date you accessed it appear in the PDF. Most browsers can print both automatically when the headers-and-footers option in the print dialog is enabled, though which one lands in the header and which in the footer varies by browser. Verify before saving: for citation purposes, that metadata can matter as much as the content itself.
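If you'd rather not rely on the printed header and footer alone, one option is a small sidecar file next to each PDF. The sketch below is only an illustration: the write_sidecar name and the JSON field names are assumptions, not a standard.

```python
import json
from datetime import date
from pathlib import Path

def write_sidecar(pdf_path: Path, source_url: str, accessed: date) -> Path:
    """Write '<pdf name>.json' next to the PDF, recording where and when it was saved."""
    sidecar = pdf_path.with_suffix(".json")
    record = {
        "file": pdf_path.name,
        "source_url": source_url,           # where the page lived
        "accessed": accessed.isoformat(),   # the day you saved it, as YYYY-MM-DD
    }
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

For example, write_sidecar(Path("archive/2026-07-03-example-article.pdf"), "https://example.com/article", date.today()) would create archive/2026-07-03-example-article.json, provided the archive folder already exists.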

When the browser method breaks

Some pages don't convert well through the browser:

Pages behind login walls: the browser method works if you're logged in and viewing the page. The online converter can't reach content behind auth walls unless you save the HTML locally first.
Pages that load content via JavaScript: some sites lazy-load images or comments only when you scroll. Scroll all the way through before saving to trigger everything.
Pages with sticky headers/footers: these often repeat on every printed page, wasting a lot of vertical space. Reader Mode fixes this.
Pages with modal popups: dismiss any cookie banners or subscription prompts before printing.

For scale: converting saved HTML files

If you've been saving pages as HTML files (browsers often let you "Save Page As" a complete .html file), you can batch-convert them to PDF later. Load them into the HTML to PDF converter one at a time, or as a batch on Pro.

This is also the workflow for archiving documentation you've written yourself — draft in HTML or Markdown, convert to PDF for the archive copy.
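If you have Chrome or Chromium installed, a folder of saved HTML files can also be converted locally. The sketch below is an alternative to the online converter, not a description of it. It relies on Chrome's --headless and --print-to-pdf flags; the "google-chrome" binary name is an assumption that varies by platform, and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

def pdf_path_for(html_file: Path, out_dir: Path) -> Path:
    """Map saved/page.html to <out_dir>/page.pdf (as an absolute path)."""
    return out_dir.resolve() / (html_file.stem + ".pdf")

def chrome_pdf_command(html_file: Path, out_dir: Path,
                       chrome: str = "google-chrome") -> list[str]:
    """Build the headless-Chrome command that prints one saved page to PDF."""
    return [
        chrome,  # assumed binary name; adjust for your system
        "--headless",
        f"--print-to-pdf={pdf_path_for(html_file, out_dir)}",
        html_file.resolve().as_uri(),  # file:// URL of the saved page
    ]

def convert_folder(src_dir: Path, out_dir: Path,
                   chrome: str = "google-chrome") -> list[Path]:
    """Convert every .html/.htm file in src_dir and return the PDF paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for html_file in sorted(src_dir.iterdir()):
        if html_file.suffix.lower() not in {".html", ".htm"}:
            continue
        # check=True raises CalledProcessError if Chrome exits with an error.
        subprocess.run(chrome_pdf_command(html_file, out_dir, chrome), check=True)
        written.append(pdf_path_for(html_file, out_dir))
    return written

# Usage (requires Chrome to be installed):
# convert_folder(Path("saved-pages"), Path("pdf-archive"))
```

Building the command in its own function makes it easy to print and inspect before running anything.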

Image handling

Inline images are preserved at their original resolution as long as they had loaded when you converted. Two common issues:

Lazy-loaded images: modern sites often defer loading images until you scroll to them. If you didn't scroll past an image before converting, it may not have loaded yet, and the PDF will show a placeholder or nothing.
Externally hosted images: if a saved HTML file references images on another domain that later goes offline, those images will be missing (404) when you open or convert the file later. If archival matters, save the images locally and use a converter that embeds images inline.
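Both issues can be spotted ahead of time in a saved HTML file. The sketch below lists images that are marked for lazy loading or served from a remote host. It treats a data-src attribute as a lazy-loading hint, which is a common convention rather than a standard, and the class and function names are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class ImageAudit(HTMLParser):
    """Collect <img> tags that may not survive a later conversion."""

    def __init__(self):
        super().__init__()
        self.lazy = []      # images marked for deferred loading
        self.external = []  # images fetched from a remote host

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or a.get("data-src") or ""
        # loading="lazy" is the native hint; data-src is a script-based convention.
        if a.get("loading") == "lazy" or "data-src" in a:
            self.lazy.append(src)
        # A non-empty host means the image is not stored alongside the page.
        if urlparse(src).netloc:
            self.external.append(src)

def audit_images(html: str) -> ImageAudit:
    """Parse html and return the audit with its lazy and external lists filled."""
    audit = ImageAudit()
    audit.feed(html)
    audit.close()
    return audit
```

Anything in the lazy list is worth scrolling past before saving; anything in the external list is worth downloading alongside the page.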

What about JavaScript-heavy interactive articles?

Interactive charts, embedded videos, and JavaScript-driven visualisations don't survive conversion to PDF — they flatten to their initial state. For a chart, that's usually fine (you get the static image of the initial view). For an interactive tool or video, you lose the interactivity entirely. Consider taking screenshots of key interactive states as supplementary material.

Structural preservation

What survives cleanly:

Article text.
Headings (H1, H2, H3) with their hierarchy.
Inline images (as long as they were loaded).
Bulleted and numbered lists.
Tables (usually).
Hyperlinks (become clickable in the PDF).
Basic formatting (bold, italic, code blocks).

What often shifts:

Multi-column layouts collapse to single-column.
Sidebars and pull-quotes reflow inline.
Custom fonts may be substituted if the PDF export can't embed the font the browser used to render the page.

Organising your archive

Practical suggestions:

Name files with date + source: 2026-07-03-nytimes-article-title.pdf.
Store in a folder structure by topic or by year.
Back up to two places (local drive + cloud storage).
Consider a proper reference manager (Zotero, Mendeley) if you're archiving academic sources — they handle PDF storage, citation, and searchable metadata together.
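The date + source naming pattern is easy to apply consistently with a small helper. This is a minimal sketch; the archive_filename name and the slug rule (lowercase, hyphens in place of anything that isn't a letter or digit) are assumptions, not a convention the tools above require.

```python
import re
from datetime import date

def archive_filename(saved_on: date, source: str, title: str) -> str:
    """Build a 'YYYY-MM-DD-source-title.pdf' name from its parts."""
    def slug(text: str) -> str:
        # Lowercase, then collapse each run of non-alphanumerics into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return f"{saved_on.isoformat()}-{slug(source)}-{slug(title)}.pdf"

print(archive_filename(date(2026, 7, 3), "NYTimes", "Article Title"))
# prints: 2026-07-03-nytimes-article-title.pdf
```

Putting the ISO date first means a plain alphabetical sort of the folder is also a chronological one.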

Bottom line

URLs are ephemeral; PDFs are archival. Use Reader Mode plus browser Print-to-PDF for quick individual saves, or the online converter for saved HTML files. Preserve the URL and access date, watch out for lazy-loaded content, and know that interactive elements flatten. For anything worth remembering, PDF is the format that'll still be there in ten years.

Convert HTML to PDF now

Drop an HTML file or paste raw HTML, get a clean PDF back. Free tier handles files up to 10 MB.
