Many libraries have been written to save HTML as PDF in Python. This tutorial will introduce you to some of them.
Save HTML As PDF In Python
With pdfkit
pdfkit is a Python wrapper for wkhtmltopdf, an open-source command-line tool that renders HTML into PDF. Because wkhtmltopdf is built on the Qt WebKit engine, it runs completely headless and doesn’t require a display service. pdfkit supports both Python 2 and Python 3.
You will need to install pdfkit with pip:
pip install pdfkit
Install wkhtmltopdf:
- Go to the download page of wkhtmltopdf and grab the correct installer for your system. The developers have made pre-built binaries for Windows, macOS, and various Linux distributions.
- Linux users can also install wkhtmltopdf directly from their system’s repository. For instance, use this command on Debian/Ubuntu:
sudo apt-get install wkhtmltopdf
You can also install it with Homebrew on macOS:
brew install homebrew/cask/wkhtmltopdf
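Before converting anything, it can help to confirm that Python can actually find the wkhtmltopdf binary. The sketch below is one possible way to do that. The names find_wkhtmltopdf and convert_with_explicit_binary are hypothetical helpers made up for this example, while pdfkit.configuration() and the configuration argument are what pdfkit itself offers for pointing at a specific binary.

```python
import shutil


def find_wkhtmltopdf():
    """Return the full path of the wkhtmltopdf binary, or None if it is not on PATH."""
    return shutil.which('wkhtmltopdf')


def convert_with_explicit_binary(source, target):
    """Convert an HTML file to PDF, telling pdfkit exactly which binary to use."""
    binary = find_wkhtmltopdf()
    if binary is None:
        raise FileNotFoundError('wkhtmltopdf was not found on PATH')
    # Imported lazily so that find_wkhtmltopdf() also works before pdfkit is installed.
    import pdfkit
    config = pdfkit.configuration(wkhtmltopdf=binary)
    return pdfkit.from_file(source, target, configuration=config)
```

If find_wkhtmltopdf() returns None, the installer probably did not add the binary to PATH. In that case you can pass the full path to pdfkit.configuration() yourself.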
Let’s say you have this index.html file:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <style>
        .text {
            margin: auto;
            width: 60%;
            padding: 70px 10px;
        }
    </style>
    <title>ITTutoria</title>
</head>
<body>
    <div class="text">
        <h1>Welcome to ITTutoria.net!</h1>
        <p>We want to connect the people who have the knowledge to the people who need it, to bring together people with different perspectives so they can understand each other better, and to empower everyone to share their knowledge.</p>
    </div>
</body>
</html>
The following code converts it to an out.pdf file in the same directory.
import pdfkit
pdfkit.from_file('index.html', 'out.pdf')
You can also pass the HTML file as an open file object:
with open('index.html') as f:
    pdfkit.from_file(f, 'out.pdf')
The from_file() function returns True after each successful conversion.
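pdfkit also provides from_string() and from_url() alongside from_file(), which is useful when the HTML is generated in Python instead of being stored on disk. The following is a minimal sketch of the from_string() case. The helpers render_page and save_page_as_pdf are hypothetical names invented for this example.

```python
import html


def render_page(title, paragraph):
    """Build a small HTML document as a string, escaping the supplied text."""
    return (
        '<!DOCTYPE html>\n'
        '<html lang="en">\n'
        f'<head><meta charset="UTF-8"><title>{html.escape(title)}</title></head>\n'
        f'<body><h1>{html.escape(title)}</h1><p>{html.escape(paragraph)}</p></body>\n'
        '</html>\n'
    )


def save_page_as_pdf(title, paragraph, target='out.pdf'):
    """Render the page and hand the HTML string to wkhtmltopdf through pdfkit."""
    # Imported lazily; this call needs both pdfkit and wkhtmltopdf installed.
    import pdfkit
    return pdfkit.from_string(render_page(title, paragraph), target)
```

Escaping the text with html.escape() keeps characters such as & and < from breaking the markup before it reaches the renderer.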
The module also supports all of wkhtmltopdf’s options, which you can pass as a dictionary to the options parameter.
Example:
options = {
    'page-size': 'A4',
    'margin-top': '1in',
    'margin-right': '1in',
    'margin-bottom': '1in',
    'margin-left': '1in',
    'encoding': 'UTF-8',
}
pdfkit.from_file('index.html', 'out.pdf', options=options)
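When the same margin applies to all four sides, the dictionary can be generated instead of typed out. This is only a sketch: build_options and convert_with_margin are hypothetical helpers, and the keys are the same wkhtmltopdf option names used in the example above.

```python
def build_options(page_size='A4', margin='1in', encoding='UTF-8'):
    """Return a wkhtmltopdf options dictionary with one margin for all four sides."""
    options = {'page-size': page_size, 'encoding': encoding}
    for side in ('top', 'right', 'bottom', 'left'):
        options[f'margin-{side}'] = margin
    return options


def convert_with_margin(source, target, margin='1in'):
    """Convert an HTML file to PDF using a uniform page margin."""
    # Imported lazily; this call needs both pdfkit and wkhtmltopdf installed.
    import pdfkit
    return pdfkit.from_file(source, target, options=build_options(margin=margin))
```

For example, convert_with_margin('index.html', 'out.pdf', margin='2cm') would produce the same document with 2 cm margins.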
With weasyprint
WeasyPrint is a rendering engine for HTML and CSS that can export to PDF, which makes it a good fit for tickets, invoices, reports, and similar documents. It is open-source and, rather than embedding a full browser engine, builds on several other libraries.
Install WeasyPrint with pip in a virtual environment, then run weasyprint --info to confirm that the installation works:
python3 -m venv venv
source venv/bin/activate
pip install weasyprint
weasyprint --info
In addition to the command line interface, you can invoke WeasyPrint with its Python module and convert the index.html file into PDF.
from weasyprint import HTML
HTML(filename='index.html').write_pdf('out.pdf')
The HTML class accepts other parameters for this conversion:
- encoding: the character encoding of the source file.
- media_type: the media type to use for @media rules in CSS (the default is print).
- base_url: the base URL used to resolve relative URLs in your HTML file, such as links to images and stylesheets.
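The base_url parameter matters most when the HTML comes from a string, because WeasyPrint then has no file location to resolve relative links against. Here is a minimal sketch under that assumption. directory_base_url and html_string_to_pdf are hypothetical helpers, while HTML(string=..., base_url=..., media_type=...) and write_pdf() come from WeasyPrint.

```python
from pathlib import Path


def directory_base_url(base_dir='.'):
    """Return a file:// URL for a directory, always ending with a slash."""
    uri = Path(base_dir).resolve().as_uri()
    # Without the trailing slash, relative links would resolve against the parent directory.
    return uri if uri.endswith('/') else uri + '/'


def html_string_to_pdf(html, target, base_dir='.'):
    """Render an HTML string to PDF, resolving relative links against base_dir."""
    # Imported lazily; this call needs WeasyPrint installed.
    from weasyprint import HTML
    HTML(
        string=html,
        base_url=directory_base_url(base_dir),
        media_type='print',
    ).write_pdf(target)
```

With this, an <img src="logo.png"> inside the string would be looked up in base_dir instead of failing to load.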
With pyhtml2pdf
pyhtml2pdf is a simple wrapper around Selenium that uses headless Chrome to convert HTML to PDF.
Install pyhtml2pdf via pip:
pip install pyhtml2pdf
The package provides the convert() function in its converter module. When you invoke it, the function uses the headless Chrome driver to load and render your HTML file, then saves the result as a PDF file.
Example:
import os
from pyhtml2pdf import converter
path = os.path.abspath('index.html')
converter.convert(f'file:///{path}', 'out.pdf')
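The file:/// prefix above is assembled by hand, which can produce slightly different URLs on different operating systems. As an alternative sketch, pathlib from the standard library can build the URL instead. file_url and html_file_to_pdf are hypothetical helper names, and only the two-argument form of converter.convert() shown above is assumed.

```python
from pathlib import Path


def file_url(path):
    """Return a well-formed file:// URL for a local path."""
    return Path(path).resolve().as_uri()


def html_file_to_pdf(source, target):
    """Convert a local HTML file to PDF with headless Chrome via pyhtml2pdf."""
    # Imported lazily; this call needs pyhtml2pdf and Chrome installed.
    from pyhtml2pdf import converter
    converter.convert(file_url(source), target)
```

Path.as_uri() also percent-encodes characters such as spaces, so file names like "my page.html" still load correctly.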
Summary
You can use several third-party modules to save HTML as PDF in Python, such as pdfkit, weasyprint, or pyhtml2pdf. Each relies on a different engine to render your HTML file before converting it to PDF: wkhtmltopdf, WeasyPrint's own renderer, and headless Chrome, respectively.