How to convert HTML to PDF in Python using PDFShift's package.

PDFShift provides a Python package available for free at https://github.com/pdfshift/pdfshift-python

Documentation

See the full documentation on PDFShift's documentation site.

Installation

You should not need the source code directly. Instead, just run:

pip install --upgrade pdfshift

or

easy_install --upgrade pdfshift

Requirements

Usage

This library needs to be configured with the api_key you received when creating your account.
Setting it is as easy as:

import pdfshift
pdfshift.api_key = 'your_api_key_here'
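
To keep the key out of source control, it can be read from the environment instead of being hard-coded. The following is a minimal sketch, not part of the package: the helper `load_api_key` and the variable name `PDFSHIFT_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_api_key(env=os.environ, var_name="PDFSHIFT_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `var_name` is a hypothetical name; use whatever your deployment defines.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage (requires the pdfshift package):
#   import pdfshift
#   pdfshift.api_key = load_api_key()
```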

The sandbox parameter allows unlimited conversions, but adds a watermark on top of the generated document.
No credits are deducted from your account when sandbox mode is on.

You can set it like this:

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert('https://www.example.com', sandbox=True)
with open('result.pdf', 'wb') as output:
    output.write(binary_file)

With a URL

Converting a URL with PDFShift is really easy. All you have to do is send a POST request with the source parameter set to the URL. With this package, that means passing the URL to pdfshift.convert, like the following:

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert('https://www.example.com')

with open('result.pdf', 'wb') as output:
    output.write(binary_file)
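
Before writing the result to disk, it can be worth checking that the returned bytes actually look like a PDF. The sketch below relies only on the fact that PDF files start with the `%PDF-` signature. The helpers `looks_like_pdf` and `save_pdf` are hypothetical and not part of the package.

```python
from pathlib import Path


def looks_like_pdf(data):
    """Return True if `data` starts with the standard PDF signature."""
    return data.startswith(b"%PDF-")


def save_pdf(data, path):
    """Write `data` to `path` only if it looks like a PDF.

    Returns the number of bytes written.
    """
    if not looks_like_pdf(data):
        raise ValueError("data does not start with the %PDF- signature")
    return Path(path).write_bytes(data)


# Usage (requires the pdfshift package and a configured api_key):
#   binary_file = pdfshift.convert('https://www.example.com')
#   save_pdf(binary_file, 'result.pdf')
```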

With inline HTML data:

To convert raw HTML data with PDFShift, simply send the raw string in the source parameter:

import pdfshift
pdfshift.api_key = 'your_api_key_here'

with open('invoice.html', 'r') as document:
    document_content = document.read()

binary_file = pdfshift.convert(document_content)
with open('result.pdf', 'wb') as output:
    output.write(binary_file)
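
The raw HTML does not have to come from a file; it can also be built in memory. Here is a minimal sketch using only the standard library. The template, its field names, and the helper `render_invoice` are made up for illustration.

```python
from html import escape
from string import Template

# Hypothetical invoice layout; $number and $customer are placeholders.
INVOICE_TEMPLATE = Template(
    "<html><body><h1>Invoice $number</h1>"
    "<p>Billed to: $customer</p></body></html>"
)


def render_invoice(number, customer):
    """Fill the template, escaping values so they cannot break the markup."""
    return INVOICE_TEMPLATE.substitute(
        number=escape(str(number)),
        customer=escape(customer),
    )


# Usage (requires the pdfshift package and a configured api_key):
#   binary_file = pdfshift.convert(render_invoice(42, 'A & B'))
```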

Custom HTTP Headers

You can pass custom HTTP headers, allowing you to adapt to the server handling your source. This can be a custom identification header, changing the language, or anything else.

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert(
    'https://httpbin.org/headers',
    http_headers={
        'X-Original-Header': 'Awesome value',
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'
    }
)

with open('result.pdf', 'wb') as output:
    output.write(binary_file)

Accessing secured pages

If your source requires Basic Auth, you can either use the custom headers described above, or use the auth parameter from the API, which behaves the same way.

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert('https://httpbin.org/basic-auth/user/passwd', auth=('user', 'passwd'))

with open('result.pdf', 'wb') as output:
    output.write(binary_file)
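
For the custom headers route, the sketch below shows how a standard HTTP Basic Auth header is built: the username and password are joined with a colon and Base64-encoded. The helper `basic_auth_header` is hypothetical, and passing its result through http_headers is an assumption based on the description above.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header used by HTTP Basic Auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Usage (requires the pdfshift package and a configured api_key):
#   binary_file = pdfshift.convert(
#       'https://httpbin.org/basic-auth/user/passwd',
#       http_headers=basic_auth_header('user', 'passwd')
#   )
```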

Using cookies

Cookies can help you access restricted areas that aren't protected by a simple Basic Auth mechanism. You can define as many cookies as you want.

import pdfshift
pdfshift.api_key = 'your_api_key_here'

# cookies is a list of dicts.
# That way, you can add as many dicts (cookies) as you want.
binary_file = pdfshift.convert(
    'https://httpbin.org/cookies',
    cookies=[
        {'name': 'session', 'value': '4cb496a8-a3eb-4a7e-a704-f993cb6a4dac'}
    ]
    ]
)

with open('result.pdf', 'wb') as output:
    output.write(binary_file)
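
If your cookies already live in a plain name-to-value mapping, they can be turned into the list of dicts described in the comments above. This is a small sketch; the helper `cookies_from_dict` is hypothetical and only sets the name and value keys shown in this README.

```python
def cookies_from_dict(values):
    """Convert {'name': 'value'} pairs into a list of cookie dicts."""
    return [{"name": name, "value": value} for name, value in values.items()]


# Usage (requires the pdfshift package and a configured api_key):
#   binary_file = pdfshift.convert(
#       'https://httpbin.org/cookies',
#       cookies=cookies_from_dict({'session': 'abc', 'lang': 'en'})
#   )
```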

Loading CSS from a URL:

By passing a css parameter, you will be able to modify the page with your CSS.
This allows you to customize the rendering of the page.

You can also load multiple CSS files by pointing to a root CSS file (like "print.css" in this case) that uses @import for each additional CSS file.

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert(
    'https://www.example.com',
    css="https://www.example.com/public/css/print.css"
)

with open('result.pdf', 'wb') as output:
    output.write(binary_file)
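
To illustrate the root CSS idea, here is a rough sketch that generates the content of such a file from a list of stylesheet URLs. The helper `build_root_css` and the URLs are made up, and the resulting text is meant to be saved and hosted as your own print.css.

```python
def build_root_css(stylesheet_urls):
    """Return CSS text with one @import rule per stylesheet URL."""
    return "\n".join(f'@import url("{url}");' for url in stylesheet_urls)


# Example with hypothetical URLs:
#   css_text = build_root_css([
#       'https://www.example.com/public/css/base.css',
#       'https://www.example.com/public/css/invoice.css',
#   ])
#   # Save css_text as print.css on your server, then pass its URL as css.
```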

Loading CSS from a string:

As with the source parameter, you can pass a raw set of CSS rules to the css parameter and they will be injected into the loaded document.

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert(
    'https://www.example.com',
    css="a {text-decoration: underline; color: blue}"
)

with open('result.pdf', 'wb') as output:
    output.write(binary_file)

Adding Watermark

Some documents that you share need a watermark to clearly identify your brand. That's easy with PDFShift:

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert(
    'https://www.example.com',
    watermark={
        'image': 'https://pdfshift.io/images/logo.png',
        'offset_x': 50,
        'offset_y': '100px',
        'rotate': 45
    }
)

with open('result.pdf', 'wb') as output:
    output.write(binary_file)

Custom header or footer

You can add a custom header or footer to your generated document. These are often used to indicate the current page, or to show the logo of your company on every page.

Note that the header and footer are not related to the body. For this reason, the CSS in your body doesn't apply to your header/footer.
By default, the font-size will be really small. You will have to set it manually, like in the following example:

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert(
    'https://www.example.com',
    footer={
        'source': '<div style="font-size: 12px">Page {{page}} of {{total}}</div>',
        'spacing': '50px'
    }
)

with open('result.pdf', 'wb') as output:
    output.write(binary_file)
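
Since the font size has to be set by hand each time, a small helper can build the footer dict for you. This is only a sketch: `page_number_footer` is a hypothetical helper, and it reuses the source and spacing keys and the {{page}} and {{total}} placeholders from the example above.

```python
def page_number_footer(font_size_px=12, spacing="50px"):
    """Build a footer dict showing 'Page X of Y' at the given font size."""
    # %-formatting is used so the {{page}} and {{total}} placeholders stay intact.
    source = (
        '<div style="font-size: %dpx">Page {{page}} of {{total}}</div>'
        % font_size_px
    )
    return {"source": source, "spacing": spacing}


# Usage (requires the pdfshift package and a configured api_key):
#   binary_file = pdfshift.convert(
#       'https://www.example.com',
#       footer=page_number_footer(font_size_px=10)
#   )
```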

Protecting the generated PDF

Protecting your document is easy with PDFShift. You can specify a password for the user and for the owner.
(The owner will have full access rights while the user will have limited access based on your choice).

Please keep in mind that some PDF readers don't respect these rights once the user is authenticated.
This means that if you set an empty password for the user, with no rights to print or copy, some PDF readers will ignore this and still allow printing and copying.

This is outside of our control here at PDFShift, as we can't force a reader to respect the PDF standard.

import pdfshift
pdfshift.api_key = 'your_api_key_here'

binary_file = pdfshift.convert(
    'https://www.example.com',
    protection={
        'user_password': 'user',
        'owner_password': 'owner',
        'no_print': True
    }
)

with open('result.pdf', 'wb') as output:
    output.write(binary_file)

(Read our API documentation for more in-depth details.)