Using cookies to convert HTML documents to PDF in Python with aiohttp
This guide shows how to add custom cookies to the request PDFShift makes to your URL when converting it to PDF with PDFShift's API. Passing a session cookie lets you keep an active session, so PDFShift can access the page authenticated as a specific user before generating the PDF document.
To do so, we'll use the "cookies" parameter when sending a request to PDFShift. It expects a list of dictionaries, each containing the name and value of a cookie (more details below):
```python
import asyncio
import base64

import aiohttp

# You can get an API key at https://pdfshift.io
api_key = 'sk_xxxxxxxxxxxx'

params = {
    'source': 'https://www.example.com',
    # The "cookies" parameter expects a list of dictionaries, each containing the name and value of a cookie
    'cookies': [
        {'name': 'PHPSESSID', 'value': 'el4ukv0kqbvoirg7nkp4dncpk3'}
    ]
}


async def main():
    # Credentials are sent as HTTP Basic auth: "api" as the username, the API key as the password
    auth = base64.b64encode(
        'api:{}'.format(api_key).encode('utf-8')
    ).decode('utf-8')

    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                'https://api.pdfshift.io/v3/convert/pdf',
                headers={'Authorization': f'Basic {auth}'},
                json=params
            ) as response:
                if response.status >= 400:
                    # We highly recommend handling errors: PDFShift usually explains what went wrong,
                    # and no PDF is returned when an error occurs.
                    raise Exception('Invalid request: {}'.format(await response.text()))

                with open('result.pdf', 'wb') as f:
                    f.write(await response.read())
    except asyncio.TimeoutError:
        raise Exception('The request took too long to process')
    except aiohttp.ClientError as e:
        raise Exception(f'An error occurred: {e}')

    print('The PDF document was generated and saved to result.pdf')


if __name__ == '__main__':
    asyncio.run(main())
```
The "cookies" parameter accepts a list of dictionaries, each with the following properties:
- name: The name of the cookie.
- value: The value of the cookie.
- secure: A boolean (defaults to false) that tells the browser to only send the cookie if the request is made over HTTPS.
- http_only: A boolean (defaults to false) that tells the browser not to expose the cookie to client-side scripts.
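For instance, a session cookie that should only travel over HTTPS and stay hidden from page scripts can be described with all four properties. The sketch below is a minimal illustration: the make_cookie helper is hypothetical (it is not part of PDFShift's API) and simply builds the dictionary shape listed above, and the "theme" cookie is an illustrative name.

```python
def make_cookie(name, value, secure=False, http_only=False):
    """Hypothetical helper: build one entry for PDFShift's "cookies" list."""
    return {
        'name': name,
        'value': value,
        # Both flags default to false on PDFShift's side, so passing False is equivalent to omitting them
        'secure': secure,
        'http_only': http_only,
    }


params = {
    'source': 'https://www.example.com',
    'cookies': [
        # Session cookie restricted to HTTPS and hidden from client-side scripts
        make_cookie('PHPSESSID', 'el4ukv0kqbvoirg7nkp4dncpk3', secure=True, http_only=True),
        # Illustrative non-sensitive cookie that keeps the defaults
        make_cookie('theme', 'dark'),
    ],
}

print(params['cookies'][0])
```

The resulting params dictionary can be sent as the JSON body exactly like the one in the main example.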
As with our guides on sending custom HTTP headers and accessing secured pages, this lets you keep your documents protected from anonymous visitors while still allowing PDFShift to access the page and convert it to PDF.
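One way to apply this is to forward the current visitor's own session when they request a PDF, so PDFShift renders the page as that user sees it. Assuming you already have the raw Cookie header of the incoming request (how you obtain it depends on your web framework), the minimal sketch below uses Python's standard http.cookies module to convert it into the list PDFShift expects. The cookies_for_pdfshift function and its allowed argument are hypothetical. Note that a Cookie request header only carries name/value pairs, so secure and http_only are left at their defaults.

```python
from http.cookies import SimpleCookie


def cookies_for_pdfshift(cookie_header, allowed=None):
    """Hypothetical helper: turn a raw "Cookie" request header into
    a list of {'name': ..., 'value': ...} dictionaries for PDFShift.

    If `allowed` is given, only cookies whose name is in it are forwarded.
    """
    jar = SimpleCookie()
    # SimpleCookie is strict: it may stop parsing at a malformed pair
    jar.load(cookie_header)
    return [
        {'name': name, 'value': morsel.value}
        for name, morsel in jar.items()
        if allowed is None or name in allowed
    ]


header = 'PHPSESSID=el4ukv0kqbvoirg7nkp4dncpk3; theme=dark'
print(cookies_for_pdfshift(header, allowed={'PHPSESSID'}))
```

Filtering with allowed keeps unrelated cookies (analytics, preferences) from being sent along; which cookies your pages actually need is something you would adjust for your own application.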
For further details on the "cookies" property and its usage, please refer to our dedicated documentation.
We hope this guide was helpful. If you have any questions or notice any issues in the code above, feel free to drop us a line.