Create and Modify PDF Files in Python
The post content below is copied from Real Python.

It’s really useful to know how to create and modify PDF (portable document format) files in Python. This is one of the most common formats for sharing documents over the Internet. PDF files can contain text, images, tables, forms, and rich media like videos and animations, all in a single file.

This abundance of content types can make working with PDFs difficult. There are several different kinds of data to decode when opening a PDF file! Fortunately, the Python ecosystem has some great packages for reading, manipulating, and creating PDF files.

In this tutorial, you’ll learn how to:

  • Read text from a PDF with pypdf
  • Split a PDF file into multiple files
  • Concatenate and merge PDF files together
  • Rotate and crop pages in PDF files
  • Encrypt and decrypt PDF files
  • Create and customize PDF files from scratch with ReportLab

To complete this tutorial, you’ll use two different tools. You’ll use the pypdf library to manipulate existing PDF files and the ReportLab library to create new PDF files from scratch. Along the way, you’ll have several opportunities to deepen your understanding with exercises and examples.

To follow along with this tutorial, you should download the materials used in the examples and extract them to your home folder.

Extracting Text From PDF Files With pypdf

In this section, you’ll learn how to read PDF files and extract their text using the pypdf library. Before you can do that, though, you need to install it with pip:

$ python -m pip install pypdf

With this command, you download and install the latest version of pypdf from the Python Package Index (PyPI). To verify the installation, go ahead and run the following command in your terminal:

$ python -m pip show pypdf
Name: pypdf
Version: 3.8.1
Summary: A pure-python PDF library capable of splitting,
 merging, cropping, and transforming PDF files
Author-email: Mathieu Fenniak <[email protected]>
Location: .../lib/python3.10/site-packages

Pay particular attention to the version information. At the time of publication for this tutorial, the latest version of pypdf was 3.8.1. This library has gotten plenty of updates lately, and cool new features are added quite frequently. Most importantly, you’ll find many breaking changes in the library’s API if you compare it with its predecessor library PyPDF2.
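If you'd rather check the installed version from Python than from the terminal, the standard library's importlib.metadata module can look it up. The following is a minimal sketch, not part of the original tutorial. The helper name installed_version is hypothetical and chosen only for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it's missing.

    Note: installed_version is a hypothetical helper name used for this sketch.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        # The package isn't installed in the current environment.
        return None


pypdf_version = installed_version("pypdf")
if pypdf_version is None:
    print("pypdf is not installed in this environment")
else:
    print(f"pypdf version: {pypdf_version}")
```

Because the API changed between PyPDF2 and pypdf, a check like this can be a quick way to confirm which library and version your environment actually provides before running the examples.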

Before diving into working with PDF files, you must know that this tutorial is adapted from the chapter “Creating and Modifying PDF Files” in Python Basics: A Practical Introduction to Python 3.

The book uses Python’s built-in IDLE editor to create and edit Python files and interact with the Python shell, so you’ll find occasional references to IDLE throughout this tutorial. However, you should have no problems running the example code from the editor and environment of your choice.

Reading PDF Files With PdfReader

To kick things off, you’ll open a PDF file and read some information about it. You’ll use the Pride_and_Prejudice.pdf file provided in the downloadable resources for this tutorial.

Open IDLE’s interactive window and import the PdfReader class from pypdf:

>>> from pypdf import PdfReader

To create a new instance of the PdfReader class, you’ll need to provide the path to the PDF file that you want to open. You can do that using the pathlib module:

>>> from pathlib import Path

>>> pdf_path = (
...     Path.home()
...     / "creating-and-modifying-pdfs"
...     / "practice_files"
...     / "Pride_and_Prejudice.pdf"
... )

The pdf_path variable now contains the path to a PDF version of Jane Austen’s Pride and Prejudice.
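Before handing a path to a PDF library, it can help to confirm that the file really exists where you expect it. Here's a small sketch of such a check using only pathlib. The function name check_pdf_path is an assumption made for this example and isn't part of pypdf or the original tutorial:

```python
from pathlib import Path


def check_pdf_path(path):
    """Return the path if it points to an existing file with a .pdf suffix.

    Note: check_pdf_path is a hypothetical helper written for this sketch.
    """
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path}")
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got {path.name}")
    return path
```

Calling check_pdf_path(pdf_path) with the path built above would either return the same path or raise an error explaining what went wrong, which is often easier to diagnose than a failure deep inside a library call.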

Now create the PdfReader instance by calling the class’s constructor with the path to your PDF file as an argument:

>>> pdf_reader = PdfReader(pdf_path)
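Once you have a PdfReader instance, you can start reading information from it. The sketch below is one possible way to gather a few basics, and it isn't taken from the original tutorial. It relies on the reader's pages and metadata attributes and on each page's extract_text() method. The function name summarize_reader and the 80-character preview length are assumptions made for illustration:

```python
def summarize_reader(pdf_reader, preview_length=80):
    """Collect a few basic facts from an already opened pypdf PdfReader.

    Note: summarize_reader and preview_length are hypothetical names
    introduced for this sketch.
    """
    page_count = len(pdf_reader.pages)

    # metadata can be None when the PDF has no document information.
    metadata = pdf_reader.metadata
    title = metadata.title if metadata is not None else None

    # Extract text from the first page only, if the document has any pages.
    first_page_text = pdf_reader.pages[0].extract_text() if page_count else ""

    return {
        "page_count": page_count,
        "title": title,
        "first_page_preview": first_page_text[:preview_length],
    }
```

In the interactive window, calling summarize_reader(pdf_reader) with the reader created above would return a dictionary containing the page count, the document title if one is set, and the start of the first page's text.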


May 31, 2023 at 07:30PM