
Python || Pdf Merge Using PyPdf

The following is a simple PDF file merger program that uses the “pyPdf” library to manipulate PDF files. It can merge whole PDF files selected by the user and save them together as one single new PDF file.

REQUIRED KNOWLEDGE FOR THIS PROGRAM

PyPdf - What Is It?
How To Create Executable Python Programs
Display The Time In Python
Metadata With PyPdf
Pdf Merge Executable File - Click Here To Download

This program first asks the user to place the PDF file(s) they wish to merge into a specified input folder, titled “Files To Merge” by default. Once the input files are in that folder, the program prompts the user to select which file(s) to merge. As soon as the files have been selected, merging begins, and the files are written to the output PDF in exactly the order the user specified. When merging is complete, the single merged PDF file is saved into an output folder titled “Completed Merged Files.”
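The merge steps described above can be sketched with pyPdf's PdfFileReader and PdfFileWriter classes (getNumPages, getPage, addPage and write belong to the pyPdf 1.13 API). This is not the program's actual code: the helper names list_pdfs, order_selection, merge_pdfs and run, the comma-separated 1-based menu input, and the output name Merged.pdf are assumptions for illustration. Only the two folder names come from the description.

```python
import os

# Folder names taken from the description above.
INPUT_DIR = "Files To Merge"
OUTPUT_DIR = "Completed Merged Files"


def list_pdfs(folder):
    """Return the .pdf file names in folder, sorted so the menu order is stable."""
    return sorted(name for name in os.listdir(folder)
                  if name.lower().endswith(".pdf"))


def order_selection(choices, names):
    """Map 1-based menu numbers such as "3,1" to file names, keeping the user's order."""
    picked = []
    for token in choices.split(","):
        index = int(token) - 1  # int() ignores surrounding spaces
        if not 0 <= index < len(names):
            raise ValueError("there is no file numbered %s" % token.strip())
        picked.append(names[index])
    return picked


def merge_pdfs(paths, out_path):
    """Append every page of each input PDF, in order, to a single output PDF."""
    # Imported here so the helpers above still work where pyPdf is not installed.
    from pyPdf import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    handles = []
    try:
        for path in paths:
            handle = open(path, "rb")
            # pyPdf reads page data lazily, so inputs must stay open until write().
            handles.append(handle)
            reader = PdfFileReader(handle)
            for i in range(reader.getNumPages()):
                writer.addPage(reader.getPage(i))
        with open(out_path, "wb") as out:
            writer.write(out)
    finally:
        for handle in handles:
            handle.close()


def run(choices, out_name="Merged.pdf"):
    """Hypothetical driver: merge the chosen files from INPUT_DIR into OUTPUT_DIR."""
    names = order_selection(choices, list_pdfs(INPUT_DIR))
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    out_path = os.path.join(OUTPUT_DIR, out_name)
    merge_pdfs([os.path.join(INPUT_DIR, n) for n in names], out_path)
    return out_path
```

For instance, calling run(input("Files to merge, in order (e.g. 2,1): ")) would merge the second listed file followed by the first. Keeping every input file open until write() is deliberate: pyPdf copies page data lazily, so closing an input early typically makes write() fail.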


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Click here to download a Windows executable file demonstrating the above use.

Python || Pdf Split & Extract Using PyPdf

The following is a simple PDF file split and extract program that uses the “pyPdf” library to manipulate PDF files. It can extract selected pages from an existing PDF file and save the extracted pages into a new PDF file.

REQUIRED KNOWLEDGE FOR THIS PROGRAM

PyPdf - What Is It?
How To Create Executable Python Programs
Display The Time In Python
Metadata With PyPdf
Pdf Split Executable File - Click Here To Download

This program first asks the user to place the PDF file(s) they wish to extract pages from into a specified input folder, titled “Files To Extract” by default. Once the input files are in that folder, the program prompts the user to select the file to extract pages from. After a file has been selected, the user is asked to enter the page numbers to extract from it. When extraction is complete, the selected pages are combined into one single PDF file, which is saved into an output folder titled “Completed Extracted Files.”
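The extraction step can be sketched the same way with pyPdf's reader and writer. parse_pages and extract_pages are hypothetical helpers, and the "1, 3-5" syntax with 1-based page numbers is an assumed input format; the post does not say exactly how the original program reads page numbers.

```python
def parse_pages(spec, page_count):
    """Turn user input such as "1, 3-5" into 0-based page indexes [0, 2, 3, 4]."""
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
        else:
            first = last = int(part)
        if not 1 <= first <= last <= page_count:
            raise ValueError("pages %r are outside 1-%d" % (part, page_count))
        indexes.extend(range(first - 1, last))
    return indexes


def extract_pages(in_path, spec, out_path):
    """Copy the requested pages of in_path, in the order given, into a new PDF."""
    # Imported here so parse_pages works even where pyPdf is not installed.
    from pyPdf import PdfFileReader, PdfFileWriter

    # The input stays open until write() because pyPdf loads page data lazily.
    with open(in_path, "rb") as handle:
        reader = PdfFileReader(handle)
        writer = PdfFileWriter()
        for index in parse_pages(spec, reader.getNumPages()):
            writer.addPage(reader.getPage(index))
        with open(out_path, "wb") as out:
            writer.write(out)
```

Validating the whole page list before any output is written means a typo such as "4-2" fails immediately instead of producing a partial file.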


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Click here to download a Windows executable file demonstrating the above use.

Python || How To Install PyPdf Using Python 3

“pyPdf” is a pure Python library built as a PDF toolkit. It is capable of:

• Extracting document information (title, author, ...),
• Splitting documents page by page,
• Merging documents page by page,
• Cropping pages,
• Merging multiple pages into a single page,
• Encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory file objects such as StringIO (under Python 3, the binary equivalent is io.BytesIO) rather than files on disk, allowing PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.
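As a sketch of in-memory use, the function below reads a PDF held entirely in a bytes object and reports its page count plus the title and author fields that pyPdf exposes through getDocumentInfo(). Under Python 3 the in-memory binary stream is io.BytesIO rather than StringIO. The function name pdf_summary and the optional reader_class parameter (there only so a stand-in reader can be swapped in for testing) are hypothetical.

```python
import io


def pdf_summary(pdf_bytes, reader_class=None):
    """Read a PDF held in memory and return its page count, title and author."""
    if reader_class is None:
        # pyPdf's reader accepts any seekable binary file object, including BytesIO.
        from pyPdf import PdfFileReader as reader_class
    reader = reader_class(io.BytesIO(pdf_bytes))
    info = reader.getDocumentInfo()  # None when the file has no /Info dictionary
    return {
        "pages": reader.getNumPages(),
        "title": info.title if info is not None else None,
        "author": info.author if info is not None else None,
    }
```

A web application could pass the body of an uploaded file straight to pdf_summary without ever writing it to disk.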

=== 1. INSTALLING PYPDF ===

A Windows installer for pyPdf is available here. At the time of this writing, the installer that was listed on the download page was titled “pyPdf-1.13.win32.exe.”

Note: If you are using Python 2, the Windows installer available on the download page should be all you need. No further installation steps are necessary.

Next, run the installer, follow its prompts, and wait for it to finish!

When the pyPdf installer has finished, the newly installed module files should be located in the following directory:


C:\Python32\Lib\site-packages\pyPdf

Note: Python 3.2 is installed on my system, and the above directory reflects that. The Python directory may be named differently on your system depending on the Python version you are using.
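If you are unsure where your Python version put the package, the following sketch asks the interpreter directly. It uses importlib.util.find_spec, which requires Python 3.4 or later (newer than the Python 3.2 used in this post); package_dir is a hypothetical helper name.

```python
import importlib.util
import os


def package_dir(name):
    """Return the folder a top-level package would be imported from, or None."""
    # find_spec only locates the package; it does not run its code, so this
    # also works while the old Python 2 files are still installed.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # not installed (or a namespace package with no single folder)
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    print(package_dir("pyPdf"))
```

Running this script with the same Python you plan to use should print the pyPdf folder (for example, the C:\Python32\... path above), or None if that interpreter cannot find the package.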

If everything installed correctly, proceed to the next step.

=== 2. UPDATE EXISTING PYPDF FILES ===

pyPdf was originally written for Python 2, but a Python 3 compatible branch has since been made available. The updated files can be found here, and enable pyPdf to be integrated with Python 3.

To replace the old Python 2 files with the new Python 3 files, locate the following directory on your system:


C:\Python32\Lib\site-packages\pyPdf

Here is a sample screenshot showing the files that reside in the above directory:

Next, replace each of the old Python 2 files listed in the above directory with the corresponding new Python 3 file listed on this page.

One way to do this is to locate the file on your system with exactly the same name as the one listed here, and copy/paste the contents of the new file over the contents of the old file.

Or you can download all of the files at once here, and move them into the directory, replacing the existing files on your system.
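If you would rather script the replacement, here is a minimal sketch using the standard library's shutil module. The function name replace_module_files, the assumption that the downloaded Python 3 files sit together in one folder, and the .py2bak backup copies are illustrative additions, not part of the original instructions.

```python
import os
import shutil


def replace_module_files(new_dir, installed_dir, backup_suffix=".py2bak"):
    """Copy each .py file from new_dir into installed_dir, backing up what it overwrites."""
    replaced = []
    for name in sorted(os.listdir(new_dir)):
        if not name.endswith(".py"):
            continue  # only Python source files are part of the package
        target = os.path.join(installed_dir, name)
        backup = target + backup_suffix
        # Back up only once, so a second run cannot overwrite the original Python 2 copy.
        if os.path.exists(target) and not os.path.exists(backup):
            shutil.copy2(target, backup)
        shutil.copy2(os.path.join(new_dir, name), target)
        replaced.append(name)
    return replaced
```

For example, replace_module_files(r"C:\Downloads\pyPdf-py3", r"C:\Python32\Lib\site-packages\pyPdf") would copy the new files over the installed ones, where the download folder is a hypothetical location you would replace with your own.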

=== 3. UBUNTU USERS ===

If you are an Ubuntu Linux user and want to install pyPdf, open a terminal and run the following command to create the package directory:

NOTE: Replace “python3.2” with whatever version of Python is installed on your system.


sudo mkdir /usr/local/lib/python3.2/dist-packages/pyPdf


Next, copy the Python 3 files from the download link above into the following directory (as with the mkdir command, writing there requires sudo):


/usr/local/lib/python3.2/dist-packages/pyPdf
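Rather than editing “python3.2” by hand, you can let Python report the version-specific path. This sketch assumes the Debian/Ubuntu convention of /usr/local/lib/pythonX.Y/dist-packages shown above; local_pypdf_dir is a hypothetical helper name.

```python
import os
import sys


def local_pypdf_dir(prefix="/usr/local/lib"):
    """Build the Debian/Ubuntu-style pyPdf path for the running interpreter."""
    version = "python%d.%d" % sys.version_info[:2]  # e.g. "python3.2"
    return os.path.join(prefix, version, "dist-packages", "pyPdf")


if __name__ == "__main__":
    print(local_pypdf_dir())
```

Run it with the python3 you intend to use, then substitute the printed path into the mkdir and copy steps.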


Once the above steps are completed, you should be ready to use pyPdf in your Python 3 programs!
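To confirm the setup, the sketch below tries the import the way a program would. check_import is a hypothetical helper; PdfFileReader and PdfFileWriter are the reader and writer classes pyPdf exports. A leftover Python 2 file typically shows up as a SyntaxError, or as an ImportError because Python 2's implicit relative imports do not work in Python 3, which is why both are caught.

```python
import importlib


def check_import(module_name, *names):
    """Try to import module_name and look up the given attributes; return (ok, message)."""
    try:
        module = importlib.import_module(module_name)
    except SyntaxError as exc:
        # Python 2-only syntax (e.g. print statements) fails to parse under Python 3.
        return False, "%s has code Python 3 cannot parse: %s" % (module_name, exc)
    except ImportError as exc:
        return False, "%s could not be imported: %s" % (module_name, exc)
    missing = [name for name in names if not hasattr(module, name)]
    if missing:
        return False, "%s is missing %s" % (module_name, ", ".join(missing))
    return True, "%s imported from %s" % (module_name, getattr(module, "__file__", "?"))


if __name__ == "__main__":
    print(check_import("pyPdf", "PdfFileReader", "PdfFileWriter")[1])
```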