Top 23 Python Specific Formats Processing Projects
-
PyPDF2
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Project mention: a ResumeService that extracts the content from a PDF using pypdf, the maintained successor to PyPDF2.
-
PyMuPDF
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
-
csvkit
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
Project mention: Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files | news.ycombinator.com | 2025-05-26
I wonder how this compares to csvkit [1].
[1]: https://csvkit.readthedocs.io/
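The Sqawk thread is about running SQL against plain data files. csvkit covers this with its `csvsql` command, whose `--query` option runs SQL against CSV input. The sketch below shows the same idea using only the standard library (`csv` and `sqlite3`). It is not csvkit's implementation. The hypothetical `query_csv` helper, the table name `data`, and storing every column as TEXT are simplifying assumptions.

```python
# Stdlib-only sketch of "SQL over a CSV file"; not csvkit's implementation.
# Assumptions: the first row is a header whose names contain no double
# quotes, and every column is stored as TEXT (cast in SQL when comparing).
import csv
import io
import sqlite3


def query_csv(csv_text: str, sql: str) -> list[tuple]:
    """Load CSV text into an in-memory SQLite table named `data`, then run `sql`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f"CREATE TABLE data ({columns})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


sample = "city,celsius\nOslo,4\nRome,18\nLima,21\n"
rows = query_csv(sample, "SELECT city FROM data WHERE CAST(celsius AS REAL) > 10 ORDER BY city")
print(rows)  # [('Lima',), ('Rome',)]
```

Storing everything as TEXT keeps the loader schema-free, at the cost of explicit casts in queries. A fuller tool would infer column types from the data first.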
-
Python-Markdown
Project mention: I use uv to invoke the CLI of the markdown package in Python. I then copy the output back into the clipboard.
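The workflow quoted above converts Markdown to HTML before pasting it elsewhere. Below is a minimal sketch of the conversion step using the package's Python API, assuming Python-Markdown is installed (`pip install markdown`). `markdown_to_html` is a hypothetical wrapper around `markdown.markdown`. On the command line the package documents `python -m markdown input.md`, and uv can run that in a throwaway environment with something like `uv run --with markdown python -m markdown input.md`. The clipboard step is platform-specific and is left out.

```python
# Sketch assuming Python-Markdown is installed (pip install markdown).
# `markdown_to_html` is a hypothetical wrapper; copying to the clipboard is
# platform-specific and omitted here.
import markdown


def markdown_to_html(text: str) -> str:
    """Convert Markdown source to an HTML fragment."""
    return markdown.markdown(text)


html = markdown_to_html("# Status\n\nBuild is *green*.")
print(html)
```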
-
xlwings
xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.
-
pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
-
plutoprint
Project mention: Show HN: PlutoPrint - Generate Beautiful PDFs and PNGs from HTML with Python | news.ycombinator.com | 2025-08-20
PlutoPrint supports a large subset of CSS, including flexbox for most common layouts, but it's not a full browser engine, so there are some limitations. You can see a more complete list of supported features here: https://github.com/plutoprint/plutobook/blob/main/FEATURES.m.... We're also actively tracking bugs and improvements on the GitHub repo: https://github.com/plutoprint/plutoprint/issues, and contributions or test cases are always appreciated to help expand coverage.
Python Specific Formats Processing related posts
-
WeasyPrint
-
Show HN: PlutoPrint - Convert HTML to Beautiful PDFs and PNGs with Python
-
Kreuzberg: The Python Document Intelligence Framework That Will Blow Your Mind!
-
Copy Markdown to Teams
-
Lyon Drops Microsoft to Boost Digital Sovereignty
-
Show HN: PlutoBook β Fast, lightweight C++ library for generating PDF from HTML
-
plutoprint VS WeasyPrint - a user suggested alternative
2 projects | 19 Jun 2025 -
Index
What are some of the best open-source Specific Formats Processing projects in Python? This list will help you:
| # | Project | GitHub stars |
|---|---|---|
| 1 | PyPDF2 | 9,590 |
| 2 | PyMuPDF | 8,417 |
| 3 | WeasyPrint | 8,333 |
| 4 | csvkit | 6,277 |
| 5 | python-docx | 5,285 |
| 6 | tablib | 4,728 |
| 7 | Python-Markdown | 4,112 |
| 8 | XlsxWriter | 3,860 |
| 9 | borb | 3,541 |
| 10 | Camelot | 3,510 |
| 11 | xlwings | 3,264 |
| 12 | python-pptx | 3,028 |
| 13 | Mistune | 2,898 |
| 14 | markdown2 | 2,802 |
| 15 | docxtpl | 2,392 |
| 16 | pdftabextract | 2,246 |
| 17 | pyexcel | 1,271 |
| 18 | pymorphy2 | 1,155 |
| 19 | plutoprint | 999 |
| 20 | mistletoe | 991 |
| 21 | unp | 445 |
| 22 | vcspull | 209 |
| 23 | Marmir | 173 |