Python Specific Formats Processing

Open-source Python projects categorized as Specific Formats Processing

Top 23 Python Specific Formats Processing Projects

  1. PyPDF2

    A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

    Project mention: AI-Powered Cover Letter Generator | dev.to | 2025-10-24

    The following ResumeService extracts the content from a PDF using pypdf

  3. PyMuPDF

    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

    Project mention: Using Docling’s OCR features with RapidOCR | dev.to | 2025-04-03
  4. WeasyPrint

    The awesome document factory

    Project mention: WeasyPrint | news.ycombinator.com | 2025-10-12
  5. csvkit

    A suite of utilities for converting to and working with CSV, the king of tabular file formats.

    Project mention: Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files | news.ycombinator.com | 2025-05-26

    I wonder how this compares to csvkit [1].

    [1]: https://csvkit.readthedocs.io/

  6. python-docx

    Create and modify Word documents with Python

  7. tablib

    Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.

  8. Python-Markdown

    A Python implementation of John Gruber’s Markdown with Extension support.

    Project mention: Copy Markdown to Teams | dev.to | 2025-07-19

    I use uv to invoke the CLI of the markdown package in Python. I then copy the output back into the clipboard.

  10. XlsxWriter

    A Python module for creating Excel XLSX files.

  11. borb

    borb is a library for reading, creating and manipulating PDF files in python.

  12. Camelot

    A Python library to extract tabular data from PDFs

  13. xlwings

    xlwings is a Python library that makes it easy to call Python from Excel and vice versa. It works with Excel on Windows and macOS as well as with Google Sheets and Excel on the web.

  14. python-pptx

    Create Open XML PowerPoint documents in Python

  15. Mistune

    A fast yet powerful Python Markdown parser with renderers and plugins.

  16. markdown2

    markdown2: A fast and complete implementation of Markdown in Python

  17. docxtpl

    Use a docx as a jinja2 template

  18. pdftabextract

    A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

    Project mention: Liberate tabular data from scanned documents | news.ycombinator.com | 2024-12-18
  19. pyexcel

    Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

  20. pymorphy2

    Morphological analyzer / inflection engine for Russian and Ukrainian languages.

    Project mention: Ask HN: What Are You Working On? (July 2025) | news.ycombinator.com | 2025-07-27
  21. plutoprint

    A Python Library for Generating PDFs and Images from HTML, powered by PlutoBook

    Project mention: Show HN: PlutoPrint – Generate Beautiful PDFs and PNGs from HTML with Python | news.ycombinator.com | 2025-08-20

    PlutoPrint supports a large subset of CSS, including flexbox for most common layouts, but it’s not a full browser engine, so there are some limitations. You can see a more complete list of supported features here: https://github.com/plutoprint/plutobook/blob/main/FEATURES.m.... We’re also actively tracking bugs and improvements on the GitHub repo: https://github.com/plutoprint/plutoprint/issues, and contributions or test cases are always appreciated to help expand coverage.

  22. mistletoe

    A fast, extensible and spec-compliant Markdown parser in pure Python.

  23. unp

    Unpacks things.

  24. vcspull

    🔄 Synchronize projects via yaml/json manifest. Built using `libvcs`.

  25. Marmir

    Python powered spreadsheets (by brianray)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Specific Formats Processing related posts

  • WeasyPrint

    1 project | news.ycombinator.com | 12 Oct 2025
  • Show HN: PlutoPrint – Convert HTML to Beautiful PDFs and PNGs with Python

    1 project | news.ycombinator.com | 15 Aug 2025
  • Kreuzberg: The Python Document Intelligence Framework That Will Blow Your Mind!

    1 project | dev.to | 27 Jul 2025
  • Copy Markdown to Teams

    2 projects | dev.to | 19 Jul 2025
  • Lyon Drops Microsoft to Boost Digital Sovereignty

    2 projects | news.ycombinator.com | 25 Jun 2025
  • Show HN: PlutoBook – Fast, lightweight C++ library for generating PDF from HTML

    2 projects | news.ycombinator.com | 19 Jun 2025
  • plutoprint VS WeasyPrint - a user suggested alternative

    2 projects | 19 Jun 2025

Index

What are some of the best open-source Specific Formats Processing projects in Python? This list will help you:

#   Project           Stars
1   PyPDF2            9,590
2   PyMuPDF           8,417
3   WeasyPrint        8,333
4   csvkit            6,277
5   python-docx       5,285
6   tablib            4,728
7   Python-Markdown   4,112
8   XlsxWriter        3,860
9   borb              3,541
10  Camelot           3,510
11  xlwings           3,264
12  python-pptx       3,028
13  Mistune           2,898
14  markdown2         2,802
15  docxtpl           2,392
16  pdftabextract     2,246
17  pyexcel           1,271
18  pymorphy2         1,155
19  plutoprint          999
20  mistletoe           991
21  unp                 445
22  vcspull             209
23  Marmir              173

