Popplex

An Elixir NIF (Native Implemented Function) wrapper for the Poppler PDF library, providing fast and efficient PDF processing capabilities.

View Changelog | View Contributing Guidelines

Features

Get page count - Quickly determine the number of pages in a PDF
Extract text - Extract text content from entire documents or specific pages
Combine PDFs - Merge multiple PDF files into one

Prerequisites

Popplex compiles against Poppler's C++ library, so Poppler (with its development headers) and pkg-config must be installed before you build:

macOS

brew install poppler pkg-config

Ubuntu/Debian

sudo apt-get install libpoppler-cpp-dev pkg-config

Fedora/RHEL

sudo dnf install poppler-cpp-devel pkgconfig

Arch Linux

sudo pacman -S poppler pkgconf

Installation

Add popplex to your list of dependencies in mix.exs:

def deps do
  [
    {:popplex, "~> 0.1.0"}
  ]
end

Then run:

mix deps.get
mix compile

The NIF will be automatically compiled during the build process.

Usage

Get Page Count

# Get the number of pages in a PDF
{:ok, count} = Popplex.get_page_count("document.pdf")
IO.puts("The PDF has #{count} pages")

Extract Text

# Extract text from all pages
{:ok, text} = Popplex.get_text("document.pdf")

# Extract text from a specific page (0-indexed)
{:ok, first_page} = Popplex.get_text("document.pdf", page: 0)
{:ok, second_page} = Popplex.get_text("document.pdf", page: 1)

# Explicitly extract all pages
{:ok, all_text} = Popplex.get_text("document.pdf", all: true)
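
To process a document one page at a time, the page count and the page: option can be combined. What follows is a minimal sketch rather than part of Popplex: the PageText module and its extract_pages/1 function are hypothetical names, and the sketch assumes get_text/2 returns {:error, reason} for any page it cannot read.

```elixir
defmodule PageText do
  # Hypothetical helper (not part of Popplex).
  # Returns {:ok, [page_0_text, page_1_text, ...]} or the first {:error, reason}.
  def extract_pages(path) do
    with {:ok, count} <- Popplex.get_page_count(path) do
      collect(path, 0, count, [])
    end
  end

  # All pages have been read: restore the original page order.
  defp collect(_path, page, count, acc) when page >= count do
    {:ok, Enum.reverse(acc)}
  end

  # Read one page (0-indexed), then recurse; stop at the first error.
  defp collect(path, page, count, acc) do
    case Popplex.get_text(path, page: page) do
      {:ok, text} -> collect(path, page + 1, count, [text | acc])
      {:error, _reason} = error -> error
    end
  end
end
```

Calling PageText.extract_pages("document.pdf") would then yield one string per page, which can be handy for per-page indexing or search.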

Combine PDFs

# Merge multiple PDFs into one
{:ok, output} = Popplex.combine_pdfs(
  ["file1.pdf", "file2.pdf", "file3.pdf"],
  "combined.pdf"
)

# Verify the combined PDF
{:ok, count} = Popplex.get_page_count("combined.pdf")
IO.puts("Combined PDF has #{count} pages")

Error Handling

All functions return {:ok, result} on success or {:error, reason} on failure:

case Popplex.get_page_count("document.pdf") do
  {:ok, count} ->
    IO.puts("Success! Page count: #{count}")
    
  {:error, reason} ->
    IO.puts("Error: #{reason}")
end

Common error scenarios:

File doesn't exist: "Failed to open PDF document"
PDF is password protected: "PDF document is locked"
Invalid page number: "Page number out of range"
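
If calling code needs to branch on the kind of failure, the messages above can be mapped to atoms. This is only a sketch: PdfErrors and its atom names are hypothetical, and it assumes each reason is a string exactly equal to one of the listed messages, which may not hold across Popplex versions.

```elixir
defmodule PdfErrors do
  # Hypothetical mapping from Popplex's error messages to atoms.
  # Assumes `reason` is a binary equal to one of the documented messages.
  def classify("Failed to open PDF document"), do: :cannot_open
  def classify("PDF document is locked"), do: :locked
  def classify("Page number out of range"), do: :page_out_of_range
  # Anything unrecognized is passed through so no information is lost.
  def classify(other), do: {:unknown, other}

  # Normalizes any {:ok, _} | {:error, reason} result returned by Popplex.
  def normalize({:ok, _} = ok), do: ok
  def normalize({:error, reason}), do: {:error, classify(reason)}
end
```

A call such as Popplex.get_page_count("document.pdf") |> PdfErrors.normalize() would then return either {:ok, count} or an {:error, atom} tuple that is easy to pattern match on.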

Development

Building from Source

# Clone the repository
git clone https://github.com/nyo16/popplex.git
cd popplex

# Get dependencies
mix deps.get

# Compile (including the NIF)
mix compile

# Run tests
mix test

# Run integration tests (requires sample PDF files)
mix test --include integration

Testing

Unit tests can be run without any PDF files:

mix test --exclude integration

For integration tests, place sample PDF files in test/fixtures/ and run:

mix test --include integration
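
The --include and --exclude flags act on ExUnit tags. As a rough sketch of how such a setup usually looks (the helper contents, module name, and fixture file name below are assumptions, not taken from the Popplex repository):

```elixir
# Assumed contents of test/test_helper.exs: skip tests tagged :integration
# unless `mix test --include integration` is used.
ExUnit.start(exclude: [:integration])

# Hypothetical integration test; the fixture file name is an assumption.
defmodule PopplexIntegrationTest do
  use ExUnit.Case, async: true

  # Tags every test in this module as :integration.
  @moduletag :integration
  @fixture "test/fixtures/sample.pdf"

  test "reports a positive page count for the sample PDF" do
    assert {:ok, count} = Popplex.get_page_count(@fixture)
    assert count > 0
  end
end
```

With a helper like this, a plain mix test skips the tagged module, and mix test --include integration runs it against the files in test/fixtures/.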

Continuous Integration

The project uses GitHub Actions for CI, which:

Tests against multiple Elixir/OTP version combinations
Runs both unit and integration tests
Performs static analysis and code formatting checks
Automatically installs Poppler and dependencies

The CI workflow runs on:

Every push to main/master branch
Every pull request

You can view the CI status in the badge at the top of this README.

How It Works

Popplex uses Erlang's NIF (Native Implemented Function) interface to call C++ code that wraps the Poppler library. This provides:

Performance: Near-native speed for PDF operations
Direct library access: Full access to Poppler's capabilities
Memory efficiency: Minimal copying between Erlang and C++

The architecture consists of:

C++ NIF layer (c_src/popplex_nif.cpp) - Interfaces with Poppler
NIF loader (lib/popplex/nif.ex) - Loads the compiled NIF
Public API (lib/popplex.ex) - User-friendly Elixir interface

Limitations

Password-protected PDFs are not currently supported for text extraction
Some PDF features (forms, annotations, etc.) are not exposed in the API
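PDF combining shells out to the pdfunite command-line tool instead of going through the NIF, so each call spawns an external process and pdfunite must be installed (it ships with Poppler's command-line utilities, for example the poppler-utils package on Debian/Ubuntu)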
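
To make the last point concrete, here is a minimal sketch of what delegating to pdfunite can look like from Elixir. It is an illustration under stated assumptions, not necessarily how Popplex.combine_pdfs/2 is written: the PdfuniteSketch module, its error messages, and the choice to return the output path on success are all hypothetical.

```elixir
defmodule PdfuniteSketch do
  # Illustration only: not necessarily how Popplex.combine_pdfs/2 works.
  # Runs `pdfunite in1.pdf in2.pdf ... out.pdf` and maps the exit status
  # to an {:ok, _} | {:error, _} tuple.
  def combine(inputs, output) when is_list(inputs) and is_binary(output) do
    case System.find_executable("pdfunite") do
      nil ->
        {:error, "pdfunite not found on PATH"}

      exe ->
        # stderr_to_stdout: true captures pdfunite's diagnostics in `message`.
        case System.cmd(exe, inputs ++ [output], stderr_to_stdout: true) do
          {_message, 0} ->
            {:ok, output}

          {message, status} ->
            {:error, "pdfunite exited with #{status}: #{String.trim(message)}"}
        end
    end
  end
end
```

Because each call starts a separate operating-system process, merging many small files this way carries more overhead than the NIF-backed functions.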

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

License

This project is available under the MIT License.

Acknowledgments

Built on top of the Poppler PDF rendering library
Uses elixir_make for NIF compilation
