An Elixir NIF (Native Implemented Function) wrapper for the Poppler PDF library, providing fast and efficient PDF processing capabilities.
- Get page count - Quickly determine the number of pages in a PDF
- Extract text - Extract text content from entire documents or specific pages
- Combine PDFs - Merge multiple PDF files into one
Before using Popplex, you need to have Poppler installed on your system:
# macOS (Homebrew)
brew install poppler pkg-config

# Debian/Ubuntu
sudo apt-get install libpoppler-cpp-dev pkg-config

# Fedora
sudo dnf install poppler-cpp-devel pkgconfig

# Arch Linux
sudo pacman -S poppler pkgconf

Add popplex to your list of dependencies in mix.exs:
def deps do
  [
    {:popplex, "~> 0.1.0"}
  ]
end

Then run:
mix deps.get
mix compile

The NIF will be automatically compiled during the build process.
# Get the number of pages in a PDF
{:ok, count} = Popplex.get_page_count("document.pdf")
IO.puts("The PDF has #{count} pages")

# Extract text from all pages
{:ok, text} = Popplex.get_text("document.pdf")
# Extract text from a specific page (0-indexed)
{:ok, first_page} = Popplex.get_text("document.pdf", page: 0)
{:ok, second_page} = Popplex.get_text("document.pdf", page: 1)
# Explicitly extract all pages
{:ok, all_text} = Popplex.get_text("document.pdf", all: true)

# Merge multiple PDFs into one
{:ok, output} = Popplex.combine_pdfs(
["file1.pdf", "file2.pdf", "file3.pdf"],
"combined.pdf"
)
# Verify the combined PDF
{:ok, count} = Popplex.get_page_count("combined.pdf")
IO.puts("Combined PDF has #{count} pages")

All functions return {:ok, result} on success or {:error, reason} on failure:
case Popplex.get_page_count("document.pdf") do
  {:ok, count} ->
    IO.puts("Success! Page count: #{count}")

  {:error, reason} ->
    IO.puts("Error: #{reason}")
end

Common error scenarios:
- File doesn't exist: "Failed to open PDF document"
- PDF is password protected: "PDF document is locked"
- Invalid page number: "Page number out of range"
# Clone the repository
git clone https://github.com/yourusername/popplex.git
cd popplex
# Get dependencies
mix deps.get
# Compile (including the NIF)
mix compile
# Run tests
mix test
# Run integration tests (requires sample PDF files)
mix test --include integration

Unit tests can be run without any PDF files:
mix test --exclude integration

For integration tests, place sample PDF files in test/fixtures/ and run:
mix test --include integration

The project uses GitHub Actions for CI, which:
- Tests against multiple Elixir/OTP version combinations
- Runs both unit and integration tests
- Performs static analysis and code formatting checks
- Automatically installs Poppler and dependencies
The CI workflow runs on:
- Every push to the main/master branch
- Every pull request
You can view the CI status in the badge at the top of this README.
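The `--include integration` and `--exclude integration` flags used for the unit and integration test runs rely on ExUnit tags. As a rough sketch of how such a tagged test could look (the module name, the fixture name `sample.pdf`, and the default exclusion in the test helper are assumptions for illustration, not taken from the Popplex test suite):

```elixir
# In a Mix project this line would live in test/test_helper.exs.
# Excluding the tag by default is an assumption; it makes plain `mix test`
# skip tests that need fixture files.
ExUnit.start(exclude: [:integration])

defmodule PopplexIntegrationSketchTest do
  use ExUnit.Case, async: true

  # Every test in this module is only run with `--include integration`.
  @moduletag :integration

  # Hypothetical fixture; any PDF placed in test/fixtures/ would do.
  @fixture Path.join(["test", "fixtures", "sample.pdf"])

  test "reports a positive page count for the fixture" do
    assert {:ok, count} = Popplex.get_page_count(@fixture)
    assert is_integer(count) and count > 0
  end
end
```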
Popplex uses Erlang's NIF (Native Implemented Function) interface to call C++ code that wraps the Poppler library. This provides:
- Performance: Near-native speed for PDF operations
- Direct library access: Full access to Poppler's capabilities
- Memory efficiency: Minimal copying between Erlang and C++
The architecture consists of:
- C++ NIF layer (c_src/popplex_nif.cpp) - Interfaces with Poppler
- NIF loader (lib/popplex/nif.ex) - Loads the compiled NIF
- Public API (lib/popplex.ex) - User-friendly Elixir interface
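For readers unfamiliar with NIFs, the loader layer typically follows a standard Erlang/Elixir pattern: an `@on_load` callback calls `:erlang.load_nif/2`, and each native function has an Elixir stub that is replaced once the shared library is loaded. The sketch below illustrates that general pattern only. The module name `NifLoaderSketch`, the library file name `popplex_nif`, and the single stub are assumptions, and the actual contents of `lib/popplex/nif.ex` may differ.

```elixir
defmodule NifLoaderSketch do
  @moduledoc false

  # Run load_nif/0 automatically when the module is loaded.
  @on_load :load_nif

  def load_nif do
    case :code.priv_dir(:popplex) do
      {:error, :bad_name} ->
        # The :popplex application is not available (for example when this
        # sketch is run standalone), so leave the stubs in place.
        :ok

      priv_dir ->
        # load_nif/2 takes the library path without its platform-specific
        # extension. "popplex_nif" is an assumed file name.
        :erlang.load_nif(:filename.join(priv_dir, ~c"popplex_nif"), 0)
    end
  end

  # Stub that the native implementation replaces after a successful load.
  # Until then, calling it raises :nif_not_loaded.
  def get_page_count(_path), do: :erlang.nif_error(:nif_not_loaded)
end
```

Keeping the stubs in a separate loader module means the public API module can validate arguments and shape return values in plain Elixir before crossing into native code.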
- Password-protected PDFs are not currently supported for text extraction
- Some PDF features (forms, annotations, etc.) are not exposed in the API
- PDF combining uses the pdfunite command-line tool rather than a NIF (spawns an external process)
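To make the last limitation concrete, this is roughly what shelling out to `pdfunite` from Elixir involves. It is a sketch under stated assumptions, not Popplex's actual implementation: the module `PdfuniteSketch`, its error messages, and the choice to return the output path in `{:ok, output}` are all hypothetical. The only external facts relied on are that `pdfunite` takes the input files followed by the destination file, and that `System.find_executable/1` and `System.cmd/3` are part of Elixir's standard library.

```elixir
defmodule PdfuniteSketch do
  @moduledoc false

  # pdfunite expects: input1.pdf input2.pdf ... output.pdf
  def build_args(inputs, output)
      when is_list(inputs) and inputs != [] and is_binary(output) do
    inputs ++ [output]
  end

  def combine(inputs, output) do
    case System.find_executable("pdfunite") do
      nil ->
        {:error, "pdfunite not found in PATH"}

      exe ->
        # Each call starts a separate OS process, which is the overhead
        # the limitation above refers to.
        case System.cmd(exe, build_args(inputs, output), stderr_to_stdout: true) do
          {_out, 0} -> {:ok, output}
          {out, status} -> {:error, "pdfunite exited with status #{status}: #{String.trim(out)}"}
        end
    end
  end
end
```

Because the work happens in an external process, a failure in `pdfunite` surfaces as a non-zero exit status rather than as a crash inside the BEAM, at the cost of process start-up time for each merge.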
Contributions are welcome! Please feel free to submit pull requests or open issues.
This project is available under the MIT License.
- Built on top of the Poppler PDF rendering library
- Uses elixir_make for NIF compilation