Searchable PDF Service

Convert scanned PDFs to natively searchable PDFs using the Docstrange API (Nanonets). The service extracts text with word-level bounding boxes and embeds an invisible text layer at the matching positions, so the text can be searched and copied in any standard PDF viewer.
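
To illustrate the positioning step, here is a minimal sketch of mapping a word's bounding box from image pixels (origin at the top-left) to PDF points (origin at the bottom-left, 72 points per inch). The `WordBox` type, the pixel-based input, and the `to_pdf_points` name are assumptions made for this example. The actual Docstrange response format and the backend's embedding code may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """Hypothetical word box in image pixels, origin at the top-left corner."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def to_pdf_points(box, image_size, page_size):
    """Map a pixel-space box to PDF points (origin at the bottom-left).

    image_size and page_size are (width, height) tuples.
    Returns (left, bottom, right, top) in points.
    """
    img_w, img_h = image_size
    page_w, page_h = page_size
    sx = page_w / img_w
    sy = page_h / img_h
    # Flip the y axis: image y grows downward, PDF y grows upward.
    return (box.x0 * sx, page_h - box.y1 * sy, box.x1 * sx, page_h - box.y0 * sy)


# A US Letter page (612 x 792 pt) scanned at 200 DPI is 1700 x 2200 px.
word = WordBox("Invoice", x0=200, y0=100, x1=500, y1=150)
rect = to_pdf_points(word, image_size=(1700, 2200), page_size=(612, 792))
```

The invisible text for each word would then be drawn inside `rect`, so that a viewer's selection highlight lines up with the scanned glyphs.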

Project structure

backend/ – FastAPI service and CLI
frontend/ – React UI for upload, processing, and PDF viewing

Features

Upload scanned PDFs (max 5 pages for sync processing)
Extract text + word-level coordinates via Docstrange API
Embed invisible searchable text layer at correct positions
Web UI: drag-and-drop upload, output PDF viewer, search (Ctrl+F)
CLI for batch processing

Prerequisites

uv (Python package manager)
Python 3.10+
Node.js 18+
Nanonets/Docstrange API key

Setup

Backend

cd backend
uv sync
cp .env.example .env
# Edit .env and set NANONETS_API_KEY=your_api_key_here

Frontend

cd frontend
npm install

Running

Option 1: Full stack (recommended)

Terminal 1 – backend:

cd backend
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 – frontend (proxies API to backend):

cd frontend
npm run dev

Frontend: http://localhost:5173
API docs: http://localhost:8000/docs

Option 2: Backend only

cd backend
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend

The UI provides:

Left panel: Upload zone (drag & drop or click to browse), Process button
Right panel: Output PDF viewer and Download button

Use Ctrl+F (Cmd+F on Mac) to search within the embedded PDF viewer.

To point the frontend at a different API URL, set VITE_API_URL when building, or use the built-in dev proxy (defaults to http://localhost:8000).

API

POST /process

Upload a PDF and receive a searchable PDF.

Request: multipart/form-data with file = PDF

Response: Searchable PDF file (application/pdf)

Status codes:

200: Success, returns PDF
400: Invalid file type, too many pages (>5), or extraction failed
401: Invalid API key
413: File too large (default max 10MB)
429: Rate limit exceeded
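
For callers that prefer Python over curl, the following is a minimal client sketch using only the standard library. It assumes the backend is running at http://localhost:8000 and that the upload field is named `file`, as described above. The helper names (`build_multipart`, `process_pdf`) are illustrative and are not part of the project.

```python
import urllib.error
import urllib.request
import uuid

# Error status codes as documented in this section.
STATUS_MESSAGES = {
    400: "Invalid file type, too many pages, or extraction failed",
    401: "Invalid API key",
    413: "File too large",
    429: "Rate limit exceeded",
}


def build_multipart(filename, data, field="file"):
    """Encode one PDF as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def process_pdf(pdf_bytes, filename, base_url="http://localhost:8000"):
    """POST a PDF to /process and return the searchable PDF bytes."""
    body, content_type = build_multipart(filename, pdf_bytes)
    request = urllib.request.Request(
        f"{base_url}/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Translate documented error codes into a readable message.
        reason = STATUS_MESSAGES.get(err.code, "Unexpected error")
        raise RuntimeError(f"/process returned {err.code}: {reason}") from err
```

Calling `process_pdf(open("document.pdf", "rb").read(), "document.pdf")` would return the bytes of the searchable PDF, which can then be written to disk.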

GET /health

Readiness check. Returns 503 if NANONETS_API_KEY is not configured.
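
A deploy script or test harness might poll this endpoint before sending work. The sketch below assumes that a ready service answers with 200 (the text above only documents the 503 case) and uses hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def health_status(base_url="http://localhost:8000"):
    """Return the HTTP status code of GET /health, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code (e.g. 503) is still the answer.
        return err.code
    except urllib.error.URLError:
        # Server not listening yet.
        return None


def wait_until_ready(check=health_status, attempts=5, delay=1.0):
    """Poll until the check returns 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        if check() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The `check` parameter is injectable so the polling logic can be exercised without a running server.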

Configuration

Variable	Required	Default	Description
NANONETS_API_KEY	Yes	-	Docstrange/Nanonets API key
MAX_FILE_SIZE	No	10485760	Max file size in bytes (10MB)
MAX_PAGES	No	5	Max pages (Docstrange sync limit)
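
As a rough sketch of how these variables and defaults could be read in Python: the `Settings` class and `load_settings` function below are hypothetical names for illustration, and the backend's real configuration code may be organized differently.

```python
import os
from dataclasses import dataclass

DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024  # 10485760 bytes (10MB)
DEFAULT_MAX_PAGES = 5


@dataclass(frozen=True)
class Settings:
    nanonets_api_key: str
    max_file_size: int = DEFAULT_MAX_FILE_SIZE
    max_pages: int = DEFAULT_MAX_PAGES


def load_settings(env=None):
    """Build Settings from an environment mapping, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("NANONETS_API_KEY", "").strip()
    if not api_key:
        # The only required variable; fail early with a clear message.
        raise ValueError("NANONETS_API_KEY is required")
    return Settings(
        nanonets_api_key=api_key,
        max_file_size=int(env.get("MAX_FILE_SIZE", DEFAULT_MAX_FILE_SIZE)),
        max_pages=int(env.get("MAX_PAGES", DEFAULT_MAX_PAGES)),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.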

CLI Utility

cd backend
uv run searchable-pdf document.pdf -o output.pdf
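
For batch processing, one option is to wrap the command above in a short Python script. This sketch assumes the CLI handles one input file per invocation, exactly as shown, and that the script is run from the backend/ directory. The function names are illustrative.

```python
import subprocess
from pathlib import Path


def batch_commands(input_dir, output_dir):
    """Return one `uv run searchable-pdf` command per PDF in input_dir."""
    out = Path(output_dir)
    return [
        ["uv", "run", "searchable-pdf", str(pdf), "-o", str(out / pdf.name)]
        for pdf in sorted(Path(input_dir).glob("*.pdf"))
    ]


def run_batch(input_dir, output_dir):
    """Run the CLI once per PDF, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for command in batch_commands(input_dir, output_dir):
        subprocess.run(command, check=True)
```

Each output keeps its input file name, so `run_batch("scans", "searchable")` would write `searchable/a.pdf` for `scans/a.pdf`.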

Tests

cd backend
uv sync --extra dev
uv run pytest tests/ -v

Example (curl)

curl -X POST http://localhost:8000/process \
  -F "file=@document.pdf" \
  -o searchable-document.pdf

AWS Deployment (App Runner)

The project includes a Dockerfile and GitHub Actions workflow for deploying to AWS App Runner.

Local vs production

Environment	API key source
Local	`NANONETS_API_KEY` env var (from `.env` or shell)
Production	AWS Secrets Manager (never in config or code)

One-time setup

ECR, IAM roles – Already created for this account.

For production: Store the API key in Secrets Manager:

NANONETS_API_KEY=your_key ./deploy/create-secret.sh
# Saves the ARN – use it for production deploy

GitHub Actions uses OIDC (no AWS keys). Optionally add APP_RUNNER_SERVICE_ARN secret for auto-deploy on push.

Deploy

Local/dev (env var – for first-time service creation):

NANONETS_API_KEY=your_key ./deploy/deploy.sh

Production (Secrets Manager):

NANONETS_SECRET_ARN=arn:aws:secretsmanager:... ./deploy/deploy.sh --production

Automatic: Push to main triggers build and push to ECR (and deploy if APP_RUNNER_SERVICE_ARN is set).

Local Docker test

docker build -t searchable-pdf .
docker run -p 8000:8000 -e NANONETS_API_KEY=your_key searchable-pdf
# Open http://localhost:8000
