BookForge is the EPUB translation engine that keeps the LLM away from your document structure. It parses EPUBs into validated JSON payloads, checkpoints every segment, preserves markup/footnotes/links, and rebuilds valid EPUBs.
I built this to translate books for my partner. It's MIT-licensed in case it's useful to you.
MVP functionality is implemented:
- EPUB inspect, parse, segment, and rebuild
- Plain, marker-safe, and run-preserving translation contracts
- Mock provider for deterministic tests
- OpenAI-compatible provider
- DeepSeek and OpenRouter presets
- Bounded parallel segment translation with --concurrency
- SQLite checkpoint store
- Resume and retry commands
- Status and tail commands for persisted jobs
- Segment-level cache reuse for compatible prior translations
- Static side-by-side review HTML with flag export/import
- QA reports in JSON and Markdown
- Optional LLM QA review pass
- Cost estimates for known provider/model pairs
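To illustrate what a marker-safe contract has to guarantee, here is a minimal sketch of a check that a translated segment carries exactly the same placeholder markers as its source. The `[[m1]]` marker syntax and both function names are assumptions made for this example; BookForge's actual marker format and validators may differ.

```rust
use std::collections::BTreeMap;

/// Count placeholder markers of the hypothetical form `[[m1]]` / `[[/m1]]`.
/// The marker syntax is an assumption for illustration only.
fn marker_counts(text: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    let mut rest = text;
    while let Some(start) = rest.find("[[") {
        let after_open = &rest[start + 2..];
        match after_open.find("]]") {
            Some(end) => {
                // Everything between "[[" and "]]" is the marker name.
                let name = &after_open[..end];
                *counts.entry(name.to_string()).or_insert(0) += 1;
                rest = &after_open[end + 2..];
            }
            None => break, // unterminated marker: stop scanning
        }
    }
    counts
}

/// A translation passes only if every marker appears the same number of
/// times as in the source. Marker order is deliberately not compared,
/// because word order can change between languages.
fn markers_preserved(source: &str, translated: &str) -> bool {
    marker_counts(source) == marker_counts(translated)
}
```

With a check like this, a response that drops or duplicates a marker can be rejected before the segment is committed, so the model never gets a chance to alter the document structure.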
Build a release binary:
cargo build --release

The binary is:
target/release/bookforge

For development, use:
cargo run -p bookforge-cli -- <command>

Inspect an EPUB:
cargo run -p bookforge-cli -- inspect book.epub

Estimate tokens and approximate cost:
cargo run -p bookforge-cli -- estimate book.epub \
--source English \
--target Italian \
--provider openrouter \
--model deepseek/deepseek-v4-flash

Translate with OpenRouter:
export OPENROUTER_API_KEY=sk-or-...
cargo run -p bookforge-cli -- translate book.epub \
--source English \
--target Italian \
--provider openrouter \
--model deepseek/deepseek-v4-flash \
--concurrency 4 \
--timeout-seconds 120 \
--qa off \
--out book.it.epub

Translate with the v1 fast preset:
cargo run -p bookforge-cli -- translate book.epub \
--target Italian \
--provider-preset openrouter-paid-fast \
--profile v1-fast \
--ui progress \
--out book.it.epub

Translate with a glossary:
cargo run -p bookforge-cli -- glossary import glossary.series.toml
cargo run -p bookforge-cli -- translate book.epub \
--source English \
--target Italian \
--provider-preset openrouter-paid-fast \
--book-id fellowship \
--series-id lord-of-the-rings \
--glossary glossary.series.toml \
--glossary-budget-tokens 800 \
--glossary-format json \
--prompt-extra "Maintain a literary register." \
--out book.it.epub

Check provider and storage health:
cargo run -p bookforge-cli -- doctor --storage
cargo run -p bookforge-cli -- doctor \
--provider openrouter \
--model google/gemini-2.5-flash-lite

Translate with DeepSeek:
export DEEPSEEK_API_KEY=...
cargo run -p bookforge-cli -- translate book.epub \
--source English \
--target Italian \
--provider deepseek \
--model deepseek-v4-flash \
--concurrency 4 \
--out book.it.epub

Use any OpenAI-compatible endpoint:
export OPENAI_API_KEY=...
cargo run -p bookforge-cli -- translate book.epub \
--source English \
--target Italian \
--provider openai-compatible \
--base-url https://api.example.com/v1 \
--api-key-env OPENAI_API_KEY \
--model provider/model \
--timeout-seconds 120 \
--out book.it.epub

Resume a job:
cargo run -p bookforge-cli -- resume <job-id> --timeout-seconds 120

Generate a side-by-side review page:
cargo run -p bookforge-cli -- review <job-id> --open

Ingest exported review flags and mark bad translations for retry:
cargo run -p bookforge-cli -- ingest-flags <job-id> --flags flags.json
cargo run -p bookforge-cli -- retry <job-id> --only needs-review

Manage glossary terms:
cargo run -p bookforge-cli -- glossary list --language 'English->Italian'
cargo run -p bookforge-cli -- glossary add "Aragorn" "Aragorn" \
--category person \
--scope series \
--scope-id lord-of-the-rings \
--source-lang English \
--target-lang Italian \
--case-sensitive
cargo run -p bookforge-cli -- glossary export glossary.series.toml \
--scope series \
--scope-id lord-of-the-rings \
--language 'English->Italian'

Inspect persisted job state and recent events:
cargo run -p bookforge-cli -- status <job-id>
cargo run -p bookforge-cli -- tail <job-id> --lines 40

Retry failed or review-needed segments:
cargo run -p bookforge-cli -- retry <job-id> --only failed
cargo run -p bookforge-cli -- retry <job-id> --only needs-review
cargo run -p bookforge-cli -- retry <job-id> --only all

Validate a translated EPUB and report:
cargo run -p bookforge-cli -- validate book.it.epub --report book.it.report.json

Translation always runs hard validators before committing a segment. The optional LLM QA pass is controlled with:
--qa off
--qa suspicious
--qa all

The default is off. Reports still include deterministic soft warnings such as changed URLs, changed numbers, suspicious length ratios, model commentary, and repeated text.
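As a rough illustration of what deterministic soft warnings can look like, the sketch below checks two of the signals listed above: changed numbers and a suspicious length ratio. The function names, the warning strings, and the 0.5x to 2.0x thresholds are assumptions for this example, not BookForge's actual rules.

```rust
/// Extract runs of ASCII digits, e.g. "In 1954, 3 books" -> ["1954", "3"].
fn numbers(text: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        if ch.is_ascii_digit() {
            current.push(ch);
        } else if !current.is_empty() {
            // A non-digit ends the current run; store it and start fresh.
            found.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        found.push(current);
    }
    found
}

/// Return human-readable soft warnings; an empty list means nothing looked off.
fn soft_warnings(source: &str, translated: &str) -> Vec<String> {
    let mut warnings = Vec::new();

    // Numbers should normally survive translation unchanged (order may differ).
    let mut src_nums = numbers(source);
    let mut dst_nums = numbers(translated);
    src_nums.sort();
    dst_nums.sort();
    if src_nums != dst_nums {
        warnings.push("changed numbers".to_string());
    }

    // Assumed thresholds: flag output shorter than 0.5x or longer than 2.0x
    // the source, measured in characters.
    let src_len = source.chars().count().max(1) as f64;
    let ratio = translated.chars().count() as f64 / src_len;
    if !(0.5..=2.0).contains(&ratio) {
        warnings.push(format!("suspicious length ratio {ratio:.2}"));
    }
    warnings
}
```

Checks of this kind are cheap and reproducible, which is why they can run on every segment even when the LLM QA pass is off.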
Runtime state is stored in:
.bookforge/jobs.sqlite

That path is ignored by git. Segment translations are persisted as each segment completes. New jobs reuse compatible cached translations when the source hash, prompt version, provider, model, source language, and target language match.
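The cache-compatibility rule above can be pictured as a composite lookup key. The sketch below uses an in-memory map as a stand-in for the SQLite store; the `CacheKey` and `SegmentCache` types and their field names are hypothetical and only mirror the six fields listed in the text.

```rust
use std::collections::HashMap;

/// Every field must match before a cached translation is reused.
/// Field names are illustrative, taken from the list in the text.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    source_hash: String,
    prompt_version: String,
    provider: String,
    model: String,
    source_lang: String,
    target_lang: String,
}

/// In-memory stand-in for the SQLite-backed checkpoint store.
#[derive(Default)]
struct SegmentCache {
    entries: HashMap<CacheKey, String>,
}

impl SegmentCache {
    /// Record a finished translation under its full key.
    fn put(&mut self, key: CacheKey, translation: &str) {
        self.entries.insert(key, translation.to_string());
    }

    /// Return a cached translation only when all six fields match.
    fn get(&self, key: &CacheKey) -> Option<&str> {
        self.entries.get(key).map(String::as_str)
    }
}
```

Because the prompt version and model are part of the key, changing either one naturally invalidates older entries instead of silently mixing translations produced under different conditions.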
Progress events can be written in every UI mode:
cargo run -p bookforge-cli -- translate book.epub \
--target Italian \
--provider mock \
--model mock-prefix-target \
--ui json \
--progress-jsonl .bookforge/runs/example/events.jsonl

Review artifacts contain the full source and translated text of the book. They are written locally under .bookforge/runs/<job-id>/review/; treat them as private user data.
Known limitations: provider API keys are read from environment variables, and validation is intentionally pragmatic rather than a full EPUBCheck replacement.
Run the mock release smoke benchmark with:
scripts/bench-mock.sh

See docs/benchmarks.md for metrics to capture in real-provider runs.
Do not commit API keys or ad hoc test books. The repository ignores:
test/
.bookforge/
.claude/
.codex
*.env
*.key
key.txt

For local OpenRouter testing, place the key outside tracked paths or export it directly:
export OPENROUTER_API_KEY=...

Development checks:
cargo fmt
cargo test
cargo clippy --all-targets --all-features

Repository layout:
crates/bookforge-core         IR, segmentation, shared config
crates/bookforge-epub         EPUB inspect/read/rebuild
crates/bookforge-llm          Prompts, providers, scheduler, validators
crates/bookforge-llm/prompts  Versioned prompt templates
crates/bookforge-store        SQLite checkpoint store
crates/bookforge-cli          CLI commands and reports
docs/                         Architecture notes
tests/fixtures/               Committed minimal fixture only