EPUB TOC

A Python tool for extracting the table of contents from EPUB files, with support for hierarchical structure.

Features

Support for multiple extraction methods (NCX, epub_meta, OPF)
Automatic selection of the best method
Hierarchical TOC structure preservation
Russian and English language support
JSON output format
Detailed logging
EPUB file analysis reports

Installation

From PyPI (recommended)

pip install epub_toc

From source (for development)

git clone https://github.com/almazilaletdinov/epub_toc.git
cd epub_toc
pip install -e .

Verify installation

python -c "import epub_toc; print(epub_toc.__version__)"

Dependencies

All dependencies will be automatically installed with pip:

epub_meta>=0.0.7
lxml>=4.9.3
beautifulsoup4>=4.12.2
ebooklib>=0.18
tika>=2.6

For development, additional dependencies can be installed with:

pip install -e ".[dev]"

Usage

As a module

from epub_toc import EPUBTOCParser

# Create parser
parser = EPUBTOCParser('path/to/book.epub')

# Extract TOC
toc = parser.extract_toc()

# Print to console
parser.print_toc()

# Save to JSON
parser.save_toc_to_json('output.json')
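
The calls above can be combined to process a whole directory of books. The following is a minimal sketch, not part of the package: export_all is a hypothetical helper, and it assumes only the three calls shown above (the constructor taking a path, extract_toc(), and save_toc_to_json() taking an output path). The parser class is passed in as an argument so the helper can be exercised without the package installed. The <book>_toc.json naming is an assumption borrowed from the analysis reports described later in this README.

```python
from pathlib import Path


def export_all(src_dir, out_dir, parser_cls):
    """Extract the TOC of every .epub in src_dir and save it as JSON in out_dir.

    parser_cls is expected to behave like EPUBTOCParser in the usage above:
    built from a file path, with extract_toc() and save_toc_to_json(path).
    Returns (written, failed): a list of output paths and a list of
    (epub_path, exception) pairs.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written, failed = [], []
    for epub in sorted(Path(src_dir).glob("*.epub")):
        target = out / f"{epub.stem}_toc.json"  # assumed naming scheme
        try:
            parser = parser_cls(str(epub))
            parser.extract_toc()
            parser.save_toc_to_json(str(target))
            written.append(target)
        except Exception as exc:
            # The exception types raised by the library are not documented
            # here, so this sketch catches broadly and records the failure.
            failed.append((epub, exc))
    return written, failed
```

With the package installed, this could be called as export_all("books", "out", EPUBTOCParser) after importing EPUBTOCParser from epub_toc. Collecting failures instead of stopping at the first one keeps a single malformed book from aborting the whole run.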

From command line

epub-toc path/to/book.epub

EPUB File Analysis

To analyze all EPUB files in the tests/data/epub_samples directory:

python tests/integration/test_epub_analysis.py

Analysis results are saved in the reports/ directory:

epub_analysis_YYYYMMDD_HHMMSS.json - detailed report in JSON format
epub_analysis_YYYYMMDD_HHMMSS.txt - brief report in text format
toc/*.json - extracted TOCs for each EPUB file

Report structure:

JSON report contains:
- Overall statistics for all files
- Extraction methods success rate
- Detailed results for each file
- Links to extracted TOC files

Text report includes:
- Brief statistics
- Information about each file
- Paths to extracted TOCs

TOC files:
- Saved in toc/ subdirectory
- Named as book_name_toc.json
- Contain complete TOC in JSON format

Output Format

The TOC is saved in JSON format with the following structure:

{
  "metadata": {
    "title": "Book Title",
    "authors": ["Author 1", "Author 2"],
    "publisher": "Publisher Name",
    "publication_date": "2024-01-01",
    "language": "en",
    "description": "Book description",
    "cover_image_path": "path/to/cover.jpg",
    "isbn": "978-3-16-148410-0",
    "rights": "Copyright information",
    "series": "Series Name",
    "series_index": 1,
    "identifiers": {
      "isbn13": "978-3-16-148410-0",
      "uuid": "550e8400-e29b-41d4-a716-446655440000"
    },
    "subjects": ["Fiction", "Adventure"],
    "file_size": 1234567,
    "file_name": "book.epub"
  },
  "toc": [
    {
      "title": "Chapter 1",
      "href": "chapter1.html",
      "level": 0,
      "children": [
        {
          "title": "Section 1.1",
          "href": "chapter1.html#section1",
          "level": 1,
          "children": []
        }
      ]
    }
  ]
}

All metadata fields are optional and will be omitted if not available in the EPUB file.
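
Because the TOC is nested and metadata fields may be missing, code that consumes this JSON usually needs a recursive walk and defaults for absent keys. The sketch below is illustrative only: outline is a hypothetical helper, and the sample document is a shortened copy of the structure above with most metadata deliberately left out.

```python
import json


def outline(entries, indent="  "):
    """Yield one indented line per TOC entry, depth-first."""
    for entry in entries:
        level = entry.get("level", 0)
        yield f"{indent * level}{entry['title']} -> {entry['href']}"
        yield from outline(entry.get("children", []), indent)


# Shortened sample in the documented format; "authors" is omitted on purpose.
sample = json.loads("""
{
  "metadata": {"title": "Book Title"},
  "toc": [
    {
      "title": "Chapter 1",
      "href": "chapter1.html",
      "level": 0,
      "children": [
        {
          "title": "Section 1.1",
          "href": "chapter1.html#section1",
          "level": 1,
          "children": []
        }
      ]
    }
  ]
}
""")

# Metadata fields are optional, so read them with defaults.
meta = sample.get("metadata", {})
title = meta.get("title", "(untitled)")
authors = ", ".join(meta.get("authors", [])) or "(unknown)"

lines = list(outline(sample["toc"]))
print(f"{title} by {authors}")
print("\n".join(lines))
```

To read a file written by save_toc_to_json, replace the inline sample with json.load on the opened file. The walk relies on the level field stored with each entry for indentation rather than tracking recursion depth itself.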

Testing

The module includes installation, integration, and unit test suites.

Running Tests

To run all tests with detailed reporting:

./run_tests.sh

This will run:

Installation tests (pip install/uninstall)
Integration tests (EPUB parsing, TOC extraction)
Unit tests (core functionality)

Test Coverage

The test suite includes:

Installation verification
Package dependencies
Russian books (NCX method)
English books (epub_meta method)
Files with different TOC structures
Files of different sizes (from 400KB to 8MB)

Test results and coverage reports are generated in:

coverage.xml - Code coverage report
reports/ - Test execution reports
tests/data/epub_toc_json/ - Generated TOC files

Requirements

Python 3.7+
epub_meta>=0.0.7
lxml>=4.9.3
beautifulsoup4>=4.12.2

Contributing

We welcome contributions! If you'd like to help:

Fork the repository
Create a branch for your changes
Make changes and add tests
Ensure all tests pass
Create a Pull Request

See CONTRIBUTING.md for details.

Security

If you discover a security vulnerability, please DO NOT create a public issue. Instead, send a report following the instructions in SECURITY.md.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Roadmap

Additional EPUB format support
Improved complex hierarchical structure handling
Integration with popular e-readers
Web service API
Additional language support
