BookNLP

A natural language processing pipeline for analyzing works of fiction, including entity detection, quotation attribution, and character relationship analysis.

Prerequisites

Python 3.9 or higher
pip (Python package installer)
Virtual environment (recommended)

Installation

Clone the repository:

git clone https://github.com/yourusername/booknlp.git
cd booknlp

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

Install the required packages:

pip install --upgrade pip
pip install -r requirements.txt

Install the spaCy model:

python -m spacy download en_core_web_sm
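
To confirm the installation before running the pipeline, a small check along the following lines may help. It is only a sketch: the function name `check_environment` is hypothetical and not part of this repository, and it assumes the spaCy model is installed as an importable package named `en_core_web_sm` (which is how `spacy download` normally installs it).

```python
import importlib.util
import sys


def check_environment(min_version=(3, 9), packages=("spacy", "en_core_web_sm")):
    """Report whether the interpreter and the required packages look usable.

    Hypothetical helper, not part of BookNLP. Returns a dict mapping each
    check to True or False without importing the packages themselves.
    """
    report = {"python": sys.version_info[:2] >= tuple(min_version)}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Calling `check_environment()` inside the activated virtual environment returns a dictionary such as `{"python": ..., "spacy": ..., "en_core_web_sm": ...}`; any `False` entry points at the installation step that needs repeating.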

Usage

Make sure your virtual environment is activated:

source venv/bin/activate  # On Windows, use: venv\Scripts\activate

Run BookNLP on a text file:

./run_booknlp.py input_file.txt --output-dir output/directory

Command Line Arguments

input_file: The text file to process (required)
--output-dir: Directory where output files will be saved (default: 'output')
--model: Model size to use, either 'big' or 'small' (default: 'small')
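
For readers who want to see how these arguments fit together, the interface described above could be declared with `argparse` roughly as follows. This is an illustrative sketch only: the actual contents of `run_booknlp.py` are not shown here and may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser matching the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run BookNLP on a text file.")
    # Positional argument: required by default.
    parser.add_argument("input_file", help="The text file to process")
    parser.add_argument(
        "--output-dir",
        default="output",
        help="Directory where output files will be saved",
    )
    parser.add_argument(
        "--model",
        choices=["big", "small"],
        default="small",
        help="Model size to use",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["emma.txt"])` yields `output_dir == "output"` and `model == "small"`, matching the defaults listed above.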

Output Files

The pipeline generates several output files in the specified output directory:

{book_id}.tokens: Word-level information including:
- Paragraph and sentence IDs
- Word forms and lemmas
- Part-of-speech tags
- Dependency relations
- Event annotations
{book_id}.entities: Named entity information including:
- Entity types
- Coreference IDs
- Text spans
{book_id}.quotes: Quotation information including:
- Quoted text
- Speaker attribution
- Coreference information
{book_id}.supersense: Semantic categories for words, including:
- Verb categories
- Noun categories
{book_id}.event: Event annotations including:
- Event types
- Participants
- Temporal information
{book_id}.book: JSON file containing:
- Character information
- Relationships
- Actions
- Attributes
{book_id}.book.html: Interactive HTML visualization of the text with:
- Entity annotations
- Character relationships
- Interactive features
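
The files listed above can be loaded for further analysis with a few lines of Python. The sketch below rests on two assumptions that this README does not state: that the `.tokens`, `.entities`, `.quotes`, `.supersense`, and `.event` files are tab-separated with a header row, and that `{book_id}.book` is a single JSON document. The helper names `read_tsv` and `load_outputs` are hypothetical.

```python
import csv
import json
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps literal quotation marks in the text intact.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def load_outputs(output_dir, book_id):
    """Load whichever output files exist for book_id (assumed formats)."""
    out = Path(output_dir)
    tables = {}
    for suffix in ("tokens", "entities", "quotes", "supersense", "event"):
        path = out / f"{book_id}.{suffix}"
        if path.exists():
            tables[suffix] = read_tsv(path)
    book_path = out / f"{book_id}.book"
    book = None
    if book_path.exists():
        book = json.loads(book_path.read_text(encoding="utf-8"))
    return tables, book
```

For example, `tables, book = load_outputs("output/emma", "emma")` would return the tabular files as lists of row dictionaries keyed by whatever column names the header provides, plus the parsed JSON summary. Check the header of each file before relying on specific column names.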

Example

# Process a text file named "emma.txt" with the larger 'big' model
./run_booknlp.py emma.txt --output-dir output/emma --model big
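
To process several books in one go, the same command can be driven from Python. The following is a minimal sketch, not part of the repository: `build_command` and `process_directory` are hypothetical helpers, the script is invoked through the current interpreter rather than as `./run_booknlp.py`, and each book is written to its own subdirectory named after the input file, mirroring the `output/emma` layout above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_file, output_dir="output", model="small",
                  script="run_booknlp.py"):
    """Assemble the documented command line as an argument list."""
    if model not in ("big", "small"):
        raise ValueError("model must be 'big' or 'small'")
    return [
        sys.executable, str(script), str(input_file),
        "--output-dir", str(output_dir),
        "--model", model,
    ]


def process_directory(text_dir, output_root="output", model="small"):
    """Run the pipeline once per .txt file found directly in text_dir."""
    for text_file in sorted(Path(text_dir).glob("*.txt")):
        out_dir = Path(output_root) / text_file.stem
        # check=True raises CalledProcessError if a run exits with an error.
        subprocess.run(build_command(text_file, out_dir, model), check=True)
```

Run `process_directory("texts", model="big")` from the repository root with the virtual environment activated; `texts` is a placeholder for a folder of plain-text files.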

Princeu3/booknlp