Pattern Recognition and Ontologies for Prose Processing
A Natural Language Processing Framework for Narrative Analysis
Propp is a modular NLP pipeline designed to extract rich character-centric information from narrative texts, especially literature.
This notebook guides you through analyzing a French novel with the propp-fr library.
You'll learn how to load a novel, tokenize it, extract named entities, resolve coreferences, and analyze the main characters.
Installation
The French variant of the Propp Python library can be installed from PyPI:
pip install propp_fr
One-liner Processing
You can process a text file in one line with the default models:
from propp_fr import process_text_file
process_text_file("root_directory/my_french_novel.txt")
This will generate three additional files in the same directory:
root_directory/
├── my_french_novel.txt
├── my_french_novel.tokens
├── my_french_novel.entities
└── my_french_novel.book
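As a quick sanity check after processing, the sketch below derives the three expected output paths from the input path and reports any that are missing. It relies only on the file naming shown in the listing above and on the standard library; the helper names `expected_outputs` and `missing_outputs` are hypothetical and are not part of propp_fr.

```python
from pathlib import Path

# Suffixes taken from the directory listing above.
OUTPUT_SUFFIXES = (".tokens", ".entities", ".book")


def expected_outputs(text_path):
    """Return the output paths expected next to the input text file."""
    text_path = Path(text_path)
    # with_suffix swaps the final ".txt" for each output suffix.
    return [text_path.with_suffix(suffix) for suffix in OUTPUT_SUFFIXES]


def missing_outputs(text_path):
    """Return the expected output paths that do not exist on disk."""
    return [path for path in expected_outputs(text_path) if not path.exists()]


if __name__ == "__main__":
    missing = missing_outputs("root_directory/my_french_novel.txt")
    if missing:
        print("Missing output files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All three output files are present.")
```

Running it before `process_text_file` should list all three files as missing. Running it afterwards should list none, assuming the outputs are written as described above.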
Antoine Bourgois and Thierry Poibeau. 2025. The Elephant in the Coreference Room: Resolving Coreference in Full-Length French Fiction Works. In Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2025). EMNLP 2025, Suzhou, China. Available on arXiv and HAL.
Jean Barré, Olga Seminck, Antoine Bourgois, and Thierry Poibeau. 2025. Modeling the Construction of a Literary Archetype: The Case of the Detective Figure in French Literature. In Proceedings of the Sixth Conference on Computational Humanities Research 2025 (CHR 2025). Luxembourg, Luxembourg. Available on arXiv.