Credit goes to lib.rs

21 stable releases

1.0.21 Jan 5, 2026
1.0.20 Jan 4, 2026
1.0.18 Sep 14, 2025
1.0.17 Aug 30, 2025
1.0.3 Nov 30, 2024

#269 in Text processing


Used in rsrpp-cli

MIT license

110KB
2K SLoC

Rust Research Paper Parser (rsrpp)

The rsrpp library provides a set of tools for parsing research papers.

Quick Start

Prerequisites

  • Poppler: sudo apt install poppler-utils
  • OpenCV: sudo apt install libopencv-dev clang libclang-dev

Installation

To start using the rsrpp library, add it to your project's dependencies (this updates Cargo.toml):

cargo add rsrpp

Then import the parser module in your code (since the 2018 edition, `extern crate` is no longer needed):

use rsrpp::parser;

Examples

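The following example parses the arXiv PDF of "Attention Is All You Need", groups the extracted pages into sections, and serializes the sections to JSON:

// `parse` is async, so this code must run inside an async runtime (for example, tokio).
// Requires ParserConfig, parse, and Section in scope, plus the serde_json crate.
let mut config = ParserConfig::new();
let url = "https://arxiv.org/pdf/1706.03762";
let pages = parse(url, &mut config).await.unwrap(); // Vec<Page>: extracted text, page by page
let sections = Section::from_pages(&pages); // Vec<Section>: pages regrouped into sections
let json = serde_json::to_string(&sections).unwrap(); // String: sections serialized as JSON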

Tests

The library includes a test suite. To run it, use the following command:

cargo test

License: MIT

Releases

1.0.21
  • Replaced panic-causing unwrap() calls with proper error handling.
1.0.20
  • Fixed Poppler 25.12.0 compatibility on macOS.
1.0.19
  • Refactored fix_suffix_hyphens to support 31 compound word suffixes:
    • -based, -driven, -oriented, -aware, -agnostic, -independent, -dependent, -first, -native, -centric, -intensive, -bound, -safe, -free, -proof, -efficient, -optimized, -enabled, -powered, -ready, -capable, -compatible, -compliant, -level, -scale, -wide, -specific, -friendly, -facing, -like, -style
  • Added unit tests for suffix hyphenation functionality.
1.0.18
  • Updated how section titles are extracted from PDFs.
1.0.17
  • Restructured rsrpp.parser.
  • Updated how section titles are extracted from PDFs.
  • Updated tests.
1.0.16
  • Removed init_logger from rsrpp.
1.0.15
  • Fixed a typo.
  • Introduced a tracing logger.
1.0.14
  • Updated rsrpp version for rsrpp-cli.
1.0.13
  • Updated dependencies.
  • Removed build.sh because it required sudo when installing the crate.
1.0.12
  • Removed an unused println! call.
1.0.11
  • Fixed a bug where the XML loop did not stop upon reaching the end of the file.
1.0.10
  • Added verbose mode.
  • Fixed a bug in page-number extraction.
1.0.9
  • Added new errors to handle invalid URLs.
1.0.8
  • Increased the maximum retry time for saving PDF files.
1.0.7
  • Fixed bugs: after converting to PDF, the program now waits until processing is complete.
1.0.4
  • Fixed bugs in get_pdf_info.
  • Made minor improvements.
1.0.3
1.0.2
  • Updated the Section module. content: String was replaced by content: Vec<TextBlock>.
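The 1.0.19 change to fix_suffix_hyphens can be pictured with a small standalone sketch. This is not rsrpp's implementation, only one plausible reading of the idea. The function name join_line_hyphens, the seven-entry SUFFIXES subset of the 31 listed suffixes, and the assumption that extracted lines have already been joined with a single space (so a line-break hyphen appears as "- ") are all hypothetical. When the word after the hyphen is a listed suffix, the hyphen is kept. Any other line-break hyphen is dropped.

```rust
// Hypothetical subset of the 31 suffixes listed under 1.0.19 (illustrative only).
const SUFFIXES: &[&str] = &["based", "driven", "oriented", "aware", "free", "like", "style"];

/// Rejoins words split as "word- continuation" (a sketch, not rsrpp's code).
/// Keeps the hyphen when the continuation is a listed suffix, drops it otherwise.
fn join_line_hyphens(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find("- ") {
        let (before, after) = rest.split_at(pos);
        out.push_str(before);
        // Only a hyphen attached to a letter is treated as a line-break hyphen;
        // a free-standing dash such as "a - b" is copied through unchanged.
        if !out.chars().last().map_or(false, char::is_alphabetic) {
            out.push_str("- ");
            rest = &after[2..];
            continue;
        }
        // Skip the "-" and the whitespace that replaced the line break.
        let next = after[1..].trim_start();
        // Compare the whole next word, so "freedom" does not match "free".
        let word = next.split(|c: char| !c.is_alphanumeric()).next().unwrap_or("");
        if SUFFIXES.contains(&word) {
            out.push('-');
        }
        rest = next;
    }
    out.push_str(rest);
    out
}

fn main() {
    // Prints "a data-driven model for text processing".
    println!("{}", join_line_hyphens("a data- driven model for text pro- cessing"));
}
```

Matching the complete following word, rather than a prefix, keeps ordinary words that merely begin with a suffix (such as "freedom") from being hyphenated.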

Dependencies

~29–65MB
~888K SLoC