Native RDF parser for rdf4cpp #402

@bigerl

Goals:

  • speed up parsing to >1 GB/s of Turtle input, including populating the rdf4cpp node storage and serializing the result as Turtle to /dev/null
  • eliminate the serd dependency
  • full Unicode support
  • cover all major formats (Turtle-like, JSON-based, binary, XML)
  • make it easy to support additional formats

Design suggestions:

  • zero-copy (use span/string_view)
  • 2-stage parsing:
    1. non-branching, SIMD-friendly structural indexing; chunk-wise
    2. (parallelizable) actions, e.g., instantiate RDF nodes or push a subject onto the stack. Some constructs act as barriers for parallelism, e.g., prefix definitions.
  • multi-source (mmap, c-stream, c++ stream, byte-buffer)
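
A minimal sketch of the zero-copy, 2-stage idea, assuming a heavily simplified input that contains only IRI terms (no literals, comments, prefixes, or escapes). The names `index_structure`, `build_triples`, and `Triple` are hypothetical and not part of rdf4cpp. The scalar loop in stage 1 only stands in for what a SIMD implementation would do per vector of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical result type: three views into the input buffer, no copies.
struct Triple {
    std::string_view subject, predicate, object;
};

// Stage 1: record the offsets of structural characters in one chunk.
// Simplification: only '<', '>' and '.' count as structural here.
// The loop body has no data-dependent branch: every offset is written,
// and the write cursor only advances when the byte is structural.
std::vector<std::size_t> index_structure(std::string_view chunk) {
    std::vector<std::size_t> offsets(chunk.size()); // worst case: all structural
    std::size_t n = 0;
    for (std::size_t i = 0; i < chunk.size(); ++i) {
        char const c = chunk[i];
        bool const is_structural = (c == '<') | (c == '>') | (c == '.');
        offsets[n] = i;
        n += is_structural;
    }
    offsets.resize(n);
    return offsets;
}

// Stage 2: walk the structural index and perform actions.
// A '.' between '<' and '>' belongs to the IRI and is skipped.
std::vector<Triple> build_triples(std::string_view chunk,
                                  std::vector<std::size_t> const &offsets) {
    constexpr auto npos = std::string_view::npos;
    std::vector<Triple> triples;
    std::string_view terms[3];
    std::size_t n_terms = 0;
    std::size_t open = npos; // offset of the currently open '<', if any

    for (std::size_t const off : offsets) {
        assert(off < chunk.size());
        char const c = chunk[off];
        if (open != npos) {
            if (c == '>') { // IRI closed: keep a view of its content
                if (n_terms < 3) {
                    terms[n_terms] = chunk.substr(open + 1, off - open - 1);
                }
                ++n_terms;
                open = npos;
            }
        } else if (c == '<') {
            open = off;
        } else if (c == '.') { // end of statement
            if (n_terms == 3) {
                triples.push_back({terms[0], terms[1], terms[2]});
            }
            n_terms = 0;
        }
    }
    return triples;
}
```

Because every term is a `std::string_view` into the original chunk, the input buffer (for example an mmap-ed file) has to outlive the returned triples. A real parser would also need to handle statements that span chunk boundaries, which this sketch ignores.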

Library suggestions:

  • Use Taskflow for thread parallelism and Highway for data (SIMD) parallelism.

Labels: enhancement
