Conversation

@bravo1goingdark
Owner

Before

  • Every semanticize call reloaded the tokenizer and ONNX session, and semanticize_batch simply looped over documents, so batches still ran N single-document inferences.
  • Missing ONNX/tokenizer assets caused hard errors, even though the docs promised a deterministic stub fallback.

After

  • Introduced a per-thread cache that stores tokenizer + ONNX session handles keyed by model path (see the sketch after this list).
  • ONNX mode now performs a single batched inference that pads inputs and reuses the cached session.
  • Errors resolving models/tokenizers fall back to the deterministic stub, matching the documented behavior, and docs/tests were updated accordingly.
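As a rough illustration of the caching and fallback described above, here is a minimal Rust sketch. The types Tokenizer, OnnxSession, and ModelHandles, and the functions load_handles, batched_inference, and stub_embedding are hypothetical placeholders, not this crate's actual API; only the thread-local, path-keyed cache pattern and the deterministic stub fallback mirror what the PR describes.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::rc::Rc;

// Hypothetical stand-ins for the real tokenizer and ONNX session types.
#[allow(dead_code)]
struct Tokenizer;
#[allow(dead_code)]
struct OnnxSession;

#[allow(dead_code)]
struct ModelHandles {
    tokenizer: Tokenizer,
    session: OnnxSession,
}

thread_local! {
    // Per-thread cache keyed by model path: each worker thread loads the
    // tokenizer + session once and reuses the handles on later calls.
    static MODEL_CACHE: RefCell<HashMap<PathBuf, Rc<ModelHandles>>> =
        RefCell::new(HashMap::new());
}

// Placeholder loader: the real code would resolve the ONNX model and
// tokenizer files; returning None models the "assets missing" case.
fn load_handles(_model_path: &Path) -> Option<ModelHandles> {
    None
}

// Deterministic stub embedding used when assets cannot be resolved.
fn stub_embedding(text: &str, dims: usize) -> Vec<f32> {
    let bytes = text.as_bytes();
    (0..dims)
        .map(|i| {
            let b = if bytes.is_empty() { 0 } else { bytes[i % bytes.len()] };
            f32::from(b) / 255.0
        })
        .collect()
}

// Placeholder for the single batched inference: the real version pads the
// tokenized inputs to a common length and runs one session call.
fn batched_inference(_handles: &ModelHandles, texts: &[&str], dims: usize) -> Vec<Vec<f32>> {
    texts.iter().map(|t| stub_embedding(t, dims)).collect()
}

fn semanticize_batch(model_path: &Path, texts: &[&str], dims: usize) -> Vec<Vec<f32>> {
    MODEL_CACHE.with(|cache| {
        // Look up the handles for this model path, loading them at most
        // once per thread.
        let handles = {
            let mut cache = cache.borrow_mut();
            if let Some(h) = cache.get(model_path) {
                Some(Rc::clone(h))
            } else if let Some(h) = load_handles(model_path) {
                let h = Rc::new(h);
                cache.insert(model_path.to_path_buf(), Rc::clone(&h));
                Some(h)
            } else {
                None
            }
        };

        match handles {
            // Cached tokenizer + session: run one batched inference.
            Some(h) => batched_inference(&h, texts, dims),
            // Assets missing: fall back to the deterministic stub so the
            // pipeline keeps running.
            None => texts.iter().map(|t| stub_embedding(t, dims)).collect(),
        }
    })
}
```

Rc is sufficient here because the cache is thread-local; a cache shared across threads would instead need Arc plus a lock or a concurrent map.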

Advantages

  • Steady-state semantic inference no longer reloads models on every call, sharply reducing latency and filesystem churn.
  • Batching now amortizes session setup costs, improving throughput for multi-document workloads.
  • Pipelines keep running even when assets are missing or temporarily unavailable.

Testing

  • cargo fmt --all
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace --all-targets -- --nocapture

@bravo1goingdark bravo1goingdark self-assigned this Nov 19, 2025
@bravo1goingdark bravo1goingdark added the area:semantic (Semantic embeddings, ONNX models, or API inference) and type:performance (Profiling, batching, or throughput improvements) labels Nov 19, 2025
@bravo1goingdark bravo1goingdark linked an issue (Cache ONNX tokenizer/session handles) Nov 19, 2025 that may be closed by this pull request
@bravo1goingdark bravo1goingdark merged commit 7e862aa into main Nov 19, 2025
1 check passed