
Kludex · 2026-06-04T15:18:13Z

What

When `find` cannot locate a whole boundary in part data, the tail may hold a partial boundary that completes in the next chunk. The old fallback seeded the per-byte matcher at the tail, which thrashed on CR/LF-dense bodies: `\r\n` matches the boundary prefix, fails at `-`, resets, reconsiders, and re-runs `find` for every `\r\n` pair near the chunk end.

Since `self.boundary` is `\r\n--` + boundary and an RFC boundary cannot contain CR, the last CR in the tail is the only candidate prefix start. We anchor on it with `rfind`, confirm with `startswith`, carry the partial via `index`, and let the existing end-of-chunk flush emit the data and re-mark the lookbehind. This mirrors multer's lookbehind strategy.
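
For readers who want the idea in isolation, here is a minimal sketch of the `rfind` lookbehind, not the code from this PR. The function name `partial_boundary_len` and its parameters are hypothetical, and it assumes the caller has already run `find` and found no complete delimiter in `data`.

```python
def partial_boundary_len(data: bytes, delimiter: bytes) -> int:
    """Return how many trailing bytes of `data` could begin `delimiter`.

    Assumptions (hypothetical helper, not the library's API):
    - `delimiter` is b"\r\n--" + boundary and contains exactly one CR, at index 0.
    - `data` does not contain the whole delimiter (a prior `find` returned -1).
    """
    # A partial delimiter is at most len(delimiter) - 1 bytes long, so only
    # that many trailing bytes need to be inspected.
    tail_start = max(0, len(data) - (len(delimiter) - 1))

    # The delimiter starts with CR and has no other CR, so a partial match
    # can only start at the last CR in the tail.
    cr = data.rfind(b"\r", tail_start)
    if cr == -1:
        return 0

    # Confirm that everything from that CR to the end is a delimiter prefix.
    tail = data[cr:]
    return len(tail) if delimiter.startswith(tail) else 0


def split_chunk(data: bytes, delimiter: bytes) -> tuple[bytes, bytes]:
    """Split `data` into (bytes safe to emit now, bytes to carry over)."""
    carry = partial_boundary_len(data, delimiter)
    return data[: len(data) - carry], data[len(data) - carry :]
```

On a body made only of `\r\n` pairs this does one `rfind` and one `startswith` per chunk, rather than restarting a matcher at every pair, which is the cost the description above attributes to the old fallback.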

Impact

Benchmarked against multipart_bench (1MB part body, 64KB chunks, sansio variant):

| Scenario | before | after |
| --- | --- | --- |
| worstcase_crlf | ~2700 MB/s | ~18500 MB/s (6.9x) |
| worstcase_lf | ~21500 MB/s | unchanged |
| worstcase_bchar | ~16400 MB/s | unchanged |
| file upload | unchanged | unchanged |

worstcase_crlf was the slowest scenario across all pure-Python parsers; it is now the fastest, reaching parity with the LF case.

Correctness

Verified behaviorally identical to the previous parser across 82k differential comparisons of the full callback-event stream and error type/offset: every chunk-split strategy (whole, byte-by-byte, fixed-size, random) plus an exhaustive two-chunk sweep with the boundary edge landing on every byte offset. Zero mismatches.
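
As an illustration of the chunk-split strategies listed above, the following is a rough sketch of a differential harness. It is not the harness used for this PR: `all_splits`, `differential`, and the parser callables (assumed to take a list of chunks and return a comparable event list) are hypothetical names.

```python
import random
from typing import Callable, Iterator, List

# Hypothetical parser interface: feed the chunks in order, return the events.
Parser = Callable[[List[bytes]], list]


def fixed_size(payload: bytes, size: int) -> List[bytes]:
    """Cut the payload into consecutive chunks of at most `size` bytes."""
    return [payload[i : i + size] for i in range(0, len(payload), size)]


def random_split(payload: bytes, rng: random.Random) -> List[bytes]:
    """Cut the payload at random positions, each chunk at least 1 byte."""
    chunks, pos = [], 0
    while pos < len(payload):
        step = rng.randint(1, len(payload) - pos)
        chunks.append(payload[pos : pos + step])
        pos += step
    return chunks


def all_splits(payload: bytes, seed: int = 0) -> Iterator[List[bytes]]:
    """Yield whole, byte-by-byte, fixed-size, random, and two-chunk splits."""
    rng = random.Random(seed)
    yield [payload]                      # whole write
    yield fixed_size(payload, 1)         # byte-by-byte
    for size in (2, 3, 7, 16):           # a few fixed sizes (arbitrary choice)
        yield fixed_size(payload, size)
    for _ in range(10):                  # random splits, reproducible via seed
        yield random_split(payload, rng)
    for i in range(len(payload) + 1):    # exhaustive two-chunk sweep
        yield [payload[:i], payload[i:]]


def differential(old: Parser, new: Parser, payload: bytes) -> List[List[bytes]]:
    """Return every split on which the two parsers disagree."""
    return [s for s in all_splits(payload) if old(list(s)) != new(list(s))]
```

The two-chunk sweep is what places the chunk edge on every byte offset, including every offset inside the boundary, so an empty result from `differential` on a payload means no split in this set distinguishes the two parsers.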

Adds a CRLF-dense corpus case (crlf_dense_part_data) to the data-driven test suite, run both whole-write and byte-by-byte. 100% coverage maintained.

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

codspeed-hq · 2026-06-04T15:18:59Z

Merging this PR will not alter performance

✅ 5 untouched benchmarks

Comparing speed-up-crlf-dense-part-data (12cbd2b) with main (4cffc68)

cubic-dev-ai

No issues found across 4 files

Kludex · 2026-06-04T15:33:43Z

@codex review

chatgpt-codex-connector · 2026-06-04T15:37:15Z

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

cubic-dev-ai Bot reviewed Jun 4, 2026

Kludex merged commit 8672979 into main Jun 4, 2026
15 checks passed

Kludex deleted the speed-up-crlf-dense-part-data branch June 4, 2026 15:42

Kludex mentioned this pull request Jun 4, 2026

Version 0.0.32 #302

Merged

tfoutrein mentioned this pull request Jun 4, 2026

Perf: a few small streaming-parser micro-opts that stack on top of 0.0.32 (#295/#296/#300) — would a PR be welcome? #305

Closed

Replace per-byte partial-boundary scan with rfind lookbehind #300
