
raullenchai · 2026-05-15T22:33:19Z

Summary

Cherry-picks the portable half of upstream waybarrios#525.

`lm-format-enforcer`'s JSON grammar permits `\\` followed by any codepoint as a valid escape. A model emitting structured JSON with non-ASCII content (CJK, emoji, …) can therefore produce strings like `"\\빠\\르\\게"` — valid JSON, but the decoded value carries literal backslashes that look like corruption to clients.
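The symptom can be reproduced with only the standard-library `json` module. The raw text below parses without error, but each Hangul syllable in the decoded Python string still has a literal backslash in front of it. This is an illustrative reproduction, not output captured from a model.

```python
import json

# Raw model output as it arrives on the wire: each `\\` is a valid JSON
# escape for a single literal backslash.
raw = r'"\\빠\\르\\게"'

decoded = json.loads(raw)  # no JSONDecodeError: this is valid JSON
# decoded is the 6-character text \빠\르\게 (three backslashes, three syllables)
print(decoded.count("\\"))  # 3
```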

This PR adds a small helper in `routes/chat.py` that:

- Recursively walks dict / list / str values returned by the JSON parser
- Strips a single backslash placed immediately before any non-ASCII codepoint
- Re-serializes with `ensure_ascii=False` so the cleaned text keeps the characters as-is rather than re-escaping them
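Below is a minimal sketch of a helper like this. It assumes a regex-based implementation, and the name `strip_backslash_before_non_ascii` is hypothetical; the real helper in `routes/chat.py` may differ. The sketch walks dict values and list items, removes exactly one backslash when the next codepoint is outside ASCII, and leaves other scalars untouched.

```python
import json
import re
from typing import Any

# One literal backslash immediately followed by a non-ASCII codepoint.
# The lookahead keeps the codepoint itself; only the backslash is removed.
_SPURIOUS_BACKSLASH = re.compile(r"\\(?=[^\x00-\x7f])")


def strip_backslash_before_non_ascii(value: Any) -> Any:
    """Recursively clean str values inside parsed JSON (dict values, list items)."""
    if isinstance(value, str):
        return _SPURIOUS_BACKSLASH.sub("", value)
    if isinstance(value, dict):
        return {k: strip_backslash_before_non_ascii(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_backslash_before_non_ascii(v) for v in value]
    # int, float, bool and None pass through unchanged
    return value


# "a\\nb" decodes to a literal backslash before ASCII "n", which is kept.
parsed = json.loads(r'{"word": "\\빠\\르\\게", "path": "a\\nb", "n": 3}')
cleaned = strip_backslash_before_non_ascii(parsed)
# ensure_ascii=False writes 빠르게 as raw UTF-8 instead of \uXXXX escapes
payload = json.dumps(cleaned, ensure_ascii=False)
```

Because the pattern consumes only the backslash directly adjacent to the non-ASCII codepoint, an input such as two backslashes followed by `빠` loses just one of them. That matches the "single backslash" behavior described above.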

What I did NOT port

The off-by-one fix in upstream's `vllm_mlx/constrained/json_schema_processor.py` — that module doesn't exist in our tree (we don't use `lm-format-enforcer`-driven token enforcement; our JSON output path is parse-after-the-fact in `routes/chat.py`).

Test plan

- `python3.12 -m pytest tests/test_structured_output.py -v`: 5 new `TestStripBackslashBeforeUnicode` cases pass (CJK, valid escapes preserved, recursion through dict/list, non-string scalars pass through, emoji past U+FFFF).
- `python3.12 -m pytest tests/ --ignore=tests/integrations --ignore=tests/test_event_loop.py --ignore=tests/test_mllm*.py --ignore=tests/test_video.py -q`: 3483 passed.
- `ruff check && ruff format --check` clean on touched files.

Blast radius

Low — single helper, called only when `response_format` is set AND the model emits parseable JSON. No effect on tool-calling, streaming, or any non-JSON response path.

🤖 Generated with Claude Code

Cherry-pick the portable half of upstream waybarrios#525. ``lm-format-enforcer``'s JSON grammar permits ``\\`` followed by any codepoint as a valid escape, so a model emitting structured JSON with non-ASCII content (CJK, emoji, …) can produce strings like ``"\\빠\\르\\게"`` — valid JSON, but the decoded value carries literal backslashes that look like corruption to clients. Strip those spurious backslashes recursively across dict/list/str when finalizing the response_format payload in ``routes/chat.py``. Switch ``json.dumps`` to ``ensure_ascii=False`` so the cleaned text keeps the characters as-is rather than re-escaping them. The off-by-one half of #525 lives in upstream's ``constrained/json_schema_processor.py``, which we don't carry — not ported. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Codex review caught: ``_strip_backslash_before_unicode`` only walked dict *values*, not keys. ``lm-format-enforcer`` makes no distinction between JSON keys and values, so a key like ``"\\한\\글"`` would leak through to the client as ``"\\한\\글"`` even when the value was cleaned. Apply the helper to keys as well, plus a regression test using a non-ASCII key (the previous coverage used ASCII keys, which is why the bug wasn't caught originally). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…ontract Codex review round 2 caught two issues:

1. ``_strip_backslash_before_unicode`` on a dict silently overwrote entries when two distinct dirty keys collapsed to the same clean key (e.g. ``"\한"`` and ``"한"`` both → ``"한"``). Switch to an explicit loop that keeps the first occurrence and logs a WARNING on collision so the data loss is visible.
2. ``ensure_ascii=False`` was a deliberate behavior change but not documented. Add a comment explaining the rationale (raw UTF-8 in JSON-over-HTTP is the standard recommendation; FastAPI emits UTF-8 anyway; smaller wire bytes; no double-decoding by clients).

Codex's other finding (the regex doesn't match a literal backslash) is a false positive: ``r"\\"`` in a Python raw string IS the regex for a literal backslash; all 7 unit tests pass and a manual ``re.sub`` trial substitutes correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
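The two review fixes above are cleaning dict keys as well as values, and keeping the first entry when two dirty keys collapse to the same clean key. One way to express them is sketched below. The function name `clean` and the logger name are illustrative only, not the identifiers used in `routes/chat.py`.

```python
import json
import logging
import re
from typing import Any

logger = logging.getLogger("json_output_cleanup")  # illustrative logger name

# One literal backslash immediately followed by a non-ASCII codepoint.
_SPURIOUS_BACKSLASH = re.compile(r"\\(?=[^\x00-\x7f])")


def clean(value: Any) -> Any:
    """Strip spurious backslashes from strings, list items, dict keys and values."""
    if isinstance(value, str):
        return _SPURIOUS_BACKSLASH.sub("", value)
    if isinstance(value, list):
        return [clean(v) for v in value]
    if isinstance(value, dict):
        out: dict = {}
        for key, val in value.items():
            # Keys get the same treatment as values.
            new_key = clean(key) if isinstance(key, str) else key
            if new_key in out:
                # First occurrence wins; log so the dropped entry is visible.
                logger.warning("key %a collides with an earlier key after cleanup; dropping it", key)
                continue
            out[new_key] = clean(val)
        return out
    return value


# "\한" and "한" both clean to "한"; the second entry is dropped with a WARNING.
merged = clean(json.loads(r'{"\\한": 1, "한": 2}'))
```

An explicit loop is used instead of a dict comprehension because a comprehension would silently keep the *last* colliding value and has no place to emit the warning.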

… test Codex review round 2 asked for a smoke test that drives the actual chain ``routes/chat.py`` runs (parse_json_output → strip helper → json.dumps with ensure_ascii=False) rather than relying solely on the isolated helper unit tests. This guards against a future refactor that moves the helper to a different module or drops the wiring. The test is hermetic — no engine boot, no HTTP — but exercises the real production functions on a synthetic dirty input that mirrors what lm-format-enforcer would produce for a CJK-heavy response. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
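A hermetic smoke test in that spirit could look like the sketch below. `parse_json_stub` is a stand-in built on `json.loads`; it does not reproduce the real `parse_json_output`. `strip_spurious` and `finalize_json_content` are hypothetical names for the helper and for the wiring `routes/chat.py` performs.

```python
import json
import re
from typing import Any, Optional

_SPURIOUS_BACKSLASH = re.compile(r"\\(?=[^\x00-\x7f])")


def parse_json_stub(text: str) -> Optional[Any]:
    """Stand-in parser: returns None when `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def strip_spurious(value: Any) -> Any:
    """Stand-in cleanup over dict keys/values and lists (no collision handling)."""
    if isinstance(value, str):
        return _SPURIOUS_BACKSLASH.sub("", value)
    if isinstance(value, list):
        return [strip_spurious(v) for v in value]
    if isinstance(value, dict):
        return {strip_spurious(k): strip_spurious(v) for k, v in value.items()}
    return value


def finalize_json_content(text: str) -> str:
    """Hypothetical wiring: parse -> strip -> json.dumps(ensure_ascii=False)."""
    parsed = parse_json_stub(text)
    if parsed is None:
        return text  # this sketch passes unparseable output through unchanged
    return json.dumps(strip_spurious(parsed), ensure_ascii=False)


def test_cjk_chain_end_to_end() -> None:
    # Synthetic dirty output shaped like what the enforcer bug produces.
    dirty = r'{"\\제목": "\\빠\\르\\게", "tags": ["\\한\\글", "ok"], "n": 2}'
    out = finalize_json_content(dirty)
    assert json.loads(out) == {"제목": "빠르게", "tags": ["한글", "ok"], "n": 2}
    # With ensure_ascii=False there are no \uXXXX escapes and no stray backslashes.
    assert "\\" not in out


test_cjk_chain_end_to_end()
```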

The new TestStripBackslashBeforeUnicode class imports vllm_mlx.routes.chat, which transitively pulls in mlx. The no-MLX Linux CI matrix (test-matrix 3.10/3.11/3.12) doesn't have mlx installed, so these tests previously raised ModuleNotFoundError. Gate the class with @pytest.mark.skipif so it skips cleanly on Linux but still runs on test-apple-silicon (where mlx is available). This mirrors the existing -k "not Integration and not InjectJson" filter in .github/workflows/ci.yml for similar mlx-touching tests. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Codex review round 3 flagged the unconditional backslash-strip as a potential data-loss risk for legitimate JSON outputs that contain a backslash before a non-ASCII codepoint (e.g. Windows paths like ``"C:\\사용자\\file.txt"`` in a ``response_format=json_object`` reply). The tradeoff is intentional and matches upstream waybarrios#525:

- The lm-format-enforcer bug is the overwhelming source of these sequences in JSON-output responses.
- The file-path case is rare in practice.
- Heuristic gating ("looks like enforcer output") would be fragile.
- Clients needing raw backslash + non-ASCII can use response_format=text.

Document the scope and tradeoff in code so future maintainers understand the choice. If a user hits the false positive in practice, the right next step is a config flag (--no-strip-spurious-backslashes), not a heuristic. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
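The false positive described in round 3 is easy to reproduce with the same kind of pattern, which is assumed here. In the Windows path below, the backslash before the Hangul folder name is removed, while the one before the ASCII file name survives. The `enabled` parameter is purely hypothetical. It models the opt-out that the proposed `--no-strip-spurious-backslashes` flag would provide; the commit suggests that flag but does not add it.

```python
import re

# Assumed pattern: one backslash immediately before a non-ASCII codepoint.
_SPURIOUS_BACKSLASH = re.compile(r"\\(?=[^\x00-\x7f])")


def strip_spurious(text: str, enabled: bool = True) -> str:
    """`enabled` is a hypothetical opt-out switch; it is not an existing option."""
    return _SPURIOUS_BACKSLASH.sub("", text) if enabled else text


# A legitimate Windows path, as the Python string json.loads would return.
path = "C:\\사용자\\file.txt"

damaged = strip_spurious(path)
# damaged is C:사용자\file.txt: the separator before the non-ASCII segment is
# lost, while the separator before the ASCII segment is kept.

preserved = strip_spurious(path, enabled=False)  # returned unchanged
```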

* chore: bump version to 0.6.51

User-facing changes since v0.6.50:

* fix(prefix-cache): slice 3D KV state along the right axis (#392): inference correctness fix for prefix cache restore on KV tensors shaped (B, H, S) (was slicing the wrong axis, surfacing as silent first-token corruption on cache-hit prompts).
* perf(scheduler): drop per-decode list() copy of output_token_ids (#391): small per-token allocation removed from the decode hot path; benefit scales with output length on multi-stream batches.
* feat(json-output): strip spurious backslashes before non-ASCII chars (#394): JSON-emitting models occasionally emit `\\é` etc. during constrained generation; these are now stripped before delivery so clients no longer see literal backslashes in the decoded strings.

Dev/docs (no user-visible runtime change):

* docs(contributing): codify test precision policy correctness=8bit / perf=4bit (#396)
* chore(pr_validate): swap Qwen3.6-27B-4bit → 8bit in stress matrix with smoke-tier 4-bit fallback for ≤32 GB hosts (#395)

Pre-merge artifact SHA-256 (audit anchor; will not match post-publish SHAs because publish.yml rebuilds on Linux runners, see Release SOP §8):

rapid_mlx-0.6.51-py3-none-any.whl 5affa5c527bf543b72ddab94e96f2c6308225e6f5facd229d8f21f0b9ede8ce7
rapid_mlx-0.6.51.tar.gz d248b8a7754cbe4d94b4708fa0fb88caa623f7fc71f13f156ece0a3b36224755

Release SOP gates:

* §3 install size: 448 MB (vs 445 MB v0.6.48 baseline, +0.67%, well under the 1.05× soft-warn threshold).
* §7 supply chain: pip-audit clean on critical deps via OSV; recent uploads (HF hub 1.15.0 today, transformers 5.8.1 3 days ago) verified legit upstream (substantive changelogs, known maintainers / HF bot); no install.sh / workflows / pyproject.toml diffs since v0.6.50; OIDC scope minimal (id-token: write only on publish, contents: write only on auto-release); 3 third-party actions still moving-tag pinned (peter-evans, peaceiris, codecov): pre-existing, not a release blocker but tracked for follow-up.
* §5 perf: make full in flight (qwen3.5-35b done; qwen3.6-35b mid-run). Will report results in PR before merge.
* §4 user-onboarding personas + §6 agent smoke: pending; will run serially after §5 (per CLAUDE.md "never in parallel" guidance for model-server workloads).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(api): detect text-only forks of multimodal architectures (#393)

Issue #393 reports that ``rapid-mlx serve /path/to/Qwen3.6-35B-A3B-MLX-8bit`` crashes on startup because the model is routed to the MLLM batched engine even though the user's checkpoint is text-only. Reporter (Tylast) correctly identified the chain: the checkpoint's config.json declares ``vision_config`` (because the base ``Qwen3_5MoeForConditionalGeneration`` architecture is multimodal-capable), so ``is_mllm_model`` returns True, so the MLLM loader takes over, then hits the hybrid-backbone / ArraysCache incompatibility documented in the closed-as-spam #385.

The text-only A3B fork ships zero vision tensors in its safetensors, even though its config.json carries the full vision_config block. Our detection trusted the config and ignored the actual weight presence.

Fix: when config indicates VLM AND the path is a local directory AND that directory ships ``model.safetensors.index.json``, scan the index for tensor-name prefixes that indicate real vision/audio weights (``vision_tower``, ``visual.``, ``audio_tower``, ``mm_projector``, …). If none are present, override to text-only routing. The check fires only in the True → False direction; the False direction is preserved as-is to keep existing text-routed models stable.

Conservative on edge cases:

* Single-file safetensors (no sharded index) → return None from the probe and trust config. Wrong-True here means the text path errors clearly at first image request, whereas wrong-False would silently corrupt every text request on a real VLM. The bad-direction cost is much smaller.
* Unreadable / oversized / malformed index → same. Fall back to config.
* HF repo IDs (not local dirs) → unchanged; we'd need a network call to inspect remote tensors.

Tests:

* New ``TestIsMllmModelWeightsPresenceOverride`` class, 6 cases:
  - vision_config + no vision tensors → False (the #393 fix path)
  - vision_config + vision tensors → True (genuine VLM still works)
  - audio_config + audio tensors → True (audio branch covered)
  - missing index → fall back to config
  - malformed index → fall back to config
  - text-only config → never even probe weights

Total tests: 102 pass (was 96). ruff clean.

This change is bundled into the v0.6.51 bump because the user-facing fix is small and isolated, and waiting for v0.6.52 would mean Tylast keeps hitting the crash for another release cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs(install): add brew 5.x homebrew/core pre-flight hint

Brew 5.x's install sandbox cannot auto-tap homebrew/core mid-install when a third-party formula depends on core packages ([email protected], rust). Users on fresh brew installs (API-only, no homebrew/core tap cloned) see "Operation not permitted" on /opt/homebrew/Library/Taps/homebrew/. Pre-tapping with `brew tap homebrew/core --force` (one-time, ~1.3 GB) lets the install complete. Brew 4.x and earlier never needed this.

---------

Co-authored-by: Your Name <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
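The weight-presence probe described for #393 might be sketched as follows. The sketch assumes the usual sharded-safetensors index layout: a JSON object whose `weight_map` maps tensor names to shard files. Several details are illustrative assumptions rather than taken from the commit: the function names, the size cap, substring (rather than strict prefix) matching, and keying the config decision on `vision_config` / `audio_config`. The commit lists only the four markers shown and elides the rest.

```python
import json
import os
from typing import Optional

# Markers named in the #393 commit message; the real list continues ("…").
MULTIMODAL_MARKERS = ("vision_tower", "visual.", "audio_tower", "mm_projector")
# Illustrative cap for treating an index file as "oversized".
MAX_INDEX_BYTES = 16 * 1024 * 1024


def probe_multimodal_weights(model_path: str) -> Optional[bool]:
    """True/False when a sharded index settles the question, None to trust config."""
    if not os.path.isdir(model_path):
        return None  # e.g. an HF repo id: remote tensors are not inspected
    index_path = os.path.join(model_path, "model.safetensors.index.json")
    try:
        if os.path.getsize(index_path) > MAX_INDEX_BYTES:
            return None  # oversized index: defer to config
        with open(index_path, encoding="utf-8") as f:
            weight_map = json.load(f)["weight_map"]
    except (OSError, ValueError, KeyError, TypeError):
        # Missing index (single-file checkpoint), unreadable, or malformed.
        return None
    if not isinstance(weight_map, dict):
        return None
    # Substring matching is an assumption: names may carry a scope such as
    # "model.visual.blocks.0...".
    return any(m in name for name in weight_map for m in MULTIMODAL_MARKERS)


def route_as_mllm(config: dict, model_path: str) -> bool:
    """Config-based decision, overridden only in the True -> False direction."""
    if "vision_config" not in config and "audio_config" not in config:
        return False  # text-only config: never probe weights
    return probe_multimodal_weights(model_path) is not False
```

Returning `None` rather than `False` for every uncertain case is what keeps the override one-directional. Only a readable index that positively lacks multimodal tensors can flip routing to text-only.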

Your Name and others added 6 commits May 15, 2026 15:33

raullenchai merged commit f8e18b0 into main May 15, 2026
7 checks passed

raullenchai deleted the feat/json-output-non-ascii-cleanup branch May 15, 2026 23:53

raullenchai mentioned this pull request May 16, 2026

chore: bump version to 0.6.51 #397

Merged



feat(json-output): strip spurious backslashes before non-ASCII chars #394

raullenchai merged 6 commits into main from feat/json-output-non-ascii-cleanup

raullenchai commented May 15, 2026

