feat(json-output): strip spurious backslashes before non-ASCII chars#394

Merged: raullenchai merged 6 commits into main from feat/json-output-non-ascii-cleanup on May 15, 2026
@raullenchai (Owner)

Summary

Cherry-picks the portable half of upstream waybarrios#525.

`lm-format-enforcer`'s JSON grammar permits `\\` followed by any codepoint as a valid escape. A model emitting structured JSON with non-ASCII content (CJK, emoji, …) can therefore produce strings like `"\\빠\\르\\게"` — valid JSON, but the decoded value carries literal backslashes that look like corruption to clients.

This PR adds a small helper in `routes/chat.py` that:

  • Recursively walks dict / list / str values returned by the JSON parser
  • Strips a single backslash placed immediately before any non-ASCII codepoint
  • Re-serializes with `ensure_ascii=False` so cleaned text keeps the characters as-is rather than re-escaping
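
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `routes/chat.py`: the function name, the regex, and the sample payload are assumptions made for the example, and it cleans dict values only, as the summary describes.

```python
import json
import re

# Assumed pattern: one backslash directly followed by a non-ASCII codepoint.
_BACKSLASH_BEFORE_NON_ASCII = re.compile(r"\\([^\x00-\x7f])")


def strip_backslash_before_unicode(value):
    """Recursively drop a single backslash that precedes a non-ASCII character."""
    if isinstance(value, str):
        return _BACKSLASH_BEFORE_NON_ASCII.sub(r"\1", value)
    if isinstance(value, list):
        return [strip_backslash_before_unicode(item) for item in value]
    if isinstance(value, dict):
        # Values only in this sketch; keys are left untouched.
        return {k: strip_backslash_before_unicode(v) for k, v in value.items()}
    return value  # int, float, bool, None pass through unchanged


# Hypothetical model output: valid JSON whose decoded strings carry backslashes.
raw = r'{"text": "\\빠\\르\\게", "tags": ["\\😀", "a\\nb"], "n": 3}'
parsed = json.loads(raw)
cleaned = strip_backslash_before_unicode(parsed)

# ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes.
payload = json.dumps(cleaned, ensure_ascii=False)
print(payload)
```

The backslash in `"a\\nb"` precedes an ASCII character, so it is left alone; only the backslashes in front of the Hangul and emoji codepoints are removed.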

What I did NOT port

The off-by-one fix in upstream's `vllm_mlx/constrained/json_schema_processor.py` — that module doesn't exist in our tree (we don't use `lm-format-enforcer`-driven token enforcement; our JSON output path is parse-after-the-fact in `routes/chat.py`).

Test plan

  • `python3.12 -m pytest tests/test_structured_output.py -v` — 5 new `TestStripBackslashBeforeUnicode` cases pass (CJK, valid escapes preserved, recursion through dict/list, non-string scalars pass through, emoji past U+FFFF).
  • `python3.12 -m pytest tests/ --ignore=tests/integrations --ignore=tests/test_event_loop.py --ignore=tests/test_mllm*.py --ignore=tests/test_video.py -q` — 3483 passed.
  • `ruff check && ruff format --check` clean on touched files.

Blast radius

Low — single helper, called only when `response_format` is set AND the model emits parseable JSON. No effect on tool-calling, streaming, or any non-JSON response path.

🤖 Generated with Claude Code

Your Name and others added 6 commits May 15, 2026 15:33
Cherry-pick the portable half of upstream waybarrios#525.

``lm-format-enforcer``'s JSON grammar permits ``\\`` followed by any
codepoint as a valid escape, so a model emitting structured JSON with
non-ASCII content (CJK, emoji, …) can produce strings like
``"\\빠\\르\\게"`` — valid JSON, but the decoded value carries literal
backslashes that look like corruption to clients.

Strip those spurious backslashes recursively across dict/list/str when
finalizing the response_format payload in ``routes/chat.py``. Switch
``json.dumps`` to ``ensure_ascii=False`` so the cleaned text keeps the
characters as-is rather than re-escaping them.

The off-by-one half of #525 lives in upstream's
``constrained/json_schema_processor.py``, which we don't carry — not
ported.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Codex review caught: ``_strip_backslash_before_unicode`` only walked
dict *values*, not keys. ``lm-format-enforcer`` makes no distinction
between JSON keys and values, so a key like ``"\\한\\글"`` would leak
through to the client as ``"\\한\\글"`` even when the value was cleaned.

Apply the helper to keys as well, plus a regression test using a
non-ASCII key (the previous coverage used ASCII keys, which is why
the bug wasn't caught originally).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ontract

Codex review round 2 caught two issues:

1. ``_strip_backslash_before_unicode`` on a dict silently overwrote
   entries when two distinct dirty keys collapsed to the same clean
   key (e.g. ``"\한"`` and ``"한"`` both → ``"한"``). Switch to an
   explicit loop that keeps the first occurrence and logs a WARNING
   on collision so the data loss is visible.

2. ``ensure_ascii=False`` was a deliberate behavior change but not
   documented. Add a comment explaining the rationale (raw UTF-8 in
   JSON-over-HTTP is the standard recommendation; FastAPI emits
   UTF-8 anyway; smaller wire bytes; no double-decoding by clients).
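
A rough sketch of both points, under stated assumptions: the helper name `clean`, the logger name, and the warning text are hypothetical, and the real helper may differ in detail. It shows the first-occurrence-wins loop and the size difference between escaped and raw UTF-8 output.

```python
import json
import logging
import re

logger = logging.getLogger("json_output_sketch")  # hypothetical logger name
_PATTERN = re.compile(r"\\([^\x00-\x7f])")


def clean(value):
    """Strip backslashes before non-ASCII characters in keys and values."""
    if isinstance(value, str):
        return _PATTERN.sub(r"\1", value)
    if isinstance(value, list):
        return [clean(item) for item in value]
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            new_key = clean(key) if isinstance(key, str) else key
            if new_key in out:
                # Keep the first occurrence and make the data loss visible.
                logger.warning("key collision after cleanup, dropping %r", key)
                continue
            out[new_key] = clean(item)
        return out
    return value


# Two distinct dirty keys that collapse to the same clean key.
dirty = {"\\한": "first", "한": "second"}
result = clean(dirty)

escaped = json.dumps(result)                       # \uXXXX escapes
raw_utf8 = json.dumps(result, ensure_ascii=False)  # characters kept as-is
print(escaped, len(escaped.encode("utf-8")))
print(raw_utf8, len(raw_utf8.encode("utf-8")))
```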

Codex's other finding (the regex doesn't match a literal backslash) is
a false positive — ``r"\\"`` in a Python raw string IS the regex for
a literal backslash; all 7 unit tests pass and a manual ``re.sub``
trial-substitutes correctly.
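
The raw-string point can be checked in isolation. The pattern below is an assumed stand-in for the one in the helper, not a copy of it:

```python
import re

# In a raw string, r"\\" is two characters, which the regex engine reads as
# "match one literal backslash".
assert len(r"\\") == 2
matches_one_backslash = re.fullmatch(r"\\", "\\") is not None

# Assumed pattern: a literal backslash followed by one non-ASCII codepoint.
pattern = re.compile(r"\\([^\x00-\x7f])")

sample = "\\빠\\르\\게 and a\\nb"  # backslashes before Hangul and before ASCII 'n'
cleaned = pattern.sub(r"\1", sample)
print(cleaned)
```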

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… test

Codex review round 2 asked for a smoke test that drives the actual
chain ``routes/chat.py`` runs (parse_json_output → strip helper →
json.dumps with ensure_ascii=False) rather than relying solely on the
isolated helper unit tests. This guards against a future refactor that
moves the helper to a different module or drops the wiring.

The test is hermetic — no engine boot, no HTTP — but exercises the
real production functions on a synthetic dirty input that mirrors
what lm-format-enforcer would produce for a CJK-heavy response.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The new TestStripBackslashBeforeUnicode class imports
vllm_mlx.routes.chat, which transitively pulls in mlx. The no-MLX
Linux CI matrix (test-matrix 3.10/3.11/3.12) doesn't have mlx
installed, so these tests previously raised ModuleNotFoundError.

Gate the class with @pytest.mark.skipif so it skips cleanly on
Linux but still runs on test-apple-silicon (where mlx is available).
This mirrors the existing -k "not Integration and not InjectJson"
filter in .github/workflows/ci.yml for similar mlx-touching tests.
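
A minimal sketch of this kind of gating. How the real test detects mlx is not shown here, so the `importlib.util.find_spec` check is an assumption, and the sketch uses the standard library's `unittest.skipUnless` as a stand-in so it runs without pytest installed. With pytest the equivalent decorator would be `@pytest.mark.skipif(not HAS_MLX, reason="requires mlx")`.

```python
import importlib.util
import unittest

# True only when the mlx package can be found on this machine.
HAS_MLX = importlib.util.find_spec("mlx") is not None


@unittest.skipUnless(HAS_MLX, "requires mlx (Apple Silicon only)")
class TestStripBackslashBeforeUnicode(unittest.TestCase):
    def test_import_chain(self):
        # The import happens inside the test, so collecting this class on a
        # machine without mlx never raises ModuleNotFoundError.
        import mlx  # noqa: F401

        self.assertTrue(HAS_MLX)


# Run the class programmatically; a skip counts as success, not failure.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    TestStripBackslashBeforeUnicode
)
outcome = unittest.TestResult()
suite.run(outcome)
print("skipped:", len(outcome.skipped), "failures:", len(outcome.failures))
```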

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Codex review round 3 flagged the unconditional backslash-strip as a
potential data-loss risk for legitimate JSON outputs that contain a
backslash before a non-ASCII codepoint (e.g. Windows paths like
``"C:\\사용자\\file.txt"`` in a ``response_format=json_object`` reply).

The tradeoff is intentional and matches upstream waybarrios#525:
- The lm-format-enforcer bug is the overwhelming source of these
  sequences in JSON-output responses.
- The file-path case is rare in practice.
- Heuristic gating ("looks like enforcer output") would be fragile.
- Clients needing raw backslash + non-ASCII can use response_format=text.

Document the scope and tradeoff in code so future maintainers
understand the choice. If a user hits the false-positive in practice,
the right next step is a config flag (--no-strip-spurious-backslashes),
not a heuristic.

No behavior change.
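
For illustration, the false positive looks like this with an assumed version of the pattern (not copied from the helper):

```python
import json
import re

# Assumed pattern: a literal backslash followed by one non-ASCII codepoint.
pattern = re.compile(r"\\([^\x00-\x7f])")

# A legitimate reply containing a Windows path with a non-ASCII directory.
reply = r'{"path": "C:\\사용자\\file.txt"}'
value = json.loads(reply)["path"]

stripped = pattern.sub(r"\1", value)
print(value)     # the path as the model intended it
print(stripped)  # the separator before the Hangul directory is gone
```

Only the separator in front of the non-ASCII directory name is lost; the one before `file.txt` precedes an ASCII character and survives.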

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@raullenchai raullenchai merged commit f8e18b0 into main May 15, 2026
7 checks passed
@raullenchai raullenchai deleted the feat/json-output-non-ascii-cleanup branch May 15, 2026 23:53
@raullenchai raullenchai mentioned this pull request May 16, 2026
10 tasks
raullenchai added a commit that referenced this pull request May 16, 2026
* chore: bump version to 0.6.51

User-facing changes since v0.6.50:

* fix(prefix-cache): slice 3D KV state along the right axis (#392) —
  inference correctness fix for prefix cache restore on KV tensors
  shaped (B, H, S) (was slicing the wrong axis, surfacing as
  silent first-token corruption on cache-hit prompts).
* perf(scheduler): drop per-decode list() copy of output_token_ids
  (#391) — small per-token allocation removed from the decode hot
  path; benefit scales with output length on multi-stream batches.
* feat(json-output): strip spurious backslashes before non-ASCII
  chars (#394) — JSON-emitting models occasionally emit `\\é` etc.
  during constrained generation; now stripped before delivery so
  decoded strings no longer carry literal backslashes.

Dev/docs (no user-visible runtime change):
* docs(contributing): codify test precision policy correctness=8bit /
  perf=4bit (#396)
* chore(pr_validate): swap Qwen3.6-27B-4bit → 8bit in stress matrix
  with smoke-tier 4-bit fallback for ≤32 GB hosts (#395)

Pre-merge artifact SHA-256 (audit anchor; will not match post-publish
SHAs because publish.yml rebuilds on Linux runners — see Release SOP §8):
  rapid_mlx-0.6.51-py3-none-any.whl
    5affa5c527bf543b72ddab94e96f2c6308225e6f5facd229d8f21f0b9ede8ce7
  rapid_mlx-0.6.51.tar.gz
    d248b8a7754cbe4d94b4708fa0fb88caa623f7fc71f13f156ece0a3b36224755

Release SOP gates:
* §3 install size: 448 MB (vs 445 MB v0.6.48 baseline, +0.67% — well
  under 1.05× soft-warn threshold).
* §7 supply chain: pip-audit clean on critical deps via OSV; recent
  uploads (HF hub 1.15.0 today, transformers 5.8.1 3 days) verified
  legit upstream (substantive changelogs, known maintainers / HF bot);
  no install.sh / workflows / pyproject.toml diffs since v0.6.50;
  OIDC scope minimal (id-token: write only on publish, contents: write
  only on auto-release); 3 third-party actions still moving-tag pinned
  (peter-evans, peaceiris, codecov) — pre-existing, not a release
  blocker but tracked for follow-up.
* §5 perf: make full in flight (qwen3.5-35b done; qwen3.6-35b mid-run).
  Will report results in PR before merge.
* §4 user-onboarding personas + §6 agent smoke: pending; will run
  serially after §5 (per CLAUDE.md "never in parallel" guidance for
  model-server workloads).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(api): detect text-only forks of multimodal architectures (#393)

Issue #393 reports that ``rapid-mlx serve /path/to/Qwen3.6-35B-A3B-MLX-8bit``
crashes on startup because the model is routed to the MLLM batched
engine even though the user's checkpoint is text-only. Reporter (Tylast)
correctly identified the chain: the checkpoint's config.json declares
``vision_config`` (because the base ``Qwen3_5MoeForConditionalGeneration``
architecture is multimodal-capable), so ``is_mllm_model`` returns True,
so the MLLM loader takes over, then hits the hybrid-backbone /
ArraysCache incompatibility documented in the closed-as-spam #385.

The text-only A3B fork ships zero vision tensors in its safetensors,
even though its config.json carries the full vision_config block. Our
detection trusted the config and ignored the actual weight presence.

Fix: when config indicates VLM AND the path is a local directory AND
that directory ships ``model.safetensors.index.json``, scan the index
for tensor-name prefixes that indicate real vision/audio weights
(``vision_tower``, ``visual.``, ``audio_tower``, ``mm_projector``, …).
If none are present, override to text-only routing. The check fires
only in the True → False direction; the False direction is preserved
as-is to keep existing text-routed models stable.

Conservative on edge cases:
* Single-file safetensors (no sharded index) → return None from the
  probe and trust config. Wrong-True here means the text path errors
  clearly at first image request, whereas wrong-False would silently
  corrupt every text request on a real VLM. The bad-direction cost
  is much smaller.
* Unreadable / oversized / malformed index → same. Fall back to config.
* HF repo IDs (not local dirs) → unchanged; we'd need a network call
  to inspect remote tensors.
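
The probe and its fallbacks could look roughly like the sketch below. The function names, the marker tuple, the size cap, and the substring match are assumptions made for the example; only the `weight_map` layout of `model.safetensors.index.json` and the marker names come from the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Markers named in the commit message; the real list may be longer.
MULTIMODAL_MARKERS = ("vision_tower", "visual.", "audio_tower", "mm_projector")
MAX_INDEX_BYTES = 50 * 1024 * 1024  # hypothetical cap for "oversized"


def probe_multimodal_weights(model_dir: str) -> Optional[bool]:
    """Return True/False if the index answers the question, None if it cannot."""
    index = Path(model_dir) / "model.safetensors.index.json"
    if not index.is_file():
        return None  # single-file or missing index: trust config
    try:
        if index.stat().st_size > MAX_INDEX_BYTES:
            return None
        weight_map = json.loads(index.read_text(encoding="utf-8"))["weight_map"]
        names = list(weight_map.keys())
    except (OSError, ValueError, KeyError, TypeError, AttributeError):
        return None  # unreadable or malformed: trust config
    return any(marker in name for name in names for marker in MULTIMODAL_MARKERS)


def is_mllm(config_says_vlm: bool, model_path: str) -> bool:
    if not config_says_vlm:
        return False  # text-only config: never probe weights
    if not Path(model_path).is_dir():
        return True  # HF repo ID: unchanged, trust config
    probed = probe_multimodal_weights(model_path)
    return True if probed is None else probed


with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "model.safetensors.index.json"
    no_index = is_mllm(True, tmp)

    index_file.write_text(json.dumps(
        {"weight_map": {"model.layers.0.mlp.weight": "model-00001.safetensors"}}))
    text_only_fork = is_mllm(True, tmp)

    index_file.write_text(json.dumps(
        {"weight_map": {"vision_tower.blocks.0.weight": "model-00001.safetensors"}}))
    genuine_vlm = is_mllm(True, tmp)

    index_file.write_text("{not json")
    malformed = is_mllm(True, tmp)

print(no_index, text_only_fork, genuine_vlm, malformed)
```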

Tests:
* New ``TestIsMllmModelWeightsPresenceOverride`` class — 6 cases:
  - vision_config + no vision tensors → False (the #393 fix path)
  - vision_config + vision tensors → True (genuine VLM still works)
  - audio_config + audio tensors → True (audio branch covered)
  - missing index → fall back to config
  - malformed index → fall back to config
  - text-only config → never even probe weights

Total tests: 102 pass (was 96). ruff clean.

This change is bundled into the v0.6.51 bump because the user-facing
fix is small + isolated and waiting for v0.6.52 would mean Tylast
keeps hitting the crash for another release cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs(install): add brew 5.x homebrew/core pre-flight hint

Brew 5.x's install sandbox cannot auto-tap homebrew/core mid-install
when a third-party formula depends on core packages ([email protected], rust).
Users on fresh brew installs (API-only, no homebrew/core tap cloned)
see "Operation not permitted" on /opt/homebrew/Library/Taps/homebrew/.

Pre-tapping with `brew tap homebrew/core --force` (one-time, ~1.3 GB)
lets the install complete. Brew 4.x and earlier never needed this.

---------

Co-authored-by: Your Name <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>