
test(parser): #174 Mythos sweep — section-range invariant guard (NO FINDINGS)#178

Merged
avrabe merged 1 commit into main from mythos/174-parse-core-module-range
May 22, 2026

avrabe commented May 21, 2026 (Contributor)

Summary

Resolves Step 5 of issue #174 — the v0.5 post-ship Mythos sweep's outstanding unverified hypothesis.

Verdict: NO FINDINGS.

The hypothesis

parse_core_module stores reader.range() for the element and data sections into element_section_range / data_section_range (parser.rs:1279 / :1287). parse_element_segments / parse_data_segments then slice module.bytes[start..end] from those ranges with no explicit bounds check (segments.rs:198 / :258). #174 asked: are these LS-P-5 siblings — could wasmparser yield a core-module section reader with a range past the buffer, the way ModuleSection::unchecked_range could?

Why it's NO FINDINGS

It cannot. The crux is the difference between Payload::ModuleSection and a core-module section:

  • ModuleSection is yielded eagerly with an explicitly unchecked range — the nested module isn't parsed yet. That's what made LS-P-5 exploitable.
  • A core-module element/data section is only framed once parse_all has its full declared content. A truncated section — size LEB claiming more bytes than remain — makes parse_all yield an Err, which parse_core_module's payload? propagates. The *_section_range field is never set.

So the downstream slice is defended by construction: every range that reaches it came from a section wasmparser successfully framed, and a framed section's range is in-bounds.

The oracle

Per the Mythos protocol, a NO FINDINGS verdict still wants an oracle. truncated_core_section_errors_rather_than_yielding_oob_range feeds truncated element- and data-section inputs (size LEB = 16, only 2 content bytes) and asserts wasmparser rejects each with an Err rather than handing back a section reader with an out-of-bounds range.
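The framing argument can be modelled without wasmparser. The sketch below is illustrative only. frame_section, read_u32_leb, FrameError and truncated_module are hypothetical names, not wasmparser or meld APIs, and the sketch models only the property the verdict relies on: a content range is produced only after the whole declared content is known to be present. truncated_module builds the oracle's input shape (size LEB = 16, only 2 content bytes) for section ids 9 (element) and 11 (data).

```rust
use std::ops::Range;

/// Why a section could not be framed.
#[derive(Debug, PartialEq)]
pub enum FrameError {
    /// Ran out of bytes before the section id or size LEB was complete.
    Eof,
    /// The size LEB128 is longer than a u32 allows.
    BadLeb,
    /// The size LEB claims more content bytes than remain in the buffer.
    Truncated { declared: u32, remaining: usize },
}

/// Decode an unsigned LEB128 u32 starting at `pos`; returns (value, next_pos).
fn read_u32_leb(bytes: &[u8], mut pos: usize) -> Result<(u32, usize), FrameError> {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(pos).ok_or(FrameError::Eof)?;
        pos += 1;
        // The 5th byte of a u32 LEB may only carry the top 4 bits.
        if shift == 28 && byte > 0x0f {
            return Err(FrameError::BadLeb);
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, pos));
        }
        shift += 7;
    }
}

/// Frame the section whose id byte is at `pos`. A content range is returned
/// only when the whole declared content is present, so it is in-bounds by
/// construction; a truncated section is an Err and no range escapes.
pub fn frame_section(bytes: &[u8], pos: usize) -> Result<(u8, Range<usize>), FrameError> {
    let id = *bytes.get(pos).ok_or(FrameError::Eof)?;
    let (size, start) = read_u32_leb(bytes, pos + 1)?;
    let remaining = bytes.len() - start;
    if size as usize > remaining {
        return Err(FrameError::Truncated { declared: size, remaining });
    }
    Ok((id, start..start + size as usize))
}

/// Module header plus one section whose size LEB says 16 bytes while only
/// 2 content bytes follow.
pub fn truncated_module(section_id: u8) -> Vec<u8> {
    let mut m = b"\0asm".to_vec();
    m.extend_from_slice(&[0x01, 0x00, 0x00, 0x00]); // binary format version 1
    m.push(section_id);
    m.push(16); // size LEB = 16
    m.extend_from_slice(&[0x00, 0x00]); // only 2 content bytes
    m
}
```

If the declared size is changed to 2, the same bytes frame successfully. That is the positive half of the invariant.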

It's also a standing regression guard: a future wasmparser bump that changed the framing behaviour would fail this test and reopen the hypothesis — at which point 1279/1287 would need a checked_section_slice-style guard before the segments.rs slice.
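If a future wasmparser bump ever broke the framing invariant, the guard mentioned above could look roughly like this. This is a sketch under assumptions. checked_section_slice and SectionRangeError are hypothetical, and the real segments.rs sites might instead map the failure into the crate's own error type.

```rust
use std::ops::Range;

/// Diagnostic for a stored section range that does not fit the module buffer.
#[derive(Debug, PartialEq)]
pub struct SectionRangeError {
    pub range: Range<usize>,
    pub len: usize,
}

/// Hypothetical `checked_section_slice`: slice `bytes` by a stored section
/// range, returning an error instead of panicking if the range is inverted
/// or runs past the end. `<[u8]>::get` with a range performs both checks.
pub fn checked_section_slice(bytes: &[u8], range: Range<usize>) -> Result<&[u8], SectionRangeError> {
    bytes
        .get(range.clone())
        .ok_or_else(|| SectionRangeError { range, len: bytes.len() })
}
```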

Scope

  • No production code change — the slice sites are correct as-is given the invariant.
  • No LS-N entry (NO FINDINGS — LS-N entries are for confirmed findings).
  • Touches parser.rs (Tier-5) — the Mythos auto-runner will scan it.

Refs: #174 Step 5, LS-P-5.

🤖 Generated with Claude Code

github-actions Bot commented May 21, 2026

Mythos delta-pass required

This PR modifies one or more Tier-5 source files (per
scripts/mythos/rank.md):

meld-core/src/parser.rs

Before merge, run the Mythos discover protocol on the
modified Tier-5 files:

  1. Follow scripts/mythos/discover.md
    — one fresh agent session per touched Tier-5 file.
  2. For each finding, the agent must produce both a Kani
    harness and a failing PoC test (per the protocol's
    "if you cannot produce both, do not report" rule).
  3. Attach a comment on this PR with either the findings
    (formatted per discover.md's output schema) or
    NO FINDINGS.
  4. Add the mythos-pass-done label to this PR.

Why this gate exists: LS-A-10
(CABI alignment padding in async-lift retptr writeback) was
found by the v0.8.0 pre-release Mythos pass — but it had
lived in the callback emitter since #128, across six
releases. A PR-time gate would have caught it at review
time instead of at the release boundary.

The gate check on this PR will pass once the label is
applied.

github-actions Bot commented May 21, 2026

LS-N verification gate

19/19 approved LS entries verified

| Status | Count |
|---|---|
| Passed (≥1 test, all green) | 19 |
| Failed (≥1 test failure) | 0 |
| Missing (no ls_*_NN_* test found) | 0 |

Approved loss-scenarios.yaml entries are expected to have a
regression test named ls_<letter>_<num>_* (e.g. LS-A-11 →
ls_a_11_*). The gate runs each prefix via cargo test --lib --no-fail-fast and aggregates pass/fail/missing.
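The naming convention can be written as a small function. The gate itself lives in tools/post_verification_comment.py. ls_test_prefix below is a hypothetical Rust rendering of the ID-to-prefix mapping, not the tool's code.

```rust
/// Hypothetical helper: "LS-A-11" -> "ls_a_11_". Returns None for ids that
/// are not shaped LS-<uppercase letters>-<digits>.
pub fn ls_test_prefix(id: &str) -> Option<String> {
    let mut parts = id.split('-');
    let (ls, letter, num) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || ls != "LS" {
        return None;
    }
    if letter.is_empty() || !letter.chars().all(|c| c.is_ascii_uppercase()) {
        return None;
    }
    if num.is_empty() || !num.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    // Keep the trailing underscore so LS-P-1 does not also select ls_p_16_*.
    Some(format!("ls_{}_{}_", letter.to_ascii_lowercase(), num))
}
```

The trailing underscore matters if the filter is a substring match, as with cargo test. Without it, a filter for LS-P-1 would also select every ls_p_1x_* test.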

Failed LS entries

(none)

Missing regression tests

(none)

Updated automatically by tools/post_verification_comment.py.
Source of truth: safety/stpa/loss-scenarios.yaml.

github-actions Bot commented May 21, 2026

Mythos delta-pass (auto)

1 finding(s) across 1 Tier-5 file(s)

| File | Verdict | Hypothesis |
|---|---|---|
| meld-core/src/parser.rs | ❌ FINDING | The outer accumulators in return_area_byte_size (line 1487) and params_area_byte_size (line 1516) use plain += instead of saturating_add, so when the first result/param is a FixedSizeList(u8, u32::MAX) — whose canonical ABI size is correctly saturated to u32::MAX by the helper — adding the next result's 8 bytes wraps the accumulator to 4 in release mode, causing the adapter to allocate a 4-byte retptr buffer that the callee immediately overflows. |

Auto-run via anthropics/claude-code-action@v1
(SHA-pinned) on the touched Tier-5 files, using the
maintainer's Max-plan OAuth token. See
.github/workflows/mythos-auto.yml and
scripts/mythos/discover.md.

avrabe commented May 21, 2026 (Contributor, Author)

Mythos auto-runner finding — reviewed, dispositioned (not confirmed)

The mythos-auto scan of parser.rs returned a FINDING. Validated per scripts/mythos/validate.md:

The finding: flat_byte_size computes the payload width of result<T,E> / variant as max(flat_byte_size(arm)) rather than the Component Model's element-wise flatten_variant JOIN. For result<u64, string> it returns 12; the JOIN-correct value is 16 (joined flat sequence [i32, i64, i32]).

Arithmetic: correct. max() of the arms' byte totals can underestimate when the arms flatten to different numbers of core values. Here u64 → [i64] (8 B, 1 value) and string → [i32, i32] (8 B, 2 values), so max(8, 8) = 8. The element-wise join, however, is [join(i64, i32), i32] = [i64, i32] = 12 B, and adding the 4-byte discriminant gives 16. This is a real off-by-4.
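The off-by-4 can be reproduced with the Component Model's `join` rule and flatten_variant's element-wise merge. This is a minimal sketch. Flat, join, flatten_variant, flat_bytes and max_of_arms_bytes are illustrative names. The discriminant is assumed to flatten to a single i32, which holds for a two-case result, and only the four core value types are modelled.

```rust
/// Core value types produced by flattening.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Flat {
    I32,
    I64,
    F32,
    F64,
}

impl Flat {
    fn bytes(self) -> u32 {
        match self {
            Flat::I32 | Flat::F32 => 4,
            Flat::I64 | Flat::F64 => 8,
        }
    }
}

/// Canonical ABI `join`: equal types stay; i32/f32 become i32; else i64.
fn join(a: Flat, b: Flat) -> Flat {
    match (a, b) {
        _ if a == b => a,
        (Flat::I32, Flat::F32) | (Flat::F32, Flat::I32) => Flat::I32,
        _ => Flat::I64,
    }
}

/// Element-wise JOIN of every arm's flat list, prefixed by an i32 discriminant.
pub fn flatten_variant(arms: &[Vec<Flat>]) -> Vec<Flat> {
    let mut payload: Vec<Flat> = Vec::new();
    for arm in arms {
        for (i, &ft) in arm.iter().enumerate() {
            if i < payload.len() {
                payload[i] = join(payload[i], ft);
            } else {
                payload.push(ft);
            }
        }
    }
    let mut out = vec![Flat::I32]; // discriminant
    out.extend(payload);
    out
}

pub fn flat_bytes(list: &[Flat]) -> u32 {
    list.iter().map(|f| f.bytes()).sum()
}

/// The pre-fix shape: 4-byte discriminant + max of the arms' byte totals.
pub fn max_of_arms_bytes(arms: &[Vec<Flat>]) -> u32 {
    4 + arms.iter().map(|a| flat_bytes(a)).max().unwrap_or(0)
}
```

For result<u32, u32> both forms agree on 8. The two computations only diverge when the arms' flat lists differ in length and the join widens a slot.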

Impact claim: rejected. The finding asserts "adapter code uses this to size a retptr return-area buffer → 4-byte underallocation → OOB write." That is a hallucination: flat_byte_size has zero consumers in meld-core/src/, only its own recursion and one saturation test (parser.rs:4395). Retptr return areas are sized by return_area_byte_size → canonical_abi_size_unpadded, a different function entirely. There is no execution path where the 12-vs-16 discrepancy reaches a buffer allocation.

Verdict: NOT a confirmed finding. Per discover.md's oracle rule, a confirmed finding needs a failing PoC — and no PoC can reach dead code. No LS-N entry. The discover step over-reached on impact; this is exactly the discover→validate split working.

Residual: the arithmetic is a latent correctness defect in a pub fn. Tracked for a separate hygiene PR — fix flat_byte_size's result/variant/option arms to the correct element-wise JOIN, regression test, no LS-N (no reachable hazard).

This PR

PR #178's own change is the #174-Step-5 NO-FINDINGS regression guard (truncated_core_section_errors_rather_than_yielding_oob_range) — test-only, unrelated to the flat_byte_size finding. 12 substantive checks green. The auto-runner flagged a pre-existing line of parser.rs because it scans the whole file.

Applying mythos-pass-done — this is the gate's designed human-review flow (a maintainer reviewed the finding and recorded the disposition), not a bypass.

avrabe added the mythos-pass-done label May 21, 2026 (label description: "Mythos delta-pass completed on Tier-5 file changes; findings (or NO FINDINGS) attached to PR")
…INDINGS)

Issue #174's v0.5 post-ship Mythos sweep carried an unverified
hypothesis: parse_core_module stores reader.range() for the element
and data sections (parser.rs:1279 / :1287), and parse_element_segments
/ parse_data_segments slice module.bytes[start..end] from those ranges
with no explicit bounds check (segments.rs:198 / :258). The question
was whether 1279/1287 are LS-P-5 siblings — i.e. whether wasmparser
could yield a core-module section reader with a range past the buffer.

Mythos delta-pass verdict: NO FINDINGS.

Unlike Payload::ModuleSection — yielded eagerly with an explicitly
unchecked range before the nested module is parsed, which is what made
LS-P-5 exploitable — a core-module element/data section is only framed
once parse_all has its full declared content. A truncated section
(size LEB claiming more bytes than remain) makes parse_all yield an
Err; parse_core_module's `payload?` propagates it and the
*_section_range field is never set. The downstream slice is therefore
defended by construction: every range that reaches it came from a
section wasmparser successfully framed, and a framed section's range
is in-bounds.

Adds `truncated_core_section_errors_rather_than_yielding_oob_range`,
which feeds truncated element- and data-section inputs and asserts
wasmparser rejects each with an Err rather than handing back a section
reader with an out-of-bounds range. This is the oracle for the
NO FINDINGS verdict and a standing regression guard: a future
wasmparser bump that changed the framing behaviour would fail this
test and reopen the hypothesis.

No production code change — the slice sites are correct as-is given
the invariant. No LS-N entry (NO FINDINGS).

Refs: #174 Step 5, LS-P-5.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
avrabe force-pushed the mythos/174-parse-core-module-range branch from 743f075 to 297df0d May 21, 2026 19:22
avrabe merged commit 5ea8c0e into main May 22, 2026
13 of 14 checks passed
avrabe deleted the mythos/174-parse-core-module-range branch May 22, 2026 04:12
avrabe added a commit that referenced this pull request May 24, 2026
…byte_size + LS-P-6/-7/-8/-9/-10/-11/-12/-13/-14/-15/-16/-17/-18/-19 (#179)

* fix(parser): flat_byte_size — element-wise variant JOIN, not max

The mythos-auto delta-pass on PR #178 flagged that flat_byte_size
computes the payload width of result<T,E> and variant as
`max(flat_byte_size(arm))` rather than the Component Model's
element-wise flatten_variant JOIN.

`max` of arm byte totals can underestimate when the arms flatten
to a different *number* of core values. result<u64, string>: the
ok arm u64 flattens to [i64] (8 B), the err arm string to [i32,i32]
(8 B). The old form gave 4 + max(8,8) = 12, but the joined payload
is [i64, i32] (12 B) and the true flat size is 4 + 12 = 16.

Fix: flat_byte_size is rewritten over a new private flat_width_list
helper that materialises each type's flat core-value width list
and JOINs variant/result arms element-wise. Non-variant types are
byte-for-byte unchanged. flat_width_list caps its length at
FLAT_WIDTH_CAP (256); a type whose flattening exceeds the cap
yields None and flat_byte_size returns u32::MAX, preserving the
LS-P-4 saturation contract and bounding the helper's Vec against
the LS-P-4 OOM class. The LS-P-4 regression test still passes.
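The cap can be sketched as follows. This is a hypothetical model, not meld's flat_width_list. Widths are byte counts per core value, fixed_list_widths stands in for the fixed-size-list arm, and flat_byte_size_from stands in for the None-to-u32::MAX mapping the commit describes. The point is that the Vec never grows past FLAT_WIDTH_CAP entries, however large the declared length.

```rust
/// Hypothetical cap mirroring FLAT_WIDTH_CAP from the commit.
pub const FLAT_WIDTH_CAP: usize = 256;

/// Flat width list of a fixed-size list of `n` elements whose element
/// flattens to `elem`. Returns None rather than materialising more than
/// FLAT_WIDTH_CAP entries.
pub fn fixed_list_widths(elem: &[u32], n: u64) -> Option<Vec<u32>> {
    if elem.is_empty() {
        return Some(Vec::new()); // nothing to repeat, regardless of n
    }
    let total = (elem.len() as u64).checked_mul(n)?;
    if total > FLAT_WIDTH_CAP as u64 {
        return None;
    }
    let mut out = Vec::with_capacity(total as usize);
    for _ in 0..n {
        out.extend_from_slice(elem);
    }
    Some(out)
}

/// Over the cap (None) saturates to u32::MAX; otherwise sum the widths.
pub fn flat_byte_size_from(widths: Option<Vec<u32>>) -> u32 {
    match widths {
        Some(w) => w.iter().fold(0u32, |acc, &b| acc.saturating_add(b)),
        None => u32::MAX,
    }
}
```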

Disposition of the mythos-auto finding: the discover step claimed
an OOB-write hazard. Rejected on validation — flat_byte_size has
zero consumers in meld-core/src/; retptr return areas are sized by
return_area_byte_size, a different function. No reachable hazard,
no possible PoC, NOT a confirmed finding, no LS-N entry. This
commit fixes the underlying arithmetic anyway, as correctness
hygiene on a pub fn a future consumer could inherit.

Regression test flat_byte_size_result_uses_element_wise_join_not_max
pins result<u64,string>=16, an unequal-arity variant=16, the
equal-arms result<u32,u32>=8, and non-variant record/u64 unchanged.

Refs: mythos-auto finding on PR #178.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(parser): saturate area-size accumulators (LS-P-6)

A confirmed Mythos finding — surfaced by the mythos-auto delta-pass
when it re-scanned parser.rs on PR #179.

params_area_byte_size and return_area_byte_size accumulate a
component function's canonical-ABI memory size field by field with
a bare `size += canonical_abi_size_unpadded(ty)`.
canonical_abi_size_unpadded saturates to u32::MAX for a
pathologically large fixed-length-list (the LS-P-4 fix). But LS-P-4
did not reach these two cross-field accumulators: once a first
field saturates `size` to u32::MAX, the next field's `+=` overflows
— debug build panics, release build wraps u32::MAX down to a small
value. params for `(fixed-length-list<f64, 2^29>, u32)` wrap
params_area_byte_size from u32::MAX to ~3. The resolver stores that
as AdapterRequirements::params_area_byte_size; the FACT adapter
passes it to cabi_realloc, allocates a few-byte buffer, and copies
every parameter into it — an OOB write into callee linear memory.

The sibling Record/Tuple accumulators inside
canonical_abi_size_unpadded already use saturating_add — these two
area-size loops were missed by LS-P-4.

Fix: both `+=` sites become `size = size.saturating_add(...)`. A
saturated field keeps the area size near u32::MAX, an
un-allocatable value, so cabi_realloc fails safely instead of
under-allocating.
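The difference between the two accumulators is easy to see in isolation. This is a minimal sketch. The function names are hypothetical, wrapping_add stands in for release-build `+=`, and per-field alignment is left out so that only the cross-field addition is shown.

```rust
/// Release-build model of the old `size += field_size` loop.
pub fn area_size_wrapping(field_sizes: &[u32]) -> u32 {
    field_sizes.iter().fold(0u32, |acc, &s| acc.wrapping_add(s))
}

/// The fixed loop: once any field saturates, the total stays at u32::MAX.
pub fn area_size_saturating(field_sizes: &[u32]) -> u32 {
    field_sizes.iter().fold(0u32, |acc, &s| acc.saturating_add(s))
}
```

With a saturated first field followed by a 4-byte u32 field, the wrapping form yields 3, which matches the "~3" in the commit text. The saturating form stays at u32::MAX.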

Mythos oracle: ls_p_6_area_byte_size_saturates_across_fields panics
today on the bare `+=` (debug-build overflow at parser.rs:1613) and
asserts a saturated result after the fix. Promoted to approved loss
scenario LS-P-6 (UCA-P-3, H-2/H-4/H-4.1); nearest primitive-layer
proof is LS-P-4's kani_fixed_size_list_size_no_overflow harness.

Second finding from the auto-runner's parser.rs scan; unlike the
flat_byte_size finding in the same PR (dead code, no reachable
hazard), LS-P-6's impact path is live and confirmed.

Refs: LS-P-6, LS-P-4, mythos-auto finding on PR #179.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(parser): per-leaf CopyLayout for conditional pointers (LS-P-7)

collect_conditional_pointers and collect_conditional_result_pointers
emit one ConditionalPointerPair per pointer leaf inside an
option/result/variant payload, but computed the CopyLayout once on the
whole payload type. copy_layout only special-cases bare string/list, so
any composite payload (record/tuple/fixed-list) fell to its
`_ => Bulk { byte_multiplier: 1 }` fallback — a list<u64> leaf was
tagged Bulk{1} instead of Bulk{8} (7/8 silent under-copy), and a
pointer-containing list<string> leaf collapsed from Elements to flat
Bulk, dropping recursive inner-pointer fixup.

Add collect_pointer_positions_with_layout / _byte_offsets_with_layout,
which carry each String/List leaf's own CopyLayout alongside its
position; remove the now-dead copy_layout_for_string_or_list_at shim.

Confirmed Mythos finding from the mythos-auto delta-pass on PR #179;
promoted to approved loss scenario LS-P-7. Regression pinned by
ls_p_7_conditional_pointer_layout_is_per_leaf_not_per_composite,
exercising both the flat-param and retptr byte-offset paths.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(parser): per-spec padded field-size in record/tuple walks (LS-P-8)

The Component Model canonical ABI lays out a record/tuple as:
  s = 0; for each field f:
      s = align_to(s, alignment(f))
      s += size(f)
where size(f) for an aggregate field is its full padded canonical
size. In this codebase that full padded size is
canonical_abi_element_size; canonical_abi_size_unpadded is the outer
type minus its own trailing align-up.

~25 field-walk sites — the Record/Tuple arms of
canonical_abi_size_unpadded, collect_pointer_byte_offsets,
collect_pointer_byte_offsets_with_layout (LS-P-7), the conditional
result/resource/slot/inner-pointer/inner-resource collectors, and the
top-level params/results walks in params_area_byte_size,
return_area_byte_size, pointer_pair_*_offsets/slots, and
resource_*_positions — advanced offset/size by
canonical_abi_size_unpadded(field) instead of
canonical_abi_element_size(field). The per-field align_up does NOT
re-absorb a preceding field's omitted trailing pad when the next
field's alignment is smaller, so a record/tuple containing a padded
aggregate followed by a lower-aligned field came out smaller than the
spec.

Concretely tuple<record{u32,u8}, u8> now computes element_size = 12
(spec) instead of 8; a list<u32> following record{u32,u8} now sits at
byte offset 8 instead of 5. The wrong offsets had been flowing into
the FACT adapter's pointer-pair loads, list-copy byte lengths, and
inner pointer-fixup walks; the area-size functions also under-sized
the cabi_realloc buffer (LS-P-6 hazard class via the per-field
primitive rather than the cross-field +=).
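The spec layout rule and the pre-fix variant can be compared on a toy type model. This sketch rests on several assumptions and is not meld's code. Ty covers only u8, u32 and records; tuples lay out like records in the canonical ABI. padded_size plays the role of canonical_abi_element_size, and unpadded_size_buggy reproduces the old per-field contribution.

```rust
/// Toy subset of component-model types.
#[derive(Clone, Debug)]
pub enum Ty {
    U8,
    U32,
    Record(Vec<Ty>),
}

fn align_to(x: u32, a: u32) -> u32 {
    (x + a - 1) / a * a
}

pub fn alignment(t: &Ty) -> u32 {
    match t {
        Ty::U8 => 1,
        Ty::U32 => 4,
        Ty::Record(fields) => fields.iter().map(alignment).max().unwrap_or(1),
    }
}

/// Spec shape: each field starts at an aligned offset and advances the
/// cursor by its full padded size; the record is then padded to its alignment.
pub fn padded_size(t: &Ty) -> u32 {
    match t {
        Ty::U8 => 1,
        Ty::U32 => 4,
        Ty::Record(fields) => {
            let mut s = 0;
            for f in fields {
                s = align_to(s, alignment(f));
                s += padded_size(f);
            }
            align_to(s, alignment(t))
        }
    }
}

/// Pre-fix shape: a record field contributes its size minus its own trailing pad.
pub fn unpadded_size_buggy(t: &Ty) -> u32 {
    match t {
        Ty::Record(fields) => {
            let mut s = 0;
            for f in fields {
                s = align_to(s, alignment(f));
                s += unpadded_size_buggy(f);
            }
            s
        }
        _ => padded_size(t),
    }
}

/// Element size as the pre-fix per-field walk would have produced it.
pub fn buggy_element_size(t: &Ty) -> u32 {
    align_to(unpadded_size_buggy(t), alignment(t))
}
```

For tuple<record{u32,u8}, u8>, the inner record contributes 5 instead of 8. The trailing u8 (alignment 1) then sits at 5 rather than 8, and the outer size rounds to 8 instead of 12.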

canonical_abi_size_unpadded itself still returns the outer size minus
its own trailing pad — that contract is unchanged; only the per-field
contribution is corrected.

Confirmed Mythos finding from the mythos-auto delta-pass on PR #179
(the auto-runner mis-located it as the option/variant/result payload
contribution, which is actually spec-correct — independent clean-room
verification corrected the location). Promoted to approved loss
scenario LS-P-8. Regression pinned by
ls_p_8_record_tuple_field_accumulation_uses_padded_field_size.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(parser): saturate-fold total_flat_params (LS-P-9)

total_flat_params picks the canonical-ABI calling convention from the
total flat param count: <= MAX_FLAT_PARAMS (16) → flat; > → params-ptr.
It summed per-param flat_count values with Iterator::sum::<u32>().
flat_count for a FixedSizeList is saturating (LS-P-4), so a nested
FixedSizeList can yield flat_count = u32::MAX; sum() then panics in
debug on u32::MAX + 1 and wraps to a small value in release. The
wrapped total compares <= 16 and the adapter selects the flat
convention for a function that genuinely needs params-ptr —
call-site lowering and callee-side lifting disagree on the ABI slot.

Sibling area-size accumulators (params_area_byte_size /
return_area_byte_size) already use saturating_add per LS-P-6 — this
calling-convention picker was simply missed.

Replaces .sum() with .fold(0u32, u32::saturating_add).
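The one-line fix can be checked in isolation. In this sketch, total_flat_params mirrors the fold described above, and total_flat_params_wrapping models release-build `.sum()` for comparison. uses_flat_convention is a hypothetical name for the `<= MAX_FLAT_PARAMS` test.

```rust
/// Canonical-ABI limit quoted in the commit.
pub const MAX_FLAT_PARAMS: u32 = 16;

/// Saturating total, as in the fix.
pub fn total_flat_params(flat_counts: &[u32]) -> u32 {
    flat_counts.iter().copied().fold(0u32, u32::saturating_add)
}

/// Release-build model of the old `.sum::<u32>()`, for comparison only.
pub fn total_flat_params_wrapping(flat_counts: &[u32]) -> u32 {
    flat_counts.iter().copied().fold(0u32, u32::wrapping_add)
}

/// Hypothetical name for the calling-convention choice.
pub fn uses_flat_convention(flat_counts: &[u32]) -> bool {
    total_flat_params(flat_counts) <= MAX_FLAT_PARAMS
}
```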

Confirmed Mythos finding from the mythos-auto delta-pass on PR #179.
Promoted to approved loss scenario LS-P-9. Regression pinned by
ls_p_9_total_flat_params_saturates_across_params.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(parser,adapter): outer-guard chain for nested conditional pointers (LS-P-10)

A ConditionalPointerPair for a pointer leaf inside a nested
option/result/variant payload — e.g. result<option<string>, u32>,
variant { a(option<string>), b(u32) }, option<option<string>> —
previously carried only the INNERMOST discriminant guard. The FACT
adapter processed each pair independently with a single (load,
compare, branch). When the runtime value sat in a sibling arm
(e.g. Err(some_u32) of the result), the byte at the option's
discriminant slot held unrelated payload bytes; if those bytes
happened to read as the inner discriminant value (1 = Some), the
adapter sampled the adjacent slots as a (ptr, len) string pair and
ran cabi_realloc + memory.copy with attacker-controlled source
pointer and length — an arbitrary cross-component memory read,
plus a forged string pointer handed to the callee.

Surfaced by the mythos-auto delta-pass on PR #179. Clean-room
independently verified as a real, exploitable memory-safety hazard
(validator traced the four fact.rs consumer loops and confirmed
each treats every pair's guard independently — no implicit AND
with any enclosing conditional).

Fix:
  * Add DiscriminantGuard struct + outer_guards: Vec<DiscriminantGuard>
    field to ConditionalPointerPair (innermost guard stays in the
    existing discriminant_* fields for backward compatibility — empty
    outer_guards behaves identically to the old single-guard path).
  * Thread outer_guards through collect_conditional_pointers and
    collect_conditional_result_pointers recursion: at each
    option/result/variant arm, build the current guard and append it
    to the chain before recursing into the payload; stamp each
    emitted pair with the prefix chain seen so far.
  * Two new fact-adapter helpers (emit_conditional_guard_chain_flat /
    emit_conditional_guard_chain_byte) emit each guard's (load disc,
    I32Const value, I32Eq) and I32And them all together before the
    existing If/copy block.
  * Update the four consumer loops in fact.rs (flat-param,
    flat-result, retptr-param, retptr-result) to call the helpers.
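The guard-chain semantics can be modelled in plain Rust over a byte buffer. Several assumptions apply. Guard is a hypothetical stand-in for DiscriminantGuard, and discriminants are single bytes. The layout used in the usage example (result discriminant at offset 0, option discriminant at 4, string pair at 8) was chosen by hand to show the collision, not computed.

```rust
/// Hypothetical stand-in for DiscriminantGuard: where a discriminant byte
/// lives and which value makes the guarded arm live.
#[derive(Clone, Copy, Debug)]
pub struct Guard {
    pub offset: usize,
    pub value: u8,
}

/// A pointer leaf may be copied only if every outer guard and the innermost
/// guard all match: the Rust analogue of ANDing each (load, I32Const, I32Eq).
pub fn leaf_is_live(mem: &[u8], outer_guards: &[Guard], innermost: Guard) -> bool {
    outer_guards
        .iter()
        .chain(std::iter::once(&innermost))
        .all(|g| mem.get(g.offset) == Some(&g.value))
}
```

Wasm's I32And does not short-circuit, while Iterator::all does. Because the guard loads have no side effects, the result is the same.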

Promoted to approved loss scenario LS-P-10 (UCA-P-3, H-2/H-4/H-4.2).
Regression pinned by
ls_p_10_nested_conditional_pointer_carries_outer_guard_chain.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(resolver): reject duplicate flat-name exports across modules (LS-P-11)

resolve_via_flat_names populated its export index with a blind
HashMap::insert(key, …) where key is the flat export name. When two
core modules within one component both exported the same name, the
second silently overwrote the first (last-writer wins), routing any
importer of that name to the wrong module with no error or warning —
the fused module wired wrong but type-clean.

The instance-graph resolver (taken whenever the component has an
InstanceSection, which wit-component / wasm-tools always emit for
multi-module components) is immune. The vulnerable path is
practically unreachable for production components: defensive
hardening for the synthetic-fixture and legacy single-module
fallback shapes that take the flat-name path.

Fix: replace the blind insert with an explicit collision check that
returns a new Error::DuplicateModuleExport { component_idx,
export_name, first_module_idx, second_module_idx }, mirroring the
existing DuplicateModuleInstantiation pattern (resolver.rs:2115).
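The collision check maps naturally onto HashMap's entry API. This is only a sketch. index_exports is hypothetical, the Error enum here holds just the one variant named in the commit, and module exports are given as plain string lists.

```rust
use std::collections::hash_map::{Entry, HashMap};

#[derive(Debug, PartialEq)]
pub enum Error {
    DuplicateModuleExport {
        component_idx: usize,
        export_name: String,
        first_module_idx: usize,
        second_module_idx: usize,
    },
}

/// Build a flat-name export index, refusing (rather than overwriting) when
/// two modules export the same name.
pub fn index_exports(component_idx: usize, modules: &[Vec<&str>]) -> Result<HashMap<String, usize>, Error> {
    let mut index = HashMap::new();
    for (module_idx, exports) in modules.iter().enumerate() {
        for &name in exports {
            match index.entry(name.to_string()) {
                Entry::Vacant(slot) => {
                    slot.insert(module_idx);
                }
                Entry::Occupied(existing) => {
                    return Err(Error::DuplicateModuleExport {
                        component_idx,
                        export_name: name.to_string(),
                        first_module_idx: *existing.get(),
                        second_module_idx: module_idx,
                    });
                }
            }
        }
    }
    Ok(index)
}
```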

Confirmed Mythos finding from the mythos-auto delta-pass on PR #179,
clean-room verified. Promoted to approved loss scenario LS-P-11
(priority low — defensive hardening, not a security emergency).
Regression pinned by
ls_p_11_duplicate_flat_name_export_is_rejected.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(parser): refuse list<option/result/variant-with-pointer> (LS-P-12 mitigation)

element_inner_pointers's match has no arms for Option, Result, or
Variant (line 3203, `_ => {} // scalars, options, results — no
pointer pairs`). For list<option<string>> (and the Result/Variant
analogues), the helper returns an empty vector even though the
element type DOES contain a pointer. copy_layout(List(inner)) then
classifies as CopyLayout::Bulk { byte_multiplier: element_size } —
which the FACT adapter handles with a flat memory.copy and no
per-element walk. Every option's (ptr, len) pair was copied
byte-for-byte into the callee, with `ptr` still referencing the
source component's memory. The callee then dereferenced a wild
pointer per Some(...) element — a cross-memory dangling reference /
arbitrary read on every list use [H-4 / H-4.2].

Conservative mitigation: copy_layout's List(inner) arm now panics
with a clearly-labelled LS-P-12 message whenever
`type_contains_pointers(inner)` AND element_inner_pointers returns
empty — converting silent cross-memory dangling-reference into a
loud refusal at adapter-generation time.
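The refusal condition can be illustrated on a toy type model. All names are hypothetical. covered_pointers models the pre-fix element_inner_pointers, which reports bare strings and record fields but nothing inside an option. refuse_list_element is the emptiness-based check described above.

```rust
/// Toy element types.
#[derive(Debug)]
pub enum Ty {
    U32,
    Str,
    Opt(Box<Ty>),
    Record(Vec<Ty>),
}

/// Does any value of this type carry a (ptr, len) pair somewhere?
pub fn type_contains_pointers(t: &Ty) -> bool {
    match t {
        Ty::U32 => false,
        Ty::Str => true,
        Ty::Opt(payload) => type_contains_pointers(payload),
        Ty::Record(fields) => fields.iter().any(type_contains_pointers),
    }
}

/// Models the pre-fix element_inner_pointers: options fall into the
/// catch-all arm and report no pointer pairs.
pub fn covered_pointers(t: &Ty) -> usize {
    match t {
        Ty::Str => 1,
        Ty::Record(fields) => fields.iter().map(covered_pointers).sum(),
        _ => 0,
    }
}

/// Emptiness-based refusal: pointers exist but none are covered.
pub fn refuse_list_element(inner: &Ty) -> bool {
    type_contains_pointers(inner) && covered_pointers(inner) == 0
}
```

The emptiness check does not refuse a record that mixes a bare string with an option<string>, because one pointer is covered. That is the bypass the later LS-P-18 commit in this series describes.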

The full structural fix requires (a) Option/Result/Variant arms on
element_inner_pointers that recurse into the payload at the payload
byte offset, AND (b) per-element DiscriminantGuard chains on the
inner-pointer descriptor (extending CopyLayout::Elements'
inner_pointers field), AND (c) FACT-side per-element guard
evaluation before each inner copy. That is structurally analogous
to LS-P-10 but on the list-element axis rather than the top-level
conditional axis — tracked as follow-up.

Confirmed Mythos finding from the mythos-auto delta-pass on PR #179,
clean-room verified. Promoted to approved loss scenario LS-P-12
(priority high). Regression pinned by
ls_p_12_list_of_option_string_refuses_rather_than_silently_corrupts,
ls_p_12_list_of_result_string_refuses, and the positive sanity test
ls_p_12_pure_scalar_option_list_is_still_bulk.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(adapter): async param-copy uses resolver positions, not (i32,i32) heuristic (LS-P-13)

emit_param_copy_step (the P3-async lift adapter's parameter copy
step, called from generate_async_callback_adapter and
generate_async_stackful_adapter) walked caller_type.params looking
for adjacent (i32, i32) slots and gated each match on
pointer_pair_positions.iter().any(|_| true) — semantically
!is_empty(). Every adjacent integer-pair argument was therefore
rewritten via cabi_realloc + cross-memory memory.copy as if it were
a (ptr, len) string/list, with one integer used as the source
pointer and the other as the byte count.

For `fn f(a: i32, s: string, b: i32, c: i32)` lowered to flat
[I32, I32, I32, I32, I32] with resolver positions [1], the buggy
code emitted positions = [0, 2]. It then ran:
  - cabi_realloc(0, 0, 1, ptr_s) — string ptr used as length;
  - memory.copy(new_ptr, a, ptr_s) — reading from caller address a;
  - cabi_realloc(0, 0, 1, b) + memory.copy(new_ptr, len_s, b).
The real string at flat index 1 was never copied. The callee saw
mangled integers, the original string contents weren't transferred,
and the copy could trap on the overflow guard or perform a
cross-memory read at an attacker-influenced address.

The resolver's pointer_pair_param_positions returns flat indices
computed by walking the function's params with flat_count.
Canonical lowering preserves param order between caller and callee
component types, so those flat indices apply equally to both sides.
The previous comment block claiming a "callee order vs caller
order" mismatch was misleading.

Replaces the heuristic walk with
site.requirements.pointer_pair_positions.clone(); the resolver
already produces the correct positions.
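The gap between the heuristic and the resolver's positions shows up directly in the commit's own example. Param, flat_count, pointer_pair_positions and adjacent_i32_pairs are hypothetical simplifications. Only u32 and string params are modelled, and every flat slot is i32.

```rust
#[derive(Clone, Copy, Debug)]
pub enum Param {
    U32,
    Str,
}

fn flat_count(p: Param) -> usize {
    match p {
        Param::U32 => 1,
        Param::Str => 2, // (ptr, len)
    }
}

/// Resolver-style: walk params with flat_count and record the flat index at
/// which each (ptr, len) pair starts.
pub fn pointer_pair_positions(params: &[Param]) -> Vec<usize> {
    let mut positions = Vec::new();
    let mut flat_idx = 0;
    for &p in params {
        if matches!(p, Param::Str) {
            positions.push(flat_idx);
        }
        flat_idx += flat_count(p);
    }
    positions
}

/// Old heuristic when every flat slot is i32: pair adjacent slots greedily.
pub fn adjacent_i32_pairs(flat_len: usize) -> Vec<usize> {
    (0..flat_len.saturating_sub(1)).step_by(2).collect()
}
```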

Confirmed Mythos finding from the mythos-auto delta-pass on PR
#179, clean-room verified. Promoted to approved loss scenario
LS-P-13 (priority high). Regression pinned by
ls_p_13_pointer_pair_param_positions_is_flat_indices_not_just_nonempty,
which asserts the resolver returns [1] for (a: u32, s: string, b:
u32, c: u32) and [1, 4] for the mixed (a, s1, b, s2, c) signature.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(adapter): overflow-guard nested-list inner buf_len multiplication (LS-P-14)

emit_patch_nested_indirections computes the inner-list buf_len for
each per-element cabi_realloc + memory.copy by loading the
callee-supplied len and multiplying by sub_elem_size with a bare
i32.mul. i32.mul is modulo 2^32, so a callee-controlled len above
u32::MAX / sub_elem_size wrapped buf_len to a small value. The
subsequent old_ptr + buf_len > mem_bytes bounds check used i32.add
(also wrapping) and was bypassed. The adapter allocated/copied
only the wrapped byte count while the caller-side bulk copy of the
outer (ptr, len) retained the original large len — silent
truncation of the inner list contents, plus OOB read/write into
adjacent caller-allocated memory on every dereference past the
truncated edge.

The emit_overflow_guard helper (added as the LS-A-7 leg (a) fix for
the outer copy paths) was never retrofitted to the inner copy. Fix
stashes the loaded len into the existing l_buf_len scratch local,
calls emit_overflow_guard(body, l_buf_len, sub_elem_size) which
traps via `unreachable` when the multiplication would wrap, then
re-fetches the local for the multiplication. The guard is a no-op
when sub_elem_size == 1.
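Because i32.mul cannot report overflow, the guard has to be a pre-check. This sketch makes assumptions. The division-based condition in overflow_guard_trips is one common way to express the check and is not taken from emit_overflow_guard's source. checked_mul serves as the reference answer.

```rust
/// Wasm i32.mul semantics: the product wraps modulo 2^32.
pub fn buf_len_wrapping(len: u32, sub_elem_size: u32) -> u32 {
    len.wrapping_mul(sub_elem_size)
}

/// Division-based pre-check an emitter can evaluate before the multiply.
/// Never trips for sizes 0 or 1, matching the no-op case above.
pub fn overflow_guard_trips(len: u32, sub_elem_size: u32) -> bool {
    sub_elem_size > 1 && len > u32::MAX / sub_elem_size
}
```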

Confirmed Mythos finding from the mythos-auto delta-pass on PR
#179. Promoted to approved loss scenario LS-P-14 (priority high).
Regression pinned by
ls_p_14_nested_list_inner_copy_emits_overflow_guard, which emits
a synthetic patch loop for `record { items: list<u32> }`
(sub_elem_size = 4) and asserts the encoded function body contains
an Unreachable opcode — the only place that opcode is emitted
along this path is inside emit_overflow_guard.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(resolver): drop OOB resource_type_id instead of misclassifying (LS-P-15)

resolve_resource_positions decided callee-vs-caller resource
ownership via:

    component.component_type_defs
        .get(pos.resource_type_id as usize)
        .map(|def| !matches!(def, ComponentTypeDef::Import(_)))
        .unwrap_or(true)

When `resource_type_id` was out of range (>= `component_type_defs.len()`:
a stale id, an alias remap past the local table, or malformed input),
`.get(...)` returned None and `unwrap_or(true)` silently classified the
resource as callee-defined. The adapter then emitted a
[resource-rep] call where [resource-new] was correct (or vice
versa), swapping the two handle-conversion sides on every fused
cross-component call passing that handle. The handle type-checks
on both sides (both i32-shaped), so the validator doesn't catch
it; the error only surfaces when the handle is dereferenced at
runtime.

Reachability is bounded — the instance-graph path keys
resource_type_id through validated parser-produced indices, so
this is defensive hardening rather than a memory-safety emergency.

Fix replaces the unwrap_or(true) with an explicit match on
.get(...): Some(def) classifies by type; None emits a log::warn!
and `continue`, dropping the position. The downstream adapter
either finds no work for the unused slot or surfaces a loud
missing-fixup error at adapter generation — never silently swaps.
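Replacing unwrap_or(true) with an explicit match can be sketched as below. ComponentTypeDef is reduced to two hypothetical unit variants. The bool mirrors the original `!matches!(def, ComponentTypeDef::Import(_))` expression, with true meaning callee-defined. eprintln! stands in for log::warn! to keep the sketch dependency-free.

```rust
/// Reduced, hypothetical view of a component type definition.
#[derive(Debug)]
pub enum ComponentTypeDef {
    Import,
    Defined,
}

/// For each resource type id, return (id, is_callee_defined). Out-of-range
/// ids are dropped with a warning instead of defaulting to callee-defined.
pub fn classify_resource_positions(defs: &[ComponentTypeDef], ids: &[u32]) -> Vec<(u32, bool)> {
    let mut out = Vec::new();
    for &id in ids {
        let is_callee_defined = match defs.get(id as usize) {
            Some(def) => !matches!(def, ComponentTypeDef::Import),
            None => {
                eprintln!("warning: resource_type_id {id} out of range ({} defs); dropping position", defs.len());
                continue;
            }
        };
        out.push((id, is_callee_defined));
    }
    out
}
```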

Confirmed Mythos finding from the mythos-auto delta-pass on PR
#179. Promoted to approved loss scenario LS-P-15 (priority low).
Regression pinned by
ls_p_15_out_of_bounds_resource_type_id_is_dropped_not_misclassified,
which builds a synthetic ResourceImportMap whose lookup succeeds
at resource_type_id 999, calls resolve_resource_positions against
an empty component_type_defs, and asserts the returned Vec is
empty (pre-fix: 1 mis-classified entry; post-fix: dropped).

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(adapter): bounds-check second I32Load16U for UTF-16 lone surrogate (LS-P-16)

emit_utf16_to_utf8_transcode's surrogate-pair If arm
unconditionally emitted a second I32Load16U at
mem16[ptr + (src_idx + 1) * 2] whenever the first code unit fell
into [0xD800, 0xDC00) (high surrogate). The loop's only bounds
check guarded the first code unit per iteration. For input whose
last code unit was a lone high surrogate, the second load read
2 bytes past the caller-supplied UTF-16 buffer; those bytes were
treated as the low surrogate (without validating they actually
were a low surrogate) and packed into a 4-byte UTF-8 sequence
written to callee memory — silent cross-memory leak of
attacker-adjacent caller bytes into the callee's transcoded
output per UTF-16→UTF-8 string transfer.

Reachable for any cross-memory UTF-16-caller / UTF-8-callee
fusion (line 614, 2999-3001) whose UTF-16 string ends on a high
surrogate, including malformed input from JS or truncated
strings.

This is the conservative mitigation: inject a `src_idx + 1 >=
input_len` check inside the surrogate-pair If arm and trap via
`unreachable` when it holds (no second code unit remains). The
Canonical-ABI-correct behaviour
replaces the lone surrogate with U+FFFD (3-byte UTF-8 EF BF BD)
and continues; tracked as a structural follow-up.
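The position of the new check is easiest to see in a scalar transcoding loop. This is a plain-Rust model, not the emitted Wasm. Trap stands in for `unreachable`, and unlike the pre-fix adapter the model also validates that the second unit is a low surrogate. The standard library's String::from_utf16 performs the same validation; the loop is spelled out here only to show where the bounds check sits.

```rust
/// Stand-in for the adapter's `unreachable` trap.
#[derive(Debug, PartialEq)]
pub struct Trap;

pub fn utf16_to_utf8_checked(units: &[u16]) -> Result<Vec<u8>, Trap> {
    let mut out = Vec::with_capacity(units.len());
    let mut i = 0;
    while i < units.len() {
        let unit = u32::from(units[i]);
        let cp = if (0xD800..0xDC00).contains(&unit) {
            // The LS-P-16 check: is there a second code unit at all?
            if i + 1 >= units.len() {
                return Err(Trap);
            }
            let lo = u32::from(units[i + 1]);
            if !(0xDC00..0xE000).contains(&lo) {
                return Err(Trap); // high surrogate not followed by a low one
            }
            i += 2;
            0x10000 + ((unit - 0xD800) << 10) + (lo - 0xDC00)
        } else {
            i += 1;
            unit
        };
        // Lone low surrogates (0xDC00..0xE000) are not valid chars.
        let c = char::from_u32(cp).ok_or(Trap)?;
        let mut buf = [0u8; 4];
        out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
    }
    Ok(out)
}
```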

Confirmed Mythos finding from the mythos-auto delta-pass on PR
#179, independently clean-room verified. Promoted to approved
loss scenario LS-P-16 (priority high). Regression pinned by
ls_p_16_utf16_lone_high_surrogate_oob_guard_emitted, a
structural test that requires the LS-P-16 marker AND an
Unreachable + I32GeU opcode pair to live inside the
surrogate-pair If arm before the second I32Load16U.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(resolver): warn before heuristic on mixed-encoding caller (LS-P-17)

Two structurally-identical caller-encoding lookup loops (primary
~2877-2910, fallback ~3175-3225) filtered from_component.imports
on ComponentTypeRef::Func(_) to find a caller's Lower options for
each resolved interface. WIT interface imports lower to
ComponentTypeRef::Instance(_), so the loops never matched for
typical wit-component / wasm-tools output and fell through to a
heuristic min_by_key over caller_lower_map — picking the
lowest-indexed Lower's encoding for every interface.

Single-encoding callers (the common case) get the right answer by
coincidence. Mixed-encoding callers (e.g. a JS/.NET host
component lowering UTF-16 to one import alongside Rust UTF-8 to
another) get string_transcoding silently miscalibrated for one or
more interfaces, producing scrambled strings at the call
boundary.

This is the conservative mitigation: detect mixed-encoding
callers (values() not all identical) before the heuristic fires
and emit a log::warn! with the LS-P-17 marker and interface name.
Single-encoding callers see no behavioural change. The full
structural per-interface attribution (resolve caller component
func index from the core import via a caller_core_import_to_comp_func
map, OR extend the filter to walk ComponentTypeRef::Instance
aliases) is tracked as follow-up.
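
The warn-on-mixed pattern amounts to a uniformity check over the map's
values before the lowest-index heuristic runs. A minimal sketch,
assuming a `BTreeMap<u32, StringEncoding>` as a stand-in for
caller_lower_map (the real map type and encoding enum may differ) and
using `eprintln!` in place of `log::warn!` so it has no dependencies:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the canonical-ABI string encoding option.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StringEncoding {
    Utf8,
    Utf16,
    CompactUtf16,
}

/// Models the fallback: warn if the caller's Lowers disagree, then pick the
/// lowest-indexed Lower's encoding. Returns (choice, whether it warned).
fn heuristic_caller_encoding(
    interface: &str,
    caller_lower_map: &BTreeMap<u32, StringEncoding>,
) -> (Option<StringEncoding>, bool) {
    let mut values = caller_lower_map.values();
    let all_same = match values.next() {
        Some(&first) => values.all(|&e| e == first),
        None => true,
    };
    if !all_same {
        // The real code emits log::warn! with the LS-P-17 marker.
        eprintln!("LS-P-17: mixed caller string encodings; heuristic pick for {interface} may be wrong");
    }
    // Heuristic: the lowest-indexed Lower wins.
    let chosen = caller_lower_map
        .iter()
        .min_by_key(|(idx, _)| **idx)
        .map(|(_, enc)| *enc);
    (chosen, !all_same)
}
```

Single-encoding callers take the same path as before and only gain a
cheap comparison; the warning does not change which encoding is chosen.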

Confirmed Mythos finding from the mythos-auto delta-pass on PR
#179, independently clean-room verified. Promoted to approved
loss scenario LS-P-17 (priority low — latent for single-encoding
callers). Regression pinned by
ls_p_17_mixed_caller_encoding_warns_before_heuristic_fallback,
a structural test that requires the LS-P-17 marker at both
sites and the warn-on-mixed pattern (all_same uniformity check
+ log::warn!) to be present.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(parser,adapter): LS-P-18 refinement of P-12 + LS-P-19 UTF-8 OOB

LS-P-18 — `copy_layout(List(inner))` LS-P-12 mitigation was
bypassed when a record mixed a covered pointer (bare string/list)
with a hidden conditional pointer (option<string>): the covered
field made `element_inner_pointers` non-empty, the emptiness-based
LS-P-12 panic didn't fire, and `CopyLayout::Elements.inner_pointers`
silently omitted the option-payload pointer. The adapter never fixed
up the conditional payload across memories, so callee elements
retained source-memory string pointers for every Some(_).

Replaces the emptiness check with a deep recursive
`has_pointer_bearing_conditional(inner)` that walks Option /
Result / Variant arms through Records, Tuples, FixedSizeLists,
and Type aliases. Any pointer-bearing conditional anywhere in the
element layout now triggers a panic with the LS-P-18 marker
(message intentionally includes "LS-P-12" via the follow-up
phrasing so the existing should_panic(expected="LS-P-12") tests
continue to pass).
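
The recursion can be sketched over a simplified type tree. This is an
illustration under assumptions: `Ty` and `has_pointer` are hypothetical
stand-ins for the resolver's real types, `Alias` models a Type alias
reference, and nested lists are not walked because the description
above only names Records, Tuples, FixedSizeLists and aliases.

```rust
/// Hypothetical, simplified component value type.
enum Ty {
    U32,
    String,
    List(Box<Ty>),
    Record(Vec<Ty>),
    Tuple(Vec<Ty>),
    FixedSizeList(Box<Ty>, u32),
    Opt(Box<Ty>),
    Res { ok: Option<Box<Ty>>, err: Option<Box<Ty>> },
    Variant(Vec<Option<Ty>>),
    Alias(Box<Ty>),
}

/// True if `ty` carries a string/list pointer anywhere inside it.
fn has_pointer(ty: &Ty) -> bool {
    match ty {
        Ty::U32 => false,
        Ty::String | Ty::List(_) => true,
        Ty::Record(fs) | Ty::Tuple(fs) => fs.iter().any(has_pointer),
        Ty::FixedSizeList(e, _) | Ty::Opt(e) | Ty::Alias(e) => has_pointer(e),
        Ty::Res { ok, err } => ok.iter().chain(err.iter()).any(|t| has_pointer(t)),
        Ty::Variant(cases) => cases.iter().flatten().any(has_pointer),
    }
}

/// True if an option/result/variant whose payload carries a pointer
/// appears anywhere in the element layout.
fn has_pointer_bearing_conditional(ty: &Ty) -> bool {
    match ty {
        // Leaves, and nested lists (which get their own copy layout).
        Ty::U32 | Ty::String | Ty::List(_) => false,
        // Walk through containers looking for a conditional.
        Ty::Record(fs) | Ty::Tuple(fs) => fs.iter().any(has_pointer_bearing_conditional),
        Ty::FixedSizeList(e, _) | Ty::Alias(e) => has_pointer_bearing_conditional(e),
        // Conditionals: any pointer in any arm counts.
        Ty::Opt(p) => has_pointer(p),
        Ty::Res { ok, err } => ok.iter().chain(err.iter()).any(|t| has_pointer(t)),
        Ty::Variant(cases) => cases.iter().flatten().any(has_pointer),
    }
}
```

The key property is that a covered bare pointer in a sibling field no
longer masks the conditional one.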

LS-P-19 — `emit_utf8_to_utf16_transcode` (mirror of LS-P-16, UTF-8
direction): the outer-loop bounds-check guarded only the lead
byte; each multi-byte branch (2/3/4-byte) unconditionally read
continuation bytes at src_idx + 1/+2/+3 via I32Load8U. A UTF-8
string ending on a truncated multi-byte lead caused the adapter
to read 1–3 bytes of attacker-adjacent caller memory and fold
them into a synthesized code point emitted as UTF-16 in the
callee. Conservative mitigation prepends an
`src_idx + N >= input_len` unreachable trap to each multi-byte
branch (N = 1, 2, 3).
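
The per-branch guard can be modelled on the host. A minimal sketch,
assuming the hypothetical name `transcode_utf8_to_utf16` and returning
`Err(lead_index)` where the adapter would trap; like the mitigation, it
adds only bounds checks and does not validate continuation bytes or
reject overlong forms.

```rust
/// Bounds-checked UTF-8 -> UTF-16 transcode. Returns Err(index of the
/// lead byte) instead of reading continuation bytes past `src`.
fn transcode_utf8_to_utf16(src: &[u8]) -> Result<Vec<u16>, usize> {
    let input_len = src.len();
    let mut out = Vec::with_capacity(input_len);
    let mut src_idx = 0;
    while src_idx < input_len {
        // The loop condition bounds-checks only the lead byte.
        let lead = src[src_idx] as u32;
        // N = number of continuation bytes implied by the lead byte.
        let (n, init) = match lead {
            0x00..=0x7F => (0, lead),
            0xC0..=0xDF => (1, lead & 0x1F),
            0xE0..=0xEF => (2, lead & 0x0F),
            0xF0..=0xF4 => (3, lead & 0x07),
            _ => return Err(src_idx), // stray continuation or invalid lead
        };
        // LS-P-19 guard: trap when src_idx + N >= input_len, before any read.
        if n > 0 && src_idx + n >= input_len {
            return Err(src_idx);
        }
        let mut cp = init;
        for k in 1..=n {
            cp = (cp << 6) | (src[src_idx + k] as u32 & 0x3F);
        }
        if cp >= 0x10000 {
            // Supplementary code point: emit a surrogate pair.
            let v = cp - 0x10000;
            out.push(0xD800 | (v >> 10) as u16);
            out.push(0xDC00 | (v & 0x3FF) as u16);
        } else {
            out.push(cp as u16);
        }
        src_idx += n + 1;
    }
    Ok(out)
}
```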

Both are confirmed Mythos findings from the mythos-auto delta-pass on
PR #179. Clean-room-verified disposition: LS-P-12 refinement
gap (real) and UTF-8 OOB (real, mirror of LS-P-16). LS-P-18 and
LS-P-19 are both promoted to priority high. Regressions are
pinned by:
  - ls_p_18_mixed_record_with_option_string_bypasses_p12_then_refuses
  - ls_p_18_pure_bare_pointer_record_still_works (positive sanity)
  - ls_p_19_utf8_to_utf16_continuation_byte_oob_guard_emitted

Co-Authored-By: Claude Opus 4.7 <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 <[email protected]>
avrabe added a commit that referenced this pull request May 24, 2026
… file (#181)

The whole-file scan that landed with v0.9.0 (#162, #164, #170, #173,
#175) caused a treadmill across v0.10.0's #178 and #179: every
parser.rs / fact.rs / resolver.rs PR re-triggered every latent
canonical-ABI bug in the touched file, regardless of whether the PR
went near that code. PR #179 surfaced 4+ findings in successive
re-scans of parser.rs alone; each fix exposed the next, and one
finding (the auto-runner's claimed inversion of LS-P-8 against
canonical-abi.py::record_size) was an outright false positive.

This commit moves the scan to a diff-scoped model:

  1. The scan job's actions/checkout step now uses fetch-depth=0
     so both base.sha and head.sha are reachable.

  2. A new "Extract PR diff for ${matrix.file}" step writes
     `git diff --no-color BASE...HEAD -- $F` to a workspace file
     under mythos-diffs/. Triple-dot uses the merge-base so commits
     the base branch advanced past after PR open do not show up.

  3. The discover prompt now references the diff file by path
     (diff_path / diff_size step outputs) and tells the AI to
     report only findings *introduced* by the diff. Pre-existing
     bugs in unchanged regions are explicitly out of scope —
     they can be filed against main in their own dedicated PR.
     Full-file context remains readable for caller/callee
     understanding.
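
The workflow step itself runs `git diff` from the job's shell; the
Rust sketch below only models the same command and the diff_path /
diff_size outputs. The `mythos-diffs/<path with / replaced by __>.diff`
naming scheme and the function names are assumptions for illustration.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Arguments mirroring `git diff --no-color BASE...HEAD -- $F`.
/// Triple-dot diffs against the merge-base of BASE and HEAD.
fn diff_args(base_sha: &str, head_sha: &str, file: &str) -> Vec<String> {
    vec![
        "diff".to_string(),
        "--no-color".to_string(),
        format!("{base_sha}...{head_sha}"),
        "--".to_string(),
        file.to_string(),
    ]
}

/// Hypothetical per-file output path under mythos-diffs/.
fn diff_path(file: &str) -> PathBuf {
    PathBuf::from("mythos-diffs").join(format!("{}.diff", file.replace('/', "__")))
}

/// Writes the diff and returns (diff_path, diff_size). Requires both SHAs
/// to be reachable (fetch-depth=0). An empty diff is not an error.
fn extract_pr_diff(base_sha: &str, head_sha: &str, file: &str) -> std::io::Result<(PathBuf, usize)> {
    let output = Command::new("git").args(diff_args(base_sha, head_sha, file)).output()?;
    if !output.status.success() {
        let msg = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(std::io::Error::new(std::io::ErrorKind::Other, msg));
    }
    let path = diff_path(file);
    std::fs::create_dir_all("mythos-diffs")?;
    std::fs::write(&path, &output.stdout)?;
    Ok((path, output.stdout.len()))
}
```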

An empty diff (rename / mode / pure delete) is allowed — the AI
sees no introduced changes and reports NO_FINDINGS by construction;
no skip logic required at the workflow level.

Unblocks future Tier-5 PRs from being judged on bugs they did not
introduce. Latent bugs in the unchanged file body remain the
project's problem to fix proactively (the LS-N gate continues to
pin every approved scenario), but they no longer block unrelated
PRs from merging.

Co-authored-by: Claude Opus 4.7 <[email protected]>

Labels

mythos-pass-done Mythos delta-pass completed on Tier-5 file changes; findings (or NO FINDINGS) attached to PR
