

[Kernels] Causal conv1d cleanup #6628

Open
ulmentflam wants to merge 11 commits into modular:main from ulmentflam:causal-conv1d-cleanup


@ulmentflam

Contributor

Linked issue

Part of #5772
Helps PR #6625

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • Performance improvement (includes benchmark results below)
  • Documentation update
  • New feature or public API (requires prior proposal or issue approval)
  • Refactor / internal cleanup (no user-visible change)
  • Build, CI, or tooling change

Motivation

The causal conv1d kernels had about 3k lines of near-duplicate code that needed cleanup and deduplication.

What changed

The causal conv1d kernels in max/kernels/src/state_space/causal_conv1d.mojo and the ops that call them.

Testing

Verified that mamba1 has no regressions, and added proper tests for the channel-last layout.

Checklist

  • The linked issue above has been reviewed by a maintainer and is
    agreed-upon, or this is a trivial fix that does not need prior
    approval
  • PR is small and focused — I've split larger changes into a sequence of
    smaller PRs where possible (see
    pull request sizes)
  • I ran ./bazelw run format to format my changes
  • I added or updated tests to cover my changes
  • If AI tools assisted with this contribution, I have included an
    Assisted-by: trailer in my commit message or this PR description (see
    AI Tool Use Policy)

ulmentflam and others added 2 commits May 29, 2026 14:50
BEGIN_PUBLIC
[Kernels][GPU] Collapse causal_conv1d, add channel-last support + tests

Collapse the 20 near-duplicate causal_conv1d kernels into 7 parameterized
functions. Bias and packed-sequence (seq_idx) presence are now compile-time
Bool parameters on the CPU and runtime Int8 arguments on the GPU (mirroring the
varlen_causal_conv1d idiom), replacing the hand-copied {bias|no_bias} x
{seq_idx|none} variants. A single stride-driven CPU core serves both
channel-first and channel-last layouts; the two GPU kernels share a
width-generic scalar accumulation over the conv taps. Read-only inputs (x,
weight, bias, seq_idx) are immutable borrows.

Wire channel-last as a first-class registered op (causal_conv1d_channel_last)
and add CPU + GPU channel-last tests, including a seq_idx packed-sequence
masking case. Migrate the existing channel-first forward and update tests
(CPU + GPU) to the unified signatures. This shrinks causal_conv1d.mojo from
3769 to 918 lines with no change to the causal_conv1d / causal_conv1d_update
graph op interfaces.
END_PUBLIC
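To make the "single stride-driven core" idea concrete, here is a minimal Python reference sketch of the semantics. It is not the Mojo kernel in this PR. It assumes the usual Mamba-style causal conv1d definition, out[b, c, t] = bias[c] + Σ_k weight[c][k] · x[b, c, t - (width - 1) + k] with zeros before t = 0. It also assumes that seq_idx masking drops any tap whose token belongs to a different packed sequence than the output token. The names `causal_conv1d_ref`, `channel_first_strides`, and `channel_last_strides` are hypothetical. The Optional arguments stand in for the compile-time Bool (CPU) and runtime Int8 (GPU) presence flags.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical helper type: element strides for the (batch, channel, time) axes.
Strides = Tuple[int, int, int]


def channel_first_strides(dim: int, seqlen: int) -> Strides:
    # Physical layout (batch, dim, seqlen): time is the fastest-moving axis.
    return (dim * seqlen, seqlen, 1)


def channel_last_strides(dim: int, seqlen: int) -> Strides:
    # Physical layout (batch, seqlen, dim): channel is the fastest-moving axis.
    return (seqlen * dim, 1, dim)


def causal_conv1d_ref(
    x: Sequence[float],
    strides: Strides,
    weight: Sequence[Sequence[float]],  # weight[c][k], shape (dim, width)
    batch: int,
    dim: int,
    seqlen: int,
    bias: Optional[Sequence[float]] = None,  # assumed: None means "no bias"
    seq_idx: Optional[Sequence[Sequence[int]]] = None,  # seq_idx[b][t]
) -> List[float]:
    """Causal depthwise conv1d over a flat buffer; the output reuses x's strides."""
    sb, sc, st = strides
    width = len(weight[0])
    out = [0.0] * (batch * dim * seqlen)
    for b in range(batch):
        for c in range(dim):
            for t in range(seqlen):
                acc = float(bias[c]) if bias is not None else 0.0
                # Width-generic loop over taps: tap k reads time t - (width - 1) + k,
                # so only the current and earlier time steps contribute.
                for k in range(width):
                    src = t - (width - 1) + k
                    if src < 0:
                        continue  # implicit left zero padding
                    if seq_idx is not None and seq_idx[b][src] != seq_idx[b][t]:
                        continue  # assumed rule: never mix packed sequences
                    acc += weight[c][k] * x[b * sb + c * sc + src * st]
                out[b * sb + c * sc + t * st] = acc
    return out


if __name__ == "__main__":
    # One batch, two channels, four time steps, width-2 filter.
    w = [[1.0, 1.0], [0.0, 1.0]]  # channel 0: x[t-1] + x[t]; channel 1: x[t]
    bias = [0.0, 10.0]
    x_cf = [1, 2, 3, 4, 5, 6, 7, 8]  # channel-first: c0 = 1..4, c1 = 5..8
    x_cl = [1, 5, 2, 6, 3, 7, 4, 8]  # the same tensor stored channel-last
    print(causal_conv1d_ref(x_cf, channel_first_strides(2, 4), w, 1, 2, 4, bias))
    # prints [1.0, 3.0, 5.0, 7.0, 15.0, 16.0, 17.0, 18.0]
    print(causal_conv1d_ref(x_cl, channel_last_strides(2, 4), w, 1, 2, 4, bias))
    # prints [1.0, 15.0, 3.0, 16.0, 5.0, 17.0, 7.0, 18.0]
```

The loop body only reaches `x` through the (batch, channel, time) strides. As a result, one body computes both layouts, and the channel-last call returns the same values interleaved by channel. Vectorization, the GPU kernels, and `causal_conv1d_update` are not modeled here.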

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: Evan <[email protected]>
@ulmentflam ulmentflam requested a review from a team as a code owner May 29, 2026 20:21
@ulmentflam ulmentflam changed the title Causal conv1d cleanup [MAX] Causal conv1d cleanup May 29, 2026
@ulmentflam ulmentflam changed the title [MAX] Causal conv1d cleanup [Kernels] Causal conv1d cleanup May 29, 2026
@ulmentflam

Contributor Author

@gabrieldemarmiesse This should be good to go, and it is the version you should continue your optimizations on.

ulmentflam and others added 4 commits June 11, 2026 22:06
…-conv1d-cleanup

# Conflicts:
#	max/kernels/src/state_space/causal_conv1d.mojo
BEGIN_PUBLIC
[Kernels] Apply mojo format to causal_conv1d.mojo

Reformat the output TileTensor parameter declarations to wrap across
multiple lines, matching the Mojo formatter output. Fixes the CI lint
(mblack) check that failed after the upstream merge.
END_PUBLIC

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@ulmentflam

Copy link
Copy Markdown
Contributor Author

@BradLarson Let me know when I can get a review on this. This refactor is worth getting in sooner rather than later: it cuts about 3k lines of code from the previous permuted version and moves us to newer, better syntax.
