gabrieldemarmiesse · 2026-05-29T10:56:44Z

Type of change

Bug fix (non-breaking change that fixes an issue)
Performance improvement (includes benchmark results below)
Documentation update
New feature or public API (requires prior proposal or issue approval)
Refactor / internal cleanup (no user-visible change)
Build, CI, or tooling change

Motivation

I want to improve the performance of the causal-conv1d implementation in MAX by reusing code I already wrote for my own package. To justify that, I need to show that my code has better numbers, so we must start with a benchmark.

What changed

Added a benchmark for causal-conv1d

Testing

Run the benchmark.

Checklist

The linked issue above has been reviewed by a maintainer and is agreed-upon, or this is a trivial fix that does not need prior approval
PR is small and focused — I've split larger changes into a sequence of smaller PRs where possible (see pull request sizes)
I ran ./bazelw run format to format my changes
I added or updated tests to cover my changes
If AI tools assisted with this contribution, I have included an Assisted-by: trailer in my commit message or this PR description (see AI Tool Use Policy)

Assisted by Claude

BEGIN_PUBLIC
[Kernels][GPU] Add causal_conv1d forward GPU benchmark

Adds a kernel-time benchmark for the channel-first causal_conv1d forward GPU kernel (state_space). It mirrors the validated test launch config (kNThreads=128, kNElts=4), times the kernel via the Bench/Bencher iter_custom harness, and reports achieved memory bandwidth (the op is memory-bound) as 2 * batch * dim * seqlen * sizeof(dtype).

dtype and conv width are compile-time defines (default bfloat16, width=4 to match the common Mamba config); batch, dim, seqlen and the SiLU activation flag are runtime args.

Since causal_conv1d lives in //max:state_space, which the globbed GPU benchmark deps don't include, the target is declared explicitly (and excluded from the glob) following the existing bench_conv2d/bench_conv3d pattern.
END_PUBLIC

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: Gabriel <[email protected]>
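
For reference, the bandwidth figure described above can be reproduced by hand. The following is a minimal Python sketch, not the benchmark's actual code: the function names, the dtype-size table, and the example shape and kernel time are all hypothetical and chosen only for illustration. It assumes the byte count covers one read of the input and one write of the output, which is what the factor of 2 in the formula suggests.

```python
# Bytes per element for a few dtypes (illustrative table, not taken from the benchmark).
DTYPE_SIZES = {"bfloat16": 2, "float16": 2, "float32": 4}


def bytes_moved(batch, dim, seqlen, dtype="bfloat16"):
    """Bytes counted per launch: 2 * batch * dim * seqlen * sizeof(dtype)."""
    return 2 * batch * dim * seqlen * DTYPE_SIZES[dtype]


def achieved_bandwidth_gbps(batch, dim, seqlen, kernel_time_s, dtype="bfloat16"):
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes) for a measured kernel time in seconds."""
    return bytes_moved(batch, dim, seqlen, dtype) / kernel_time_s / 1e9


if __name__ == "__main__":
    # Hypothetical shape and timing, only to show the arithmetic.
    gbps = achieved_bandwidth_gbps(batch=2, dim=4096, seqlen=2048, kernel_time_s=100e-6)
    print(f"{gbps:.1f} GB/s")
```

With these made-up inputs the kernel moves 67,108,864 bytes, so a 100 µs run corresponds to roughly 671 GB/s. Comparing that figure against the device's peak memory bandwidth indicates how close a memory-bound kernel is to its ceiling.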

github-actions Bot added the waiting-on-review label May 29, 2026

gabrieldemarmiesse marked this pull request as ready for review May 29, 2026 12:34

gabrieldemarmiesse requested a review from a team as a code owner May 29, 2026 12:34

gabrieldemarmiesse changed the title ~~[Kernels][GPU] Add causal_conv1d forward GPU benchmark~~ [Kernels] Use causal_conv1d kernel from Tri Dao for 10-90% speedup May 29, 2026

gabrieldemarmiesse marked this pull request as draft May 29, 2026 12:36

gabrieldemarmiesse changed the title ~~[Kernels] Use causal_conv1d kernel from Tri Dao for 10-90% speedup~~ [Kernels] Add causal_conv1d kernel benchmark May 29, 2026

gabrieldemarmiesse marked this pull request as ready for review May 29, 2026 12:37


[Kernels] Add causal_conv1d kernel benchmark #6624

gabrieldemarmiesse wants to merge 1 commit into modular:main from gabrieldemarmiesse:add-causal-conv1d-fwd-benchmark

1 participant