Fix divide-by-zero in GroupNorm two-pass kernel for large batch sizes #1984

Merged
crcrpar merged 1 commit into NVIDIA:master from yuantailing:fix-gn-div-0 on Mar 5, 2026
Conversation

@yuantailing
Contributor

When batch size N is large enough (e.g., N=512 with C=640), the heuristic `blocks_per_act_slice = 256 / params.n` truncates to 0 via integer division, causing a subsequent `div_up(params.hw, blocks_per_act_slice)` to divide by zero. Fix by clamping `blocks_per_act_slice` to at least 1 in both forward and backward two-pass setup functions.

Add regression test covering the exact repro case and all three heuristic branches.
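
To illustrate the failure mode and the clamp, here is a minimal host-side sketch in C++. It is not the code from this PR: the `Params` struct, the helper `hw_per_block`, and the function names are hypothetical, and only the `256 / params.n` branch mentioned above is modeled. The `div_up` helper is assumed to be a plain ceiling division.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the kernel parameter struct; only the two
// fields named in the description are modeled here.
struct Params {
    int n;   // batch size
    int hw;  // spatial size (height * width)
};

// Assumed ceiling division for positive integers.
// Dividing by b == 0 is undefined behavior and typically traps on the host.
inline int div_up(int a, int b) { return (a + b - 1) / b; }

// Unclamped heuristic as described: integer division truncates toward zero,
// so the result is 0 as soon as n > 256.
inline int blocks_per_act_slice_unclamped(const Params& p) {
    return 256 / p.n;
}

// Clamped variant: never returns less than 1, so it is safe as a divisor.
inline int blocks_per_act_slice_clamped(const Params& p) {
    return std::max(256 / p.n, 1);
}

// Hypothetical consumer of the heuristic: how many spatial positions each
// block would cover. Safe only because the divisor is clamped.
inline int hw_per_block(const Params& p) {
    return div_up(p.hw, blocks_per_act_slice_clamped(p));
}
```

With `n = 512` the unclamped value is 0, while the clamped value is 1 and `hw_per_block` simply returns `hw`, meaning one block handles the whole activation slice.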

Signed-off-by: Tailing Yuan <[email protected]>
@yuantailing
Contributor Author

Hi @crcrpar, could you help review this PR?

@crcrpar
Collaborator

crcrpar commented Mar 5, 2026

Looks good

@crcrpar crcrpar added the contrib label Mar 5, 2026
@crcrpar crcrpar merged commit dbe421e into NVIDIA:master Mar 5, 2026
1 check passed
@yuantailing yuantailing deleted the fix-gn-div-0 branch March 5, 2026 06:43