Commit dbe421e
Fix divide-by-zero in GroupNorm two-pass kernel for large batch sizes (#1984)
When batch size N is large enough (e.g., N=512 with C=640), the heuristic
`blocks_per_act_slice = 256 / params.n` truncates to 0 via integer division,
causing a subsequent `div_up(params.hw, blocks_per_act_slice)` to divide by
zero. Fix by clamping blocks_per_act_slice to at least 1 in both forward and
backward two-pass setup functions.
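
The arithmetic behind the failure can be reproduced with a minimal Python sketch. It is not the apex source: the function names `blocks_per_act_slice_unclamped` and `blocks_per_act_slice_clamped` and the value of `hw` are hypothetical, chosen only for illustration. `div_up` is assumed to be ordinary ceiling division, and `//` stands in for C++ integer division.

```python
def div_up(m: int, n: int) -> int:
    """Ceiling division; assumed to match the div_up helper named above."""
    return (m + n - 1) // n


def blocks_per_act_slice_unclamped(n: int) -> int:
    # Pre-fix heuristic: integer division truncates to 0 once n > 256.
    return 256 // n


def blocks_per_act_slice_clamped(n: int) -> int:
    # Post-fix heuristic: never let the divisor fall below 1.
    return max(256 // n, 1)


if __name__ == "__main__":
    n = 512        # batch size from the repro case
    hw = 64 * 64   # hypothetical spatial size, not taken from the commit

    try:
        div_up(hw, blocks_per_act_slice_unclamped(n))
    except ZeroDivisionError:
        print("unclamped heuristic divides by zero")

    print("clamped:", div_up(hw, blocks_per_act_slice_clamped(n)))
```

Python raises `ZeroDivisionError` here, whereas integer division by zero in C++ is undefined behavior, so the original failure may surface differently (typically as a crash). For `n <= 256` the clamp leaves the heuristic unchanged.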
Add regression test covering the exact repro case and all three heuristic
branches.
Signed-off-by: Tailing Yuan <[email protected]>
1 parent 212061e commit dbe421e
3 files changed
Lines changed: 41 additions & 0 deletions
File tree
- apex/contrib
- csrc/group_norm
- test/group_norm
Lines changed: 3 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 204-206 | 204-206 | unchanged context |
| | 207-209 | + (3 added lines) |
| 207-209 | 210-212 | unchanged context |
Lines changed: 3 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 126-128 | 126-128 | unchanged context |
| | 129-131 | + (3 added lines) |
| 129-131 | 132-134 | unchanged context |
Lines changed: 35 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 280-282 | 280-282 | unchanged context |
| | 283-317 | + (35 added lines) |
| 283-285 | 318-320 | unchanged context |