[PT2] torch.layer_norm errors in eager but runs fine in backend=aot_eager_decomp_partition #151478

Open
weifengpy opened this issue Apr 16, 2025 · 2 comments
Labels
- enhancement: Not as big of a feature, but technically not a bug. Should be easy to fix
- module: decompositions: Topics related to decomposition (excluding PrimTorch)
- module: error checking: Bugs related to incorrect/lacking error checking
- oncall: pt2
- triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@weifengpy
Contributor

weifengpy commented Apr 16, 2025

🚀 The feature, motivation and pitch

torch.layer_norm throws an error when the input and weight are in different dtypes. However, it runs fine with backend=aot_eager_decomp_partition, because that backend decomposes torch.layer_norm into fp32 ops.

We ran into this because an online job disables pt2 while offline training requires pt2. Ideally we want the same behavior across eager and compile.

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @malfet @chauhang @penguinwu @SherlockNoMad @bdhirsh

# python test_layer_norm.py
import torch

def forward(input):
    normalized_shape = (4, )
    # weight and bias are created in fp32, while the input passed in below is bf16
    weight = torch.ones(4, device="cuda")
    bias = torch.ones(4, device="cuda")
    eps = 0.1
    output = torch.layer_norm(
        input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled
    )
    return output


x = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 4.0, 6.0, 8.0]], device="cuda")

# no error
forward_compiled = torch.compile(forward, backend="aot_eager_decomp_partition")
forward_compiled(x.to(torch.bfloat16))

# error
forward_compiled = torch.compile(forward, backend="aot_eager")
forward_compiled(x.to(torch.bfloat16))

# error (plain eager fails in the same way)
# forward(x.to(torch.bfloat16))

Error:

RuntimeError: expected scalar type BFloat16 but found Float

While executing %native_layer_norm : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%arg0_1, [4], %ones, %ones_1, 0.1), kwargs = {})
GraphModule: class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: "bf16[2, 4][4, 1]"):
         # File: /data/users/weif/pytorch/test_layer_norm.py:5 in forward, code: weight = torch.ones(4, device="cuda")
        ones: "f32[4][1]" = torch.ops.aten.ones.default([4], device = device(type='cuda'), pin_memory = False)

         # File: /data/users/weif/pytorch/test_layer_norm.py:6 in forward, code: bias = torch.ones(4, device="cuda")
        ones_1: "f32[4][1]" = torch.ops.aten.ones.default([4], device = device(type='cuda'), pin_memory = False)

         # File: /data/users/weif/pytorch/test_layer_norm.py:8 in forward, code: output = torch.layer_norm(
        native_layer_norm = torch.ops.aten.native_layer_norm.default(arg0_1, [4], ones, ones_1, 0.1);  arg0_1 = ones = ones_1 = None
        getitem: "bf16[2, 4][4, 1]" = native_layer_norm[0];  native_layer_norm = None
        return (getitem,)
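For context on why the decomposed backend runs: in the graph above, weight and bias are created as fp32 ones while the input is bf16. Below is a minimal sketch of the decomposed computation, assuming an fp32 upcast followed by a cast back to the input dtype (my own approximation for illustration, not the actual code in torch._decomp):

import torch

def layer_norm_decomp_like(input, normalized_shape, weight, bias, eps):
    # Upcast everything to fp32 for the computation, mirroring what the
    # decomposition effectively does for low-precision inputs.
    x = input.float()
    dims = tuple(range(x.dim() - len(normalized_shape), x.dim()))
    mean = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, unbiased=False, keepdim=True)
    out = (x - mean) / torch.sqrt(var + eps)
    out = out * weight.float() + bias.float()
    # Cast the result back to the original input dtype (bf16 here), which is
    # why the compiled output above is bf16 even though weight/bias are fp32.
    return out.to(input.dtype)

Because every op in this sketch sees matching dtypes, no mixed-dtype kernel is ever invoked, so the compiled path returns a bf16 result instead of erroring.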

Alternatives

No response

Additional context

No response

@bdhirsh bdhirsh added module: error checking Bugs related to incorrect/lacking error checking oncall: pt2 module: decompositions Topics related to decomposition (excluding PrimTorch) labels Apr 16, 2025
@bdhirsh
Contributor

bdhirsh commented Apr 16, 2025

@weifengpy it sounds like there is a dtype assertion that runs in eager that we're missing in compile. We should probably fix that. Just to confirm - it sounds like you want compile to error here in the same way that eager errors?
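For reference, a rough Python-level illustration of the kind of dtype check eager enforces and the compiled path currently skips (the real check lives in the ATen kernel; the helper name and exact message here are hypothetical):

def _check_layer_norm_dtypes(input, weight, bias):
    # Illustrative only: raise the same kind of error eager raises when the
    # affine parameters do not match the input dtype.
    for name, t in (("weight", weight), ("bias", bias)):
        if t is not None and t.dtype != input.dtype:
            raise RuntimeError(
                f"expected scalar type {input.dtype} but found {t.dtype} ({name})"
            )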

@weifengpy
Contributor Author

> it sounds like there is a dtype assertion

Is it just a dtype assertion? I thought eager cannot run at all because of the mixed dtypes in the CUDA kernels.

> sounds like you want compile to error here in the same way that eager errors?

Is a warning an option? I am worried about breaking internal jobs by throwing hard errors, but I am even more worried about silent dtype casting.

@mlazos mlazos added enhancement Not as big of a feature, but technically not a bug. Should be easy to fix high priority labels Apr 22, 2025
@masnesral masnesral added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label May 5, 2025