Large output differences with facebook/bart-base model

I see significant output differences between eager PyTorch and Torch-TensorRT when compiling and running the `facebook/bart-base` (https://huggingface.co/facebook/bart-base) model. The differences persist even after loading the model in FP16 and trying several precision settings.

The following script reproduces the comparison:

```python
import torch
from transformers import BartTokenizer, BartModel
import torch_tensorrt

# Set device and backend
backend = "torch_tensorrt"
device = "cuda:0"

# Load tokenizer and model
tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
model = BartModel.from_pretrained('facebook/bart-base')
model.eval()
model = model.to(device)

# Prepare input
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

# Run inference before Torch-TensorRT
outputs_before = model(**inputs)

# Apply Torch-TensorRT optimization
model = torch.compile(
    model,
    backend=backend,
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.float16, torch.float32},
    },
    dynamic=False,
)

# Run inference after Torch-TensorRT
outputs_after = model(**inputs)

# Compare outputs
last_hidden_states_before = outputs_before.last_hidden_state
last_hidden_states_after = outputs_after.last_hidden_state

# Calculate the maximum absolute difference
max_diff = torch.max(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the mean absolute difference
mean_abs_diff = torch.mean(torch.abs(last_hidden_states_before - last_hidden_states_after)).item()

# Calculate the plain mean of the differences (not absolute)
mean_diff = torch.mean(last_hidden_states_before - last_hidden_states_after).item()

# Print the outputs, max difference, mean absolute difference, and plain mean difference
print("Outputs before Torch-TensorRT:")
print(last_hidden_states_before)
print("\nOutputs after Torch-TensorRT:")
print(last_hidden_states_after)

print(f"\nMaximum absolute difference: {max_diff}")
print(f"Mean absolute difference: {mean_abs_diff}")
print(f"Mean difference: {mean_diff}")
```

Here are the differences I'm seeing:
- **Maximum absolute difference**: 6.1822
- **Mean absolute difference**: 0.8487
- **Mean difference**: -0.0164

These values are much larger than I would expect from reduced-precision rounding alone.
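To put absolute differences like these in context, it can help to report them next to the magnitude of the reference output and a cosine similarity. The helper below is only a sketch: the name `summarize_diff` and the `rtol`/`atol` defaults are assumptions chosen for illustration, not values taken from Torch-TensorRT.

```python
import torch


def summarize_diff(reference: torch.Tensor, candidate: torch.Tensor,
                   rtol: float = 1e-2, atol: float = 1e-2) -> dict:
    """Summarize how far `candidate` is from `reference`.

    The tolerances are illustrative assumptions, not official thresholds.
    """
    # Compare in float32 on the CPU, detached from autograd.
    ref = reference.detach().float().cpu()
    cand = candidate.detach().float().cpu()
    diff = (ref - cand).abs()
    # Cosine similarity over the flattened tensors (1.0 means same direction).
    cos = torch.nn.functional.cosine_similarity(ref.flatten(), cand.flatten(), dim=0)
    return {
        "max_abs_diff": diff.max().item(),
        "mean_abs_diff": diff.mean().item(),
        "mean_abs_reference": ref.abs().mean().item(),
        "cosine_similarity": cos.item(),
        "allclose": torch.allclose(ref, cand, rtol=rtol, atol=atol),
    }


# Synthetic demo: round a random tensor through FP16 and back.
torch.manual_seed(0)
reference = torch.randn(1, 8, 768)
rounded = reference.half().float()
print(summarize_diff(reference, rounded))
```

With the script above, the call would be `summarize_diff(last_hidden_states_before, last_hidden_states_after)`.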


## Additional Tests


1. I tried loading the model in FP16 before compiling, using the following code, but the output differences remain significant:

    ```python
    model = BartModel.from_pretrained('facebook/bart-base', torch_dtype=torch.float16)
    ```

2. I also enabled `"use_fp32_acc"` and `"use_explicit_typing"`, but the differences persisted:

    ```python
    model = torch.compile(
        model,
        backend="torch_tensorrt",
        options={
            "truncate_long_and_double": True,
            "enabled_precisions": {torch.float16, torch.float32},
            "use_fp32_acc": True,
            "use_explicit_typing": True,
        },
        dynamic=False,
    )
    ```

