Conversation


@Aznix07 Aznix07 commented Nov 4, 2025

What does this PR do?

Fixes the issue where autocast_adapter_dtype=False was being ignored when using quantized models with BitsAndBytes.

Fixes #2889

Problem

When a model is quantized using BitsAndBytes (e.g., 4-bit quantization), LoRA adapters were always initialized with float32 dtype, even when:

  • autocast_adapter_dtype=False was explicitly specified
  • The model's compute dtype was set to float16

This silently overrode the user's setting and could lead to unnecessary memory use and extra dtype casts at runtime.
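
A minimal repro sketch of the behavior, assuming a small causal LM (facebook/opt-125m used here purely as a placeholder) with q_proj/v_proj modules, a CUDA device, and bitsandbytes installed:

```python
# Minimal repro sketch; model id and target modules are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", quantization_config=bnb_config, device_map="auto"
)

peft_model = get_peft_model(
    model,
    LoraConfig(r=8, target_modules=["q_proj", "v_proj"]),
    autocast_adapter_dtype=False,  # explicitly disabled, but ignored before the fix
)

for name, param in peft_model.named_parameters():
    if "lora_A" in name:
        # Before the fix this prints torch.float32 despite compute_dtype=float16.
        print(name, param.dtype)
        break
```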

Solution

Added a _get_weight_dtype() helper method to the LoraLayer class (a minimal sketch follows this list) that:

  1. Checks for a compute_dtype attribute (present on BitsAndBytes quantized layers)
  2. Falls back to weight.dtype for regular layers
  3. Uses the resulting dtype when creating the lora_A and lora_B Linear layers in update_layer()
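
A minimal sketch of that helper, assuming it lives on LoraLayer and relies on the existing get_base_layer() accessor plus the compute_dtype attribute exposed by BitsAndBytes layers; the exact names and structure in the PR diff may differ:

```python
# Hedged sketch of the helper described above; everything except compute_dtype
# and get_base_layer() is illustrative rather than the exact PR diff.
import torch


def _get_weight_dtype(self) -> torch.dtype:
    """Return the dtype that newly created LoRA weights should use."""
    base_layer = self.get_base_layer()
    # BitsAndBytes layers (e.g. bnb.nn.Linear4bit) expose compute_dtype;
    # matching it keeps the adapters in the quantized layer's compute dtype.
    compute_dtype = getattr(base_layer, "compute_dtype", None)
    if compute_dtype is not None:
        return compute_dtype
    # Regular layers: fall back to the dtype of the base weight.
    return base_layer.weight.dtype


# Inside update_layer(), the result is then used when creating the adapter
# matrices, roughly like (simplified):
#   dtype = self._get_weight_dtype()
#   self.lora_A[adapter_name] = torch.nn.Linear(self.in_features, r, bias=False, dtype=dtype)
#   self.lora_B[adapter_name] = torch.nn.Linear(r, self.out_features, bias=False, dtype=dtype)
```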

Testing

✅ Verified the fix works correctly with a custom test script (sketched below):

  • Quantized model (4-bit with compute_dtype=float16) -> LoRA params are float16
  • Non-quantized model (dtype=float16) -> LoRA params are float16
  • Default behavior (autocast_adapter_dtype=True) still works as expected

✅ Ran 131 LoRA config tests locally - all passed
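
A hedged sketch of the kind of dtype check the custom test script performs; the model id and target modules are assumptions, and the quantized case needs a CUDA device with bitsandbytes installed:

```python
import pytest
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_ID = "facebook/opt-125m"  # assumption: any small model with q_proj/v_proj


@pytest.mark.parametrize("quantized", [True, False])
def test_lora_params_follow_compute_dtype(quantized):
    if quantized:
        # 4-bit quantized base model with compute_dtype=float16
        quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID, quantization_config=quant_config, device_map="auto"
        )
    else:
        # Non-quantized base model loaded directly in float16
        model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

    config = LoraConfig(r=8, target_modules=["q_proj", "v_proj"])
    peft_model = get_peft_model(model, config, autocast_adapter_dtype=False)

    lora_dtypes = {p.dtype for n, p in peft_model.named_parameters() if "lora_" in n}
    # With the fix, the adapters follow the compute/weight dtype instead of float32.
    assert lora_dtypes == {torch.float16}
```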

@BenjaminBossan
Member

Thanks for proposing this fix @Aznix07. However, applying this broadly requires a lot more changes. I have worked on those in #2893. I think this PR can be closed. Still, your contribution is appreciated.
