
Conversation

@ada-ggf25 commented on Nov 29, 2025

Fix: Respect inference_mode when setting adapters with modules_to_save

Fixes #2928

Description

This PR fixes issue #2928, where modules_to_save parameters kept requires_grad=True even when inference_mode=True was passed to set_adapter(). That caused problems when using quantized models (e.g., with bitsandbytes) in inference mode, as the quantized layers require parameters to have requires_grad=False.

Problem

When calling model.set_adapter(adapter_name, inference_mode=True) on a model that has modules_to_save configured (e.g., a classification head), the modules_to_save parameters would still have requires_grad=True despite being in inference mode. This happened because:

  1. _set_adapter() was calling module.enable_adapters(True) unconditionally
  2. ModulesToSaveWrapper.enable_adapters(True) sets requires_grad_(True) for all active adapters
  3. This occurred before set_adapter() was called with inference_mode, so the modules_to_save gradients stayed enabled despite the inference-mode request (sketched below)
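
In simplified terms, the pre-fix call chain looked roughly like this (a hedged sketch based on the steps above, not the exact PEFT source):

from peft.utils.other import ModulesToSaveWrapper

# sketch of the pre-fix behaviour of _set_adapter (src/peft/utils/other.py)
def _set_adapter(model, adapter_name):
    for module in model.modules():
        if isinstance(module, ModulesToSaveWrapper):
            # unconditional: the wrapper's enable_adapters(True) calls requires_grad_(True)
            # on the active modules_to_save copies, regardless of any inference-mode request
            module.enable_adapters(True)
            module.set_adapter(adapter_name)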

Solution

The fix ensures that enable_adapters() is only called when not in inference mode:

  1. Modified _set_adapter() in src/peft/utils/other.py:

    • Added conditional check: only call enable_adapters(True) when inference_mode=False
    • This prevents gradients from being set to True when inference mode should keep them False (see the sketch after this list)
  2. Updated PeftModel.set_adapter() in src/peft/peft_model.py:

    • Added inference_mode parameter to match the API of PeftMixedModel.set_adapter()
    • Passes inference_mode to both _set_adapter() and base_model.set_adapter()
  3. Added comprehensive tests in tests/test_other.py:

    • test_modules_to_save_inference_mode_requires_grad_false: Verifies requires_grad=False in inference mode
    • test_modules_to_save_training_mode_requires_grad_true: Verifies requires_grad=True in training mode
    • test_modules_to_save_inference_mode_with_torch_inference_mode: Verifies compatibility with torch.inference_mode()
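
A rough sketch of the adjusted call path at this stage of the PR (simplified; signatures are assumed from the description above rather than copied from the diff):

from peft.utils.other import ModulesToSaveWrapper

# sketch: PeftModel.set_adapter (src/peft/peft_model.py) gains the new keyword argument
def set_adapter(self, adapter_name, inference_mode=False):
    self.base_model.set_adapter(adapter_name, inference_mode=inference_mode)
    _set_adapter(self, adapter_name, inference_mode=inference_mode)

# sketch: _set_adapter (src/peft/utils/other.py) only re-enables gradients outside inference mode
def _set_adapter(model, adapter_name, inference_mode=False):
    for module in model.modules():
        if isinstance(module, ModulesToSaveWrapper):
            if not inference_mode:
                module.enable_adapters(True)
            module.set_adapter(adapter_name)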

Changes Made

Code Changes

src/peft/utils/other.py:

  • Modified _set_adapter() to conditionally call enable_adapters() based on inference_mode parameter

src/peft/peft_model.py:

  • Updated set_adapter() method signature to accept inference_mode parameter
  • Passes inference_mode to underlying adapter setting functions

tests/test_other.py:

  • Added TestModulesToSaveInferenceMode test class with 3 comprehensive tests

Testing

Test Results

All new tests pass:

  • test_modules_to_save_inference_mode_requires_grad_false - PASSED
  • test_modules_to_save_training_mode_requires_grad_true - PASSED
  • test_modules_to_save_inference_mode_with_torch_inference_mode - PASSED

All existing modules_to_save tests pass (11/11)

Related tests pass (71/74 - 3 failures are unrelated BOFT dependency issues)

Code quality checks pass (make quality)

Test Coverage

The new tests verify the following (a condensed sketch appears after the list):

  • modules_to_save parameters correctly have requires_grad=False when inference_mode=True
  • modules_to_save parameters correctly have requires_grad=True when inference_mode=False (training mode)
  • Compatibility with torch.inference_mode() context manager
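
A condensed sketch of the core assertions, assuming a PEFT model with modules_to_save configured under the "default" adapter (illustrative only, not the exact test code):

# inference mode: no modules_to_save copy should require gradients
model.set_adapter("default", inference_mode=True)
assert not any(
    p.requires_grad for n, p in model.named_parameters() if "modules_to_save" in n
)

# training mode: the active adapter's modules_to_save copy is trainable again
model.set_adapter("default", inference_mode=False)
assert all(
    p.requires_grad
    for n, p in model.named_parameters()
    if "modules_to_save" in n and ".default." in n
)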

Example Usage

Before this fix, the following code would fail with quantized models:

import torch
from peft import PeftModel

model = PeftModel.from_pretrained(base_model, adapter_path)
model = convert_to_int8(model)  # quantization step (placeholder helper, e.g. a bitsandbytes int8 conversion)
model.eval()

with torch.inference_mode():
    model.set_adapter("my_adapter", inference_mode=True)  # bug: modules_to_save parameters still had requires_grad=True
    _ = model(batch)

After this fix:

model = PeftModel.from_pretrained(base_model, adapter_path)
model = convert_to_int8(model)  # quantization step (same placeholder helper as above)
model.eval()

with torch.inference_mode():
    model.set_adapter("my_adapter", inference_mode=True)  # modules_to_save parameters now correctly have requires_grad=False
    _ = model(batch)

ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Dec 2, 2025
… in ModulesToSaveWrapper

Remove the lines that set requires_grad on original_module in
ModulesToSaveWrapper.enable_adapters() method. This change addresses
the maintainer's feedback that there is no reason to touch the
requires_grad of the original_module here, and it conflicts with
bitsandbytes quantization which requires gradients to be False at all
times.

The original_module's requires_grad is no longer manipulated by
enable_adapters(), only modules_to_save gradients are managed.

Updated test_requires_grad_modules_to_save_disabling to reflect this
change by removing expectations about original_module having gradients
when adapters are disabled.

Related to issue huggingface#2928 and PR huggingface#2931.
@github-actions (bot):

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@ada-ggf25 (Author):

Still relevant, please don’t mark as stale/close.

@BenjaminBossan (Member) left a comment

Thanks for the PR, I finally got around to revisit this topic. Please check the comments I added. Moreover, could you please merge with/rebase on the latest main branch?

# if the adapter is found in this module, set it as the active adapter, else disable the adapters of this
# module
if adapter_name_to_set in module._adapters:
    if not inference_mode:
@BenjaminBossan (Member):

Using enable_adapters here is not the right way. We already pass inference_mode to the module.set_adapter call; the handling should be implemented there.

assert expected == modules


class TestModulesToSaveInferenceMode:
@BenjaminBossan (Member):

Instead of adding this new test class, let's add the new tests to existing tests for requires_grad, which you already modified above. Just for reference, I mean this test class:

class TestRequiresGrad:

If I'm not mistaken, the following tests should cover what we need:

    def test_requires_grad_follows_inference_mode_modules_to_save(self):
        # check that passing inference_mode to set_adapter has the intended effect with LoRA and modules_to_save
        config0 = LoraConfig(target_modules=["lin0"], modules_to_save=["lin1"])
        peft_model = get_peft_model(MLP(), config0)

        config1 = LoraConfig(target_modules=["lin0"], modules_to_save=["lin1"])
        peft_model.add_adapter("adapter1", config1)

        # active adapter is still "default"
        self.check_requires_grad(
            peft_model,
            "base_model.model.lin1.modules_to_save.default.weight",
            "base_model.model.lin1.modules_to_save.default.bias",
            "base_model.model.lin0.lora_A.default.weight",
            "base_model.model.lin0.lora_B.default.weight",
        )

        # inference mode false (default)

        # set config0 as active, should not change anything
        peft_model.set_adapter("default", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.lin1.modules_to_save.default.weight",
            "base_model.model.lin1.modules_to_save.default.bias",
            "base_model.model.lin0.lora_A.default.weight",
            "base_model.model.lin0.lora_B.default.weight",
        )

        # set config1 as active, should lead to adapter1 requiring grad
        peft_model.set_adapter("adapter1", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.lin1.modules_to_save.adapter1.weight",
            "base_model.model.lin1.modules_to_save.adapter1.bias",
            "base_model.model.lin0.lora_A.adapter1.weight",
            "base_model.model.lin0.lora_B.adapter1.weight",
        )

        # inference mode true

        # set config0 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("default", inference_mode=True)
        self.check_requires_grad(peft_model)

        # set config1 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("adapter1", inference_mode=True)
        self.check_requires_grad(peft_model)

    def test_requires_grad_follows_inference_mode_trainable_token_indices(self):
        # check that passing inference_mode to set_adapter has the intended effect with LoRA and trainable tokens
        config0 = LoraConfig(target_modules=["conv1d"], trainable_token_indices={"emb": [0, 1, 2]})
        peft_model = get_peft_model(ModelEmbConv1D(), config0)

        config1 = LoraConfig(target_modules=["lin0"], trainable_token_indices={"emb": [0, 1, 2]})
        peft_model.add_adapter("adapter1", config1)

        # active adapter is still "default"
        self.check_requires_grad(
            peft_model,
            "base_model.model.emb.token_adapter.trainable_tokens_delta.default",
            "base_model.model.conv1d.lora_A.default.weight",
            "base_model.model.conv1d.lora_B.default.weight",
        )

        # inference mode false (default)

        # set config0 as active, should not change anything
        peft_model.set_adapter("default", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.emb.token_adapter.trainable_tokens_delta.default",
            "base_model.model.conv1d.lora_A.default.weight",
            "base_model.model.conv1d.lora_B.default.weight",
        )

        # set config1 as active, should lead to adapter1 requiring grad
        peft_model.set_adapter("adapter1", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.emb.token_adapter.trainable_tokens_delta.adapter1",
            "base_model.model.lin0.lora_A.adapter1.weight",
            "base_model.model.lin0.lora_B.adapter1.weight",
        )

        # inference mode true

        # set config0 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("default", inference_mode=True)
        self.check_requires_grad(peft_model)

        # set config1 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("adapter1", inference_mode=True)
        self.check_requires_grad(peft_model)

Please double check if this makes sense to you.

peft_model = get_peft_model(MLP(), config)

# no layer should have requires_grad
# when disabling the adapter, modules_to_save should have requires_grad=False
@BenjaminBossan (Member):

I think the changes to this test can be reverted, right?

ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Jan 7, 2026
Refactor _set_adapter and ModulesToSaveWrapper.set_adapter to address
maintainer feedback. The enable_adapters calls should not be conditional
in _set_adapter; instead, the inference_mode handling should be
implemented entirely within the set_adapter method.

Changes:
- Remove conditional enable_adapters(True/False) calls from _set_adapter
  function based on inference_mode parameter
- Move enable_adapters logic into ModulesToSaveWrapper.set_adapter method
  to handle inference_mode internally
- Call enable_adapters(not inference_mode) within set_adapter to ensure
  adapters are enabled/disabled correctly based on inference_mode
- Update set_adapter to handle both empty adapter list case and normal
  adapter setting case with proper enable_adapters calls

This refactoring ensures that inference_mode is handled entirely within
the set_adapter method implementation, as requested by the maintainer,
rather than conditionally calling enable_adapters in the _set_adapter
helper function.

Addresses maintainer feedback in PR huggingface#2931.
ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Jan 7, 2026
Remove the TestModulesToSaveInferenceMode test class from test_other.py
as requested by the maintainer. The tests for inference_mode behaviour
with modules_to_save should be integrated into the existing
TestRequiresGrad class in test_custom_models.py instead of having a
separate test class.

Changes:
- Remove entire TestModulesToSaveInferenceMode class including:
  - test_modules_to_save_inference_mode_requires_grad_false
  - test_modules_to_save_training_mode_requires_grad_true
  - test_modules_to_save_inference_mode_with_torch_inference_mode
- Tests will be moved to TestRequiresGrad class in test_custom_models.py
  following the maintainer's specified test structure

This change addresses maintainer feedback in PR huggingface#2931 to consolidate
inference_mode tests into the existing requires_grad test suite.
ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Jan 7, 2026
Add new tests for inference_mode behaviour and revert changes to
test_requires_grad_modules_to_save_disabling as requested by the
maintainer.

Changes:
- Add test_requires_grad_follows_inference_mode_modules_to_save to
  TestRequiresGrad class to verify that passing inference_mode to
  set_adapter has the intended effect with LoRA and modules_to_save
- Add test_requires_grad_follows_inference_mode_trainable_token_indices
  to TestRequiresGrad class to verify that passing inference_mode to
  set_adapter has the intended effect with LoRA and trainable tokens
- Revert test_requires_grad_modules_to_save_disabling to original
  version that checks for original_module.weight and original_module.bias
  having requires_grad=True when adapters are disabled

The new tests follow the maintainer's specified structure and verify:
- inference_mode=False (default) maintains requires_grad=True for
  active adapters and modules_to_save
- inference_mode=True results in no modules requiring gradients
- Tests cover both modules_to_save and trainable_token_indices scenarios

This addresses maintainer feedback in PR huggingface#2931 to integrate inference_mode
tests into the existing TestRequiresGrad class and restore the original
test expectations for modules_to_save disabling behaviour.
Add optional inference_mode parameter to PeftModel.set_adapter() method
to allow setting adapters in frozen state (requires_grad=False) directly
without manual parameter manipulation.

Changes:
- Add inference_mode parameter with default value False to maintain
  backwards compatibility
- Update method docstring to document the new parameter and clarify
  that adapters are set to trainable unless inference_mode is True
- Remove manual example code snippet showing how to set requires_grad=False
- Pass inference_mode parameter to base_model.set_adapter() and
  _set_adapter() helper function calls

This enhancement simplifies the workflow for users who want to set
adapters in inference mode, addressing the need to manually manipulate
requires_grad flags after setting an adapter.
Add comprehensive test suite to validate that modules_to_save correctly
respect the inference_mode parameter when set_adapter is called.

This test class addresses issue huggingface#2928 where modules_to_save had
requires_grad=True even when inference_mode=True was passed to set_adapter.

Test coverage:
- test_modules_to_save_inference_mode_requires_grad_false: Verifies that
  modules_to_save parameters have requires_grad=False when inference_mode=True
  is passed to set_adapter, ensuring parameters are frozen during inference
- test_modules_to_save_training_mode_requires_grad_true: Verifies that
  modules_to_save parameters have requires_grad=True when inference_mode=False
  is passed to set_adapter, ensuring parameters are trainable during training
- test_modules_to_save_inference_mode_with_torch_inference_mode: Validates
  that modules_to_save work correctly when used with torch.inference_mode()
  context manager and that forward passes still function correctly

All tests use AutoModelForSequenceClassification with LoRA configuration
targeting query and value modules, with classifier as modules_to_save to
provide realistic test scenarios.
Reformat the docstring comment in TestModulesToSaveInferenceMode class
to fit within line length limits by combining two lines into a single line.

This is a minor formatting change to improve code readability and
compliance with project style guidelines.
@ada-ggf25 (Author):

Thanks for the PR, I finally got around to revisit this topic. Please check the comments I added. Moreover, could you please merge with/rebase on the latest main branch?

Thank you for the detailed feedback! I've addressed all your comments:

Changes Made:

  1. Removed conditional enable_adapters calls from _set_adapter: The enable_adapters logic has been moved entirely into
    the ModulesToSaveWrapper.set_adapter method, so inference_mode is now handled within the set_adapter implementation as you
    suggested.

  2. Removed TestModulesToSaveInferenceMode class: The separate test class has been removed from test_other.py.

  3. Added tests to TestRequiresGrad class: I've added the two tests you specified
    (test_requires_grad_follows_inference_mode_modules_to_save and
    test_requires_grad_follows_inference_mode_trainable_token_indices) to the existing TestRequiresGrad class in
    test_custom_models.py, following your exact specification.

  4. Reverted test_requires_grad_modules_to_save_disabling: The test has been restored to its original version that checks
    for original_module.weight and original_module.bias having requires_grad=True when adapters are disabled.

  5. Rebased on latest main: The branch has been rebased to remove the merge commit and is now based on the latest main branch.

Please let me know if there's anything else that needs to be adjusted!

@BenjaminBossan (Member) left a comment

Thanks for updating the PR, but I think it's overcomplicating things. I flagged the parts of the code that I think need changing.

# when calling model.add_adapter, the new adapter is not automatically active
self._active_adapter = []
# enable/disable adapters based on inference_mode
self.enable_adapters(not inference_mode)
@BenjaminBossan (Member):

If my understanding is correct, calling enable_adapters and requires_grad_(False) here and below should not be necessary. It should be enough to pass inference_mode to set_adapter above. Did you find a situation where making these calls was needed?

Remove redundant enable_adapters calls in ModulesToSaveWrapper.set_adapter()
method. The maintainer correctly identified that passing inference_mode to
set_adapter is sufficient, as the method already handles setting requires_grad
correctly via self.modules_to_save[adapter_name].requires_grad_(not inference_mode).

The enable_adapters calls were redundant and potentially causing issues.
Restore the test_requires_grad_modules_to_save_disabling test to check that
when adapters are disabled, no parameters should have requires_grad=True,
matching the intended behaviour from commit 3ea4e67.

This addresses the maintainer's concern about an incorrect merge conflict
resolution that reverted these test changes.
@ada-ggf25 (Author):

Thanks for updating the PR, but I think it's overcomplicating things. I flagged the parts of the code that I think need changing.

Hi @BenjaminBossan,

Thanks for the review. I've addressed both points:

  1. Removed unnecessary enable_adapters calls: You're right, passing inference_mode to set_adapter is sufficient. The method already sets requires_grad via self.modules_to_save[adapter_name].requires_grad_(not inference_mode), so the enable_adapters calls were redundant. I've removed them.

  2. Reverted test changes: The test reversion was an incorrect merge conflict resolution. I've restored test_requires_grad_modules_to_save_disabling to match commit 3ea4e67, where disabling adapters results in no parameters having requires_grad=True.

The code is simpler and the tests match the intended behaviour. Please let me know if anything else needs adjustment.
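
For reference, a minimal sketch of the resulting ModulesToSaveWrapper.set_adapter behaviour (simplified and based on the internals quoted above, not the exact PEFT source):

# sketch of ModulesToSaveWrapper.set_adapter once the redundant enable_adapters calls are removed
def set_adapter(self, adapter_names, inference_mode=False):
    if isinstance(adapter_names, str):
        adapter_names = [adapter_names]
    for adapter_name in adapter_names:
        # the active copy is trainable unless inference mode is requested
        self.modules_to_save[adapter_name].requires_grad_(not inference_mode)
    self._active_adapter = adapter_names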

@BenjaminBossan (Member) left a comment

Thanks for the update. Just a small issue left.

peft_model = get_peft_model(MLP(), config)

# no layer should have requires_grad
# when disabling the adapter, modules_to_save should have requires_grad=False
@BenjaminBossan (Member):

Could you please revert the change completely, including the comment and the formatting?

Revert `test_requires_grad_modules_to_save_disabling` to the version on
`upstream/main`, restoring the original comments and formatting requested by
the reviewer. This aligns the test semantics and style with the existing
suite and avoids over-explaining implementation details in the docstrings.
@ada-ggf25 (Author):

Thanks for the update. Just a small issue left.

Done!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks for the updates.

As you can see, the CI is currently failing. The reason is the AdaptionPromptModel, which I honestly didn't think about when reviewing this PR. It also has a set_adapter method which requires adding the new argument.

As for what to do with this argument: I'd say it's not really that important. We can just say: if inference mode: raise ValueError(" ... is not supported").
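
A minimal sketch of that suggestion for AdaptionPromptModel.set_adapter (the exact error message and surrounding logic are assumptions):

# sketch: AdaptionPromptModel.set_adapter accepting the new argument
def set_adapter(self, adapter_name, inference_mode=False):
    if inference_mode:
        raise ValueError("inference_mode=True is not supported for AdaptionPromptModel.")
    ...  # existing adapter-switching logic unchanged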

Add inference_mode parameter to AdaptionPromptModel.set_adapter() method
to match the API signature expected by PeftModel.set_adapter(). When
inference_mode is True, raise a ValueError as this mode is not supported
for AdaptionPromptModel.

This fixes the CI failure where PeftModel.set_adapter() was calling
base_model.set_adapter() with inference_mode parameter, but
AdaptionPromptModel.set_adapter() did not accept this parameter.

Fixes issue reported by maintainer in PR huggingface#2931 review comments.
@ada-ggf25 (Author):

Thanks for the updates.

As you can see, the CI is currently failing. The reason is the AdaptionPromptModel, which I honestly didn't think about when reviewing this PR. It also has a set_adapter method which requires adding the new argument.

As for what to do with this argument: I'd say it's not really that important. We can just say: if inference mode: raise ValueError(" ... is not supported").

It should be fine now!



Development

Successfully merging this pull request may close these issues.

Inference mode with Module_to_save LoRA

3 participants