
Conversation

@ada-ggf25 commented on Nov 29, 2025

Fix: Respect inference_mode when setting adapters with modules_to_save

Fixes #2928

Description

This PR fixes issue #2928, where modules_to_save parameters kept requires_grad=True even when inference_mode=True was passed to set_adapter(). That caused problems when using quantized models (e.g., with bitsandbytes) in inference mode, as the quantized layers require parameters to have requires_grad=False.

Problem

When calling model.set_adapter(adapter_name, inference_mode=True) on a model that has modules_to_save configured (e.g., a classification head), the modules_to_save parameters would still have requires_grad=True despite being in inference mode. This happened because:

  1. _set_adapter() was calling module.enable_adapters(True) unconditionally
  2. ModulesToSaveWrapper.enable_adapters(True) sets requires_grad_(True) for all active adapters
  3. This occurred before set_adapter() was called with inference_mode, so the modules_to_save gradients stayed enabled despite the inference-mode request (sketched below)
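
In simplified terms, the pre-fix call chain looked roughly like this (a hedged sketch based on the steps above, not the exact PEFT source):

from peft.utils.other import ModulesToSaveWrapper

# sketch of the pre-fix behaviour of _set_adapter (src/peft/utils/other.py)
def _set_adapter(model, adapter_name):
    for module in model.modules():
        if isinstance(module, ModulesToSaveWrapper):
            # unconditional: the wrapper's enable_adapters(True) calls requires_grad_(True)
            # on the active modules_to_save copies, regardless of any inference-mode request
            module.enable_adapters(True)
            module.set_adapter(adapter_name)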

Solution

The fix ensures that enable_adapters() is only called when not in inference mode:

  1. Modified _set_adapter() in src/peft/utils/other.py:

    • Added conditional check: only call enable_adapters(True) when inference_mode=False
    • This prevents gradients from being set to True when inference mode should keep them False (see the sketch after this list)
  2. Updated PeftModel.set_adapter() in src/peft/peft_model.py:

    • Added inference_mode parameter to match the API of PeftMixedModel.set_adapter()
    • Passes inference_mode to both _set_adapter() and base_model.set_adapter()
  3. Added comprehensive tests in tests/test_other.py:

    • test_modules_to_save_inference_mode_requires_grad_false: Verifies requires_grad=False in inference mode
    • test_modules_to_save_training_mode_requires_grad_true: Verifies requires_grad=True in training mode
    • test_modules_to_save_inference_mode_with_torch_inference_mode: Verifies compatibility with torch.inference_mode()
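
A rough sketch of the adjusted call path at this stage of the PR (simplified; signatures are assumed from the description above rather than copied from the diff):

from peft.utils.other import ModulesToSaveWrapper

# sketch: PeftModel.set_adapter (src/peft/peft_model.py) gains the new keyword argument
def set_adapter(self, adapter_name, inference_mode=False):
    self.base_model.set_adapter(adapter_name, inference_mode=inference_mode)
    _set_adapter(self, adapter_name, inference_mode=inference_mode)

# sketch: _set_adapter (src/peft/utils/other.py) only re-enables gradients outside inference mode
def _set_adapter(model, adapter_name, inference_mode=False):
    for module in model.modules():
        if isinstance(module, ModulesToSaveWrapper):
            if not inference_mode:
                module.enable_adapters(True)
            module.set_adapter(adapter_name)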

Changes Made

Code Changes

src/peft/utils/other.py:

  • Modified _set_adapter() to conditionally call enable_adapters() based on inference_mode parameter

src/peft/peft_model.py:

  • Updated set_adapter() method signature to accept inference_mode parameter
  • Passes inference_mode to underlying adapter setting functions

tests/test_other.py:

  • Added TestModulesToSaveInferenceMode test class with 3 comprehensive tests

Testing

Test Results

All new tests pass:

  • test_modules_to_save_inference_mode_requires_grad_false - PASSED
  • test_modules_to_save_training_mode_requires_grad_true - PASSED
  • test_modules_to_save_inference_mode_with_torch_inference_mode - PASSED

All existing modules_to_save tests pass (11/11)

Related tests pass (71/74 - 3 failures are unrelated BOFT dependency issues)

Code quality checks pass (make quality)

Test Coverage

The new tests verify the following (a condensed sketch appears after the list):

  • modules_to_save parameters correctly have requires_grad=False when inference_mode=True
  • modules_to_save parameters correctly have requires_grad=True when inference_mode=False (training mode)
  • Compatibility with torch.inference_mode() context manager
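
A condensed sketch of the core assertions, assuming a PEFT model with modules_to_save configured under the "default" adapter (illustrative only, not the exact test code):

# inference mode: no modules_to_save copy should require gradients
model.set_adapter("default", inference_mode=True)
assert not any(
    p.requires_grad for n, p in model.named_parameters() if "modules_to_save" in n
)

# training mode: the active adapter's modules_to_save copy is trainable again
model.set_adapter("default", inference_mode=False)
assert all(
    p.requires_grad
    for n, p in model.named_parameters()
    if "modules_to_save" in n and ".default." in n
)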

Example Usage

Before this fix, the following code would fail with quantized models:

import torch
from peft import PeftModel

model = PeftModel.from_pretrained(base_model, adapter_path)
model = convert_to_int8(model)  # quantization step (placeholder helper, e.g. a bitsandbytes int8 conversion)
model.eval()

with torch.inference_mode():
    model.set_adapter("my_adapter", inference_mode=True)  # bug: modules_to_save parameters still had requires_grad=True
    _ = model(batch)

After this fix:

model = PeftModel.from_pretrained(base_model, adapter_path)
model = convert_to_int8(model)  # quantization step (same placeholder helper as above)
model.eval()

with torch.inference_mode():
    model.set_adapter("my_adapter", inference_mode=True)  # modules_to_save parameters now correctly have requires_grad=False
    _ = model(batch)

ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Dec 2, 2025
… in ModulesToSaveWrapper

Remove the lines that set requires_grad on original_module in
ModulesToSaveWrapper.enable_adapters() method. This change addresses
the maintainer's feedback that there is no reason to touch the
requires_grad of the original_module here, and it conflicts with
bitsandbytes quantization which requires gradients to be False at all
times.

The original_module's requires_grad is no longer manipulated by
enable_adapters(), only modules_to_save gradients are managed.

Updated test_requires_grad_modules_to_save_disabling to reflect this
change by removing expectations about original_module having gradients
when adapters are disabled.

Related to issue huggingface#2928 and PR huggingface#2931.
@github-actions (bot):

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@ada-ggf25 (Author):

Still relevant, please don’t mark as stale/close.

@BenjaminBossan (Member) left a comment

Thanks for the PR, I finally got around to revisit this topic. Please check the comments I added. Moreover, could you please merge with/rebase on the latest main branch?

# if the adapter is found in this module, set it as the active adapter, else disable the adapters of this
# module
if adapter_name_to_set in module._adapters:
    if not inference_mode:
@BenjaminBossan (Member):

Using enable_adapters here is not the right way. We already pass inference_mode to the module.set_adapter call; the handling should be implemented there.

assert expected == modules


class TestModulesToSaveInferenceMode:
@BenjaminBossan (Member):

Instead of adding this new test class, let's add the new tests to existing tests for requires_grad, which you already modified above. Just for reference, I mean this test class:

class TestRequiresGrad:

If I'm not mistaken, the following tests should cover what we need:

    def test_requires_grad_follows_inference_mode_modules_to_save(self):
        # check that passing inference_mode to set_adapter has the intended effect with LoRA and modules_to_save
        config0 = LoraConfig(target_modules=["lin0"], modules_to_save=["lin1"])
        peft_model = get_peft_model(MLP(), config0)

        config1 = LoraConfig(target_modules=["lin0"], modules_to_save=["lin1"])
        peft_model.add_adapter("adapter1", config1)

        # active adapter is still "default"
        self.check_requires_grad(
            peft_model,
            "base_model.model.lin1.modules_to_save.default.weight",
            "base_model.model.lin1.modules_to_save.default.bias",
            "base_model.model.lin0.lora_A.default.weight",
            "base_model.model.lin0.lora_B.default.weight",
        )

        # inference mode false (default)

        # set config0 as active, should not change anything
        peft_model.set_adapter("default", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.lin1.modules_to_save.default.weight",
            "base_model.model.lin1.modules_to_save.default.bias",
            "base_model.model.lin0.lora_A.default.weight",
            "base_model.model.lin0.lora_B.default.weight",
        )

        # set config1 as active, should lead to adapter1 requiring grad
        peft_model.set_adapter("adapter1", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.lin1.modules_to_save.adapter1.weight",
            "base_model.model.lin1.modules_to_save.adapter1.bias",
            "base_model.model.lin0.lora_A.adapter1.weight",
            "base_model.model.lin0.lora_B.adapter1.weight",
        )

        # inference mode true

        # set config0 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("default", inference_mode=True)
        self.check_requires_grad(peft_model)

        # set config1 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("adapter1", inference_mode=True)
        self.check_requires_grad(peft_model)

    def test_requires_grad_follows_inference_mode_trainable_token_indices(self):
        # check that passing inference_mode to set_adapter has the intended effect with LoRA and trainable tokens
        config0 = LoraConfig(target_modules=["conv1d"], trainable_token_indices={"emb": [0, 1, 2]})
        peft_model = get_peft_model(ModelEmbConv1D(), config0)

        config1 = LoraConfig(target_modules=["lin0"], trainable_token_indices={"emb": [0, 1, 2]})
        peft_model.add_adapter("adapter1", config1)

        # active adapter is still "default"
        self.check_requires_grad(
            peft_model,
            "base_model.model.emb.token_adapter.trainable_tokens_delta.default",
            "base_model.model.conv1d.lora_A.default.weight",
            "base_model.model.conv1d.lora_B.default.weight",
        )

        # inference mode false (default)

        # set config0 as active, should not change anything
        peft_model.set_adapter("default", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.emb.token_adapter.trainable_tokens_delta.default",
            "base_model.model.conv1d.lora_A.default.weight",
            "base_model.model.conv1d.lora_B.default.weight",
        )

        # set config1 as active, should lead to adapter1 requiring grad
        peft_model.set_adapter("adapter1", inference_mode=False)
        self.check_requires_grad(
            peft_model,
            "base_model.model.emb.token_adapter.trainable_tokens_delta.adapter1",
            "base_model.model.lin0.lora_A.adapter1.weight",
            "base_model.model.lin0.lora_B.adapter1.weight",
        )

        # inference mode true

        # set config0 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("default", inference_mode=True)
        self.check_requires_grad(peft_model)

        # set config1 as active but in inference mode, should result in no module requiring grad
        peft_model.set_adapter("adapter1", inference_mode=True)
        self.check_requires_grad(peft_model)

Please double check if this makes sense to you.

peft_model = get_peft_model(MLP(), config)

# no layer should have requires_grad
# when disabling the adapter, modules_to_save should have requires_grad=False
@BenjaminBossan (Member):

I think the changes to this test can be reverted, right?

ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Jan 7, 2026
Refactor _set_adapter and ModulesToSaveWrapper.set_adapter to address
maintainer feedback. The enable_adapters calls should not be conditional
in _set_adapter; instead, the inference_mode handling should be
implemented entirely within the set_adapter method.

Changes:
- Remove conditional enable_adapters(True/False) calls from _set_adapter
  function based on inference_mode parameter
- Move enable_adapters logic into ModulesToSaveWrapper.set_adapter method
  to handle inference_mode internally
- Call enable_adapters(not inference_mode) within set_adapter to ensure
  adapters are enabled/disabled correctly based on inference_mode
- Update set_adapter to handle both empty adapter list case and normal
  adapter setting case with proper enable_adapters calls

This refactoring ensures that inference_mode is handled entirely within
the set_adapter method implementation, as requested by the maintainer,
rather than conditionally calling enable_adapters in the _set_adapter
helper function.

Addresses maintainer feedback in PR huggingface#2931.
ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Jan 7, 2026
Remove the TestModulesToSaveInferenceMode test class from test_other.py
as requested by the maintainer. The tests for inference_mode behaviour
with modules_to_save should be integrated into the existing
TestRequiresGrad class in test_custom_models.py instead of having a
separate test class.

Changes:
- Remove entire TestModulesToSaveInferenceMode class including:
  - test_modules_to_save_inference_mode_requires_grad_false
  - test_modules_to_save_training_mode_requires_grad_true
  - test_modules_to_save_inference_mode_with_torch_inference_mode
- Tests will be moved to TestRequiresGrad class in test_custom_models.py
  following the maintainer's specified test structure

This change addresses maintainer feedback in PR huggingface#2931 to consolidate
inference_mode tests into the existing requires_grad test suite.
ada-ggf25 added a commit to ada-ggf25/peft that referenced this pull request Jan 7, 2026
Add new tests for inference_mode behaviour and revert changes to
test_requires_grad_modules_to_save_disabling as requested by the
maintainer.

Changes:
- Add test_requires_grad_follows_inference_mode_modules_to_save to
  TestRequiresGrad class to verify that passing inference_mode to
  set_adapter has the intended effect with LoRA and modules_to_save
- Add test_requires_grad_follows_inference_mode_trainable_token_indices
  to TestRequiresGrad class to verify that passing inference_mode to
  set_adapter has the intended effect with LoRA and trainable tokens
- Revert test_requires_grad_modules_to_save_disabling to original
  version that checks for original_module.weight and original_module.bias
  having requires_grad=True when adapters are disabled

The new tests follow the maintainer's specified structure and verify:
- inference_mode=False (default) maintains requires_grad=True for
  active adapters and modules_to_save
- inference_mode=True results in no modules requiring gradients
- Tests cover both modules_to_save and trainable_token_indices scenarios

This addresses maintainer feedback in PR huggingface#2931 to integrate inference_mode
tests into the existing TestRequiresGrad class and restore the original
test expectations for modules_to_save disabling behaviour.
Add optional inference_mode parameter to PeftModel.set_adapter() method
to allow setting adapters in frozen state (requires_grad=False) directly
without manual parameter manipulation.

Changes:
- Add inference_mode parameter with default value False to maintain
  backwards compatibility
- Update method docstring to document the new parameter and clarify
  that adapters are set to trainable unless inference_mode is True
- Remove manual example code snippet showing how to set requires_grad=False
- Pass inference_mode parameter to base_model.set_adapter() and
  _set_adapter() helper function calls

This enhancement simplifies the workflow for users who want to set
adapters in inference mode, addressing the need to manually manipulate
requires_grad flags after setting an adapter.
Add comprehensive test suite to validate that modules_to_save correctly
respect the inference_mode parameter when set_adapter is called.

This test class addresses issue huggingface#2928 where modules_to_save had
requires_grad=True even when inference_mode=True was passed to set_adapter.

Test coverage:
- test_modules_to_save_inference_mode_requires_grad_false: Verifies that
  modules_to_save parameters have requires_grad=False when inference_mode=True
  is passed to set_adapter, ensuring parameters are frozen during inference
- test_modules_to_save_training_mode_requires_grad_true: Verifies that
  modules_to_save parameters have requires_grad=True when inference_mode=False
  is passed to set_adapter, ensuring parameters are trainable during training
- test_modules_to_save_inference_mode_with_torch_inference_mode: Validates
  that modules_to_save work correctly when used with torch.inference_mode()
  context manager and that forward passes still function correctly

All tests use AutoModelForSequenceClassification with LoRA configuration
targeting query and value modules, with classifier as modules_to_save to
provide realistic test scenarios.
Reformat the docstring comment in TestModulesToSaveInferenceMode class
to fit within line length limits by combining two lines into a single line.

This is a minor formatting change to improve code readability and
compliance with project style guidelines.
@ada-ggf25 (Author):

Thanks for the PR, I finally got around to revisit this topic. Please check the comments I added. Moreover, could you please merge with/rebase on the latest main branch?

Thank you for the detailed feedback! I've addressed all your comments:

Changes Made:

  1. Removed conditional enable_adapters calls from _set_adapter: The enable_adapters logic has been moved entirely into
    the ModulesToSaveWrapper.set_adapter method, so inference_mode is now handled within the set_adapter implementation as you
    suggested.

  2. Removed TestModulesToSaveInferenceMode class: The separate test class has been removed from test_other.py.

  3. Added tests to TestRequiresGrad class: I've added the two tests you specified
    (test_requires_grad_follows_inference_mode_modules_to_save and
    test_requires_grad_follows_inference_mode_trainable_token_indices) to the existing TestRequiresGrad class in
    test_custom_models.py, following your exact specification.

  4. Reverted test_requires_grad_modules_to_save_disabling: The test has been restored to its original version that checks
    for original_module.weight and original_module.bias having requires_grad=True when adapters are disabled.

  5. Rebased on latest main: The branch has been rebased to remove the merge commit and is now based on the latest main branch.

Please let me know if there's anything else that needs to be adjusted!

@BenjaminBossan (Member) left a comment

Thanks for updating the PR, but I think it's overcomplicating things. I flagged the parts of the code that I think need changing.

# when calling model.add_adapter, the new adapter is not automatically active
self._active_adapter = []
# enable/disable adapters based on inference_mode
self.enable_adapters(not inference_mode)
@BenjaminBossan (Member):

If my understanding is correct, calling enable_adapters and requires_grad_(False) here and below should not be necessary. It should be enough to pass inference_mode to set_adapter above. Did you find a situation where making these calls was needed?

Remove redundant enable_adapters calls in ModulesToSaveWrapper.set_adapter()
method. The maintainer correctly identified that passing inference_mode to
set_adapter is sufficient, as the method already handles setting requires_grad
correctly via self.modules_to_save[adapter_name].requires_grad_(not inference_mode).

The enable_adapters calls were redundant and potentially causing issues.
Restore the test_requires_grad_modules_to_save_disabling test to check that
when adapters are disabled, no parameters should have requires_grad=True,
matching the intended behaviour from commit 3ea4e67.

This addresses the maintainer's concern about an incorrect merge conflict
resolution that reverted these test changes.
@ada-ggf25 (Author):

Thanks for updating the PR, but I think it's overcomplicating things. I flagged the parts of the code that I think need changing.

Hi @BenjaminBossan,

Thanks for the review. I've addressed both points:

  1. Removed unnecessary enable_adapters calls: You're right, passing inference_mode to set_adapter is sufficient. The method already sets requires_grad via self.modules_to_save[adapter_name].requires_grad_(not inference_mode), so the enable_adapters calls were redundant. I've removed them.

  2. Reverted test changes: The test reversion was an incorrect merge conflict resolution. I've restored test_requires_grad_modules_to_save_disabling to match commit 3ea4e67, where disabling adapters results in no parameters having requires_grad=True.

The code is simpler and the tests match the intended behaviour. Please let me know if anything else needs adjustment.
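
For reference, a minimal sketch of the resulting ModulesToSaveWrapper.set_adapter behaviour (simplified and based on the internals quoted above, not the exact PEFT source):

# sketch of ModulesToSaveWrapper.set_adapter once the redundant enable_adapters calls are removed
def set_adapter(self, adapter_names, inference_mode=False):
    if isinstance(adapter_names, str):
        adapter_names = [adapter_names]
    for adapter_name in adapter_names:
        # the active copy is trainable unless inference mode is requested
        self.modules_to_save[adapter_name].requires_grad_(not inference_mode)
    self._active_adapter = adapter_names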

@BenjaminBossan (Member) left a comment

Thanks for the update. Just a small issue left.

peft_model = get_peft_model(MLP(), config)

# no layer should have requires_grad
# when disabling the adapter, modules_to_save should have requires_grad=False
@BenjaminBossan (Member):

Could you please revert the change completely, including the comment and the formatting?

Revert `test_requires_grad_modules_to_save_disabling` to the version on
`upstream/main`, restoring the original comments and formatting requested by
the reviewer. This aligns the test semantics and style with the existing
suite and avoids over-explaining implementation details in the docstrings.
@ada-ggf25 (Author):

Thanks for the update. Just a small issue left.

Done!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment

Thanks for the updates.

As you can see, the CI is currently failing. The reason is the AdaptionPromptModel, which I honestly didn't think about when reviewing this PR. It also has a set_adapter method which requires adding the new argument.

As for what to do with this argument: I'd say it's not really that important. We can just say: if inference mode: raise ValueError(" ... is not supported").
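
A minimal sketch of that suggestion for AdaptionPromptModel.set_adapter (the exact error message and surrounding logic are assumptions):

# sketch: AdaptionPromptModel.set_adapter accepting the new argument
def set_adapter(self, adapter_name, inference_mode=False):
    if inference_mode:
        raise ValueError("inference_mode=True is not supported for AdaptionPromptModel.")
    ...  # existing adapter-switching logic unchanged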

Add inference_mode parameter to AdaptionPromptModel.set_adapter() method
to match the API signature expected by PeftModel.set_adapter(). When
inference_mode is True, raise a ValueError as this mode is not supported
for AdaptionPromptModel.

This fixes the CI failure where PeftModel.set_adapter() was calling
base_model.set_adapter() with inference_mode parameter, but
AdaptionPromptModel.set_adapter() did not accept this parameter.

Fixes issue reported by maintainer in PR huggingface#2931 review comments.
@ada-ggf25 (Author):

Thanks for the updates.

As you can see, the CI is currently failing. The reason is the AdaptionPromptModel, which I honestly didn't think about when reviewing this PR. It also has a set_adapter method which requires adding the new argument.

As for what to do with this argument: I'd say it's not really that important. We can just say: if inference mode: raise ValueError(" ... is not supported").

It should be fine now!



Development

Successfully merging this pull request may close these issues.

Inference mode with Module_to_save LoRA

3 participants