Add trust_remote_code configuration option #31

Merged
p-e-w merged 6 commits into p-e-w:master from kldzj:trust-remote-code
Nov 24, 2025

Conversation

@kldzj
Contributor

@kldzj kldzj commented Nov 20, 2025

Problem

Some models require running custom code. If we don't specify trust_remote_code, the process is interrupted, and we are asked every time the tokenizer or model is reloaded whether we want to allow it.

Solution

Introduce a --trust-remote-code flag, which defaults to False, and pass it to all .from_pretrained() calls.

Result

The process now runs uninterrupted. If the flag wasn't specified and the model needs to run custom code, the first attempt to load the model fails with an error stating that remote code must be trusted.

One minor inconvenience is that the flag needs to be followed by an explicit boolean value. I didn't want to overcomplicate the model arg parsing logic, which in its current iteration prevents us from setting cli_implicit_flags=True in the SettingsConfigDict. This means heretic --trust-remote-code [...] isn't sufficient; a boolean value must be passed, as in heretic --trust-remote-code true [...].
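For context, the behavior the flag forwards can be summed up as a tri-state. The sketch below is illustrative only (it is not Transformers' implementation, and `resolve_trust`/`ask_user` are invented names), but it captures the observable logic of `trust_remote_code` discussed throughout this PR:

```python
# Illustrative sketch of the tri-state behavior discussed in this PR
# (NOT Transformers' actual code, just its observable logic):
#   True  -> run the model's custom code without asking
#   False -> refuse, raising an error if custom code is required
#   None  -> fall back to an interactive prompt
def resolve_trust(flag, needs_custom_code, ask_user):
    """Return True if loading may proceed, raise ValueError otherwise."""
    if not needs_custom_code:
        return True  # nothing to trust
    if flag is True:
        return True
    if flag is False:
        raise ValueError("model requires trust_remote_code=True")
    if ask_user():  # flag is None: prompt interactively
        return True
    raise ValueError("user declined to run remote code")
```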

@kldzj
Contributor Author

kldzj commented Nov 20, 2025

I just saw your comment in another PR:

The trust_remote_code configuration parameter is also unnecessary, as in a terminal environment Transformers will prompt for that interactively when loading a model that requires it, which is a far better user experience than having the program crash and then needing to relaunch with an unsafe argument.

I have to vehemently disagree: the number of times you have to confirm makes the abliteration process anything but smooth. Maybe the crashing behavior is unsatisfactory, but it's nowhere near as bad as having to agree over and over again.

@p-e-w
Owner

p-e-w commented Nov 21, 2025

You're right that having to agree every time is a bug. But a command-line flag is not the correct fix. This change creates the following workflow:

  1. Run Heretic with some model
  2. Heretic crashes while loading the model (because of the default trust_remote_code=False)
  3. Say "damn, I forgot about that"
  4. Re-run Heretic with --trust-remote-code True

The workflow we want is:

  1. Run Heretic with some model
  2. While loading the model, Heretic prompts the user for permission to run remote code
  3. The user agrees, and the program proceeds (without ever asking again, of course)

So we must find a way to remember the choice made by the user the first time. There is actually an extremely simple fix that does almost the right thing: Just add trust_remote_code=True to reload_model. That method is only called if the original load succeeded (which requires the user agreeing to execute remote code), so any subsequent loads are safe. But there is a small problem: reload_model is also used for evaluate_model, when the model path is overwritten, so this would load a model with the flag enabled that the user hasn't agreed to yet. This could be fixed e.g. by passing an optional argument to reload_model.

Edit: Actually, we can have the CLI flag, but if it is set to False the program should prompt and remember the choice, instead of crashing and forcing the user to re-run to enable remote code.
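A rough sketch of the fix proposed above, with assumed method and attribute names and a stubbed `loader` standing in for `AutoModelForCausalLM.from_pretrained`: reloads of the already-approved model pass `trust_remote_code=True`, while switching to a different model path falls back to `None` so the prompt appears again:

```python
# Sketch of the "remember the choice" idea from the comment above.
# `loader` is a stand-in for AutoModelForCausalLM.from_pretrained;
# all names here are assumptions, not Heretic's actual code.
class Abliterator:
    def __init__(self, model_path, loader):
        self.model_path = model_path
        self.loader = loader
        # Initial load: trust_remote_code=None lets Transformers prompt.
        self.loader(model_path, trust_remote_code=None)

    def reload_model(self, new_path=None):
        if new_path is None:
            # Same model: the initial load only succeeded because the
            # user agreed (or no custom code was needed), so this is safe.
            self.loader(self.model_path, trust_remote_code=True)
        else:
            # evaluate_model case: the user hasn't approved this path yet.
            self.model_path = new_path
            self.loader(new_path, trust_remote_code=None)
```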

@Vinayyyy7
Contributor

So you mean the --trust-remote-code flag I added was actually useful NOW? Hmm.

…sly `None` so the user wouldn't be asked multiple times
@kldzj
Contributor Author

kldzj commented Nov 21, 2025

I think this is the cleanest way: set it to None by default, and set it to True after the tokenizer has loaded, if it was previously None.

If the model requires custom code, loading the tokenizer will prompt the user to confirm allowing it. If the user declines, the program stops. If the user confirms, subsequent model loads will not prompt again.

Should I still implement the explicit check for loading the evaluate_model?
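In stub form (assumed names; `loader` stands in for the actual `from_pretrained` calls), the scheme described above might look like:

```python
# Sketch of the None -> True flip described above: trust_remote_code
# starts as None (so Transformers prompts), and a successful load
# flips it to True so later loads never prompt again.
class TrustState:
    def __init__(self, trust_remote_code=None):
        self.trust_remote_code = trust_remote_code

    def load(self, path, loader):
        # May prompt (None), proceed silently (True), or raise (declined).
        loader(path, trust_remote_code=self.trust_remote_code)
        if self.trust_remote_code is None:
            # Loading succeeded, so any prompt was confirmed.
            self.trust_remote_code = True
```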

)

trust_remote_code: bool | None = Field(
default=None,
Owner

👍 That's definitely the correct default.

self.settings.model,
dtype=dtype,
device_map=self.settings.device_map,
trust_remote_code=self.settings.trust_remote_code,
Owner

This isn't safe in this form because of evaluate_model (see my earlier comment).

)

if self.settings.trust_remote_code is None:
self.settings.trust_remote_code = True
Owner

We have to be careful here. Is it possible for the tokenizer to not require remote code, but for the model itself to require it? If so, this is unsafe, because no prompt would appear when loading the tokenizer, and then we flip it to True, loading the main model without a prompt, and the user was never given a choice.

Contributor Author

Makes sense, though I wouldn't expect this to be the case, since they're loaded from the same repository. In the cases I tested it always asked for both, but I definitely can't say that's true in every case.

@kldzj
Contributor Author

kldzj commented Nov 22, 2025

Thanks for taking another look. What do you think about this approach @p-e-w?

By the way, I tested your theory and you're definitely correct. If the tokenizer does not have an auto mapping it will not prompt, so the previous iteration would have been unsafe.

self.trusted_models = {settings.model: settings.trust_remote_code}

if self.settings.evaluate_model is not None:
self.trusted_models[settings.model] = settings.trust_remote_code
Owner

I don't understand this line. Doesn't it do exactly the same thing as line 52?

Contributor Author

Whoopsie, that should've been evaluate_model, obviously.

self.settings.model,
dtype=dtype,
device_map=self.settings.device_map,
trust_remote_code=self.trusted_models.get(self.settings.model),
Owner

Not sure about this. If the user explicitly passes --trust-remote-code True, we'd expect this to also extend to the evaluated model, no?

Contributor Author

@kldzj kldzj Nov 23, 2025

That's what the other line was meant for, although I've referenced the wrong model name in line 55.

Contributor Author

And settings.model is being overridden in main.py#L192, so with model.py#L55 corrected this should be working as intended.
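The per-model bookkeeping settled on in this thread might be sketched like this (attribute names assumed, settings stubbed as a plain dict): both model paths start at the user's CLI value, and each entry is only flipped to True after that model's own first successful load.

```python
# Sketch of the per-model trust table discussed above. Each path maps
# to the CLI setting (True or None); whether a None entry later becomes
# True depends on that model's own first load succeeding.
def init_trusted_models(settings):
    trusted = {settings["model"]: settings["trust_remote_code"]}
    if settings.get("evaluate_model") is not None:
        trusted[settings["evaluate_model"]] = settings["trust_remote_code"]
    return trusted
```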

Owner

@p-e-w p-e-w left a comment

Because this change is security-critical, I have done a very detailed review to convince myself that it is safe, and I now believe it is.

)

trust_remote_code: bool | None = Field(
default=None,
Owner

✅ Correct default, equivalent to not passing an argument to from_pretrained.

self.tokenizer: PreTrainedTokenizerBase = AutoTokenizer.from_pretrained(
settings.model
settings.model,
trust_remote_code=settings.trust_remote_code,
Owner

✅ Completely safe so far, will simply follow the trust_remote_code setting, defaulting to None, which prompts the user.

self.tokenizer.padding_side = "left"

self.model = None
self.trusted_models = {settings.model: settings.trust_remote_code}
Owner

✅ Completely safe. This just says "use the setting for the model".

self.trusted_models = {settings.model: settings.trust_remote_code}

if self.settings.evaluate_model is not None:
self.trusted_models[settings.evaluate_model] = settings.trust_remote_code
Owner

✅ Completely safe. This just says "use the setting for the evaluated model".

settings.model,
dtype=dtype,
device_map=settings.device_map,
trust_remote_code=self.trusted_models.get(settings.model),
Owner

Looks up the stored trust value. This is safe assuming said value is correct.

)

# If we reach this point and the model requires trust_remote_code,
# the user must have confirmed it.
Owner

(or settings.trust_remote_code was set to True)

# If we reach this point and the model requires trust_remote_code,
# the user must have confirmed it.
if self.trusted_models.get(settings.model) is None:
self.trusted_models[settings.model] = True
Owner

❗ ✅ This is the critical part, and the comment correctly describes why it works. Declining the prompt for a model requiring trust_remote_code will raise an exception, so at this point, we know that the user has either confirmed the prompt or passed --trust-remote-code True to begin with.

Owner

🟡 Small imperfection: If loading the model fails because of the wrong dtype, this part will not be reached and the user will be prompted again for the next dtype in the cascade, even though they already agreed to trust remote code.

self.settings.model,
dtype=dtype,
device_map=self.settings.device_map,
trust_remote_code=self.trusted_models.get(self.settings.model),
Owner

Looks up the stored trust value. This is safe assuming said value is correct. Note that settings.model might have been changed to the value of settings.evaluate_model, but that doesn't make this unsafe as this model path would not have been set to True in trusted_models yet by the user confirming the initial prompt. If settings.trust_remote_code was set to True by the user, the code above would have set this lookup to True, so we won't get a prompt, as desired.

)

if self.trusted_models.get(self.settings.model) is None:
self.trusted_models[self.settings.model] = True
Owner

✅ Same logic as above. This currently doesn't do anything, as the evaluate_model is only loaded once, but it completes the trust logic as the user must have agreed to the prompt at this point.
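Putting the reviewed pieces together, the end-to-end flow amounts to something like the following sketch (stubbed `loader`, assumed names, not the literal merged code):

```python
# End-to-end sketch of the reviewed logic: look up the stored trust
# value for the path, load (which may prompt when the value is None),
# and on success flip a None entry to True so no further prompts occur.
def load_with_trust(path, trusted_models, loader):
    loader(path, trust_remote_code=trusted_models.get(path))
    if trusted_models.get(path) is None:
        # Reaching this point means any trust prompt was confirmed.
        trusted_models[path] = True
```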

@p-e-w p-e-w merged commit 452b35e into p-e-w:master Nov 24, 2025
4 checks passed
@p-e-w
Owner

p-e-w commented Nov 24, 2025

Which model(s) requiring trust_remote_code did you use for testing?

@kldzj
Contributor Author

kldzj commented Nov 24, 2025

Which model(s) requiring trust_remote_code did you use for testing?

kldzj/gemma-3-1b-it-remote-code so I can test locally. Most of the others I found, before cloning Gemma 3, are either too large for my local system or unsupported for different reasons.
