
Conversation

arnavgarg1 (Contributor) commented on Jan 19, 2024

Microsoft updated the modeling_phi.py file 1 day ago: https://huggingface.co/microsoft/phi-2/blob/main/modeling_phi.py. They did this for Phi-1, Phi-1_5 and Phi-2.

The net effect of this change is that these checkpoints no longer share the architecture of the originally released models; they now use GQA (huggingface/transformers#28163). This means that Wqkv and out_proj are no longer valid target modules, so the current default LoRA target modules for Phi are incompatible with the latest versions of these models. The failure is silent, leading to poor LoRA fine-tuning performance.

This PR updates the default target modules to match the new model architecture.
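
A quick way to double-check which module names are valid LoRA targets for a given checkpoint is to list its linear submodules (a minimal sketch; filtering on torch.nn.Linear is just one convenient way to enumerate candidates, and the printed names below correspond to the model dumps further down):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# Collect the leaf names of every linear layer -- these are the strings
# that LoraConfig.target_modules is matched against.
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
}
print(sorted(linear_names))
# With the updated modeling_phi.py this prints something like:
# ['dense', 'fc1', 'fc2', 'k_proj', 'lm_head', 'q_proj', 'v_proj']
# The old 'Wqkv' and 'out_proj' names are gone, so defaults that reference
# them silently match nothing.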

Using Transformers 4.36.2

>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00,  6.02s/it]
Some weights of the model checkpoint at microsoft/phi-2 were not used when initializing PhiForCausalLM: ['model.layers.7.self_attn.v_proj.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.31.self_attn.k_proj.weight', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.30.self_attn.k_proj.bias', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.31.self_attn.v_proj.weight', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.8.self_attn.v_proj.bias', 
'model.layers.25.self_attn.q_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.30.self_attn.v_proj.bias', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.31.self_attn.q_proj.bias', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.31.self_attn.v_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.30.self_attn.q_proj.bias', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.9.self_attn.v_proj.bias', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.31.self_attn.k_proj.bias', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 
'model.layers.29.self_attn.q_proj.bias', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.24.self_attn.v_proj.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PhiForCausalLM were not initialized from the model checkpoint at microsoft/phi-2 and are newly initialized: ['model.layers.13.self_attn.query_key_value.bias', 'model.layers.15.self_attn.query_key_value.weight', 'model.layers.12.self_attn.query_key_value.bias', 'model.layers.30.self_attn.query_key_value.bias', 'model.layers.25.self_attn.query_key_value.weight', 'model.layers.14.self_attn.query_key_value.bias', 'model.layers.24.self_attn.query_key_value.bias', 'model.layers.26.self_attn.query_key_value.weight', 'model.layers.21.self_attn.query_key_value.bias', 'model.layers.25.self_attn.query_key_value.bias', 'model.layers.14.self_attn.query_key_value.weight', 'model.layers.17.self_attn.query_key_value.weight', 'model.layers.11.self_attn.query_key_value.bias', 'model.layers.18.self_attn.query_key_value.bias', 'model.layers.23.self_attn.query_key_value.weight', 'model.layers.31.self_attn.query_key_value.bias', 'model.layers.2.self_attn.query_key_value.bias', 'model.layers.12.self_attn.query_key_value.weight', 'model.layers.9.self_attn.query_key_value.bias', 'model.layers.20.self_attn.query_key_value.weight', 'model.layers.26.self_attn.query_key_value.bias', 'model.layers.30.self_attn.query_key_value.weight', 'model.layers.7.self_attn.query_key_value.weight', 'model.layers.28.self_attn.query_key_value.weight', 'model.layers.22.self_attn.query_key_value.bias', 'model.layers.2.self_attn.query_key_value.weight', 'model.layers.8.self_attn.query_key_value.weight', 'model.layers.15.self_attn.query_key_value.bias', 'model.layers.1.self_attn.query_key_value.weight', 'model.layers.27.self_attn.query_key_value.bias', 'model.layers.10.self_attn.query_key_value.weight', 'model.layers.16.self_attn.query_key_value.bias', 'model.layers.28.self_attn.query_key_value.bias', 'model.layers.29.self_attn.query_key_value.weight', 'model.layers.3.self_attn.query_key_value.bias', 'model.layers.19.self_attn.query_key_value.bias', 'model.layers.4.self_attn.query_key_value.bias', 'model.layers.31.self_attn.query_key_value.weight', 'model.layers.18.self_attn.query_key_value.weight', 'model.layers.16.self_attn.query_key_value.weight', 'model.layers.21.self_attn.query_key_value.weight', 'model.layers.22.self_attn.query_key_value.weight', 'model.layers.29.self_attn.query_key_value.bias', 'model.layers.5.self_attn.query_key_value.bias', 'model.layers.8.self_attn.query_key_value.bias', 'model.layers.9.self_attn.query_key_value.weight', 'model.layers.3.self_attn.query_key_value.weight', 'model.layers.7.self_attn.query_key_value.bias', 'model.layers.27.self_attn.query_key_value.weight', 'model.layers.1.self_attn.query_key_value.bias', 'model.layers.6.self_attn.query_key_value.weight', 'model.layers.19.self_attn.query_key_value.weight', 'model.layers.0.self_attn.query_key_value.bias', 'model.layers.6.self_attn.query_key_value.bias', 'model.layers.0.self_attn.query_key_value.weight', 'model.layers.20.self_attn.query_key_value.bias', 'model.layers.23.self_attn.query_key_value.bias', 'model.layers.11.self_attn.query_key_value.weight', 'model.layers.24.self_attn.query_key_value.weight', 'model.layers.4.self_attn.query_key_value.weight', 'model.layers.5.self_attn.query_key_value.weight', 'model.layers.13.self_attn.query_key_value.weight', 'model.layers.17.self_attn.query_key_value.bias', 'model.layers.10.self_attn.query_key_value.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Using Transformers Master

>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:22<00:00, 11.24s/it]
>>> model
PhiForCausalLM(
  (model): PhiModel(
    (embed_tokens): Embedding(51200, 2560)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x PhiDecoderLayer(
        (self_attn): PhiAttention(
          (q_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (v_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (dense): Linear(in_features=2560, out_features=2560, bias=True)
          (rotary_emb): PhiRotaryEmbedding()
        )
        (mlp): PhiMLP(
          (activation_fn): NewGELUActivation()
          (fc1): Linear(in_features=2560, out_features=10240, bias=True)
          (fc2): Linear(in_features=10240, out_features=2560, bias=True)
        )
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (final_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=2560, out_features=51200, bias=True)
)

With the existing defaults, you can see that LoRA is only applied to fc1 and fc2, while Wqkv and out_proj are silently skipped because those modules no longer exist in the new architecture.

>>> from peft import LoraConfig, get_peft_model
>>> lora_config = LoraConfig()
>>> lora_config
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=None, inference_mode=False, r=8, target_modules=None, lora_alpha=8, lora_dropout=0.0, fan_in_fan_out=False, bias='none', modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={})
>>> get_peft_model(model, lora_config)
PeftModel(
  (base_model): LoraModel(
    (model): PhiForCausalLM(
      (model): PhiModel(
        (embed_tokens): Embedding(51200, 2560)
        (embed_dropout): Dropout(p=0.0, inplace=False)
        (layers): ModuleList(
          (0-31): 32 x PhiDecoderLayer(
            (self_attn): PhiAttention(
              (q_proj): Linear(in_features=2560, out_features=2560, bias=True)
              (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
              (v_proj): Linear(in_features=2560, out_features=2560, bias=True)
              (dense): Linear(in_features=2560, out_features=2560, bias=True)
              (rotary_emb): PhiRotaryEmbedding()
            )
            (mlp): PhiMLP(
              (activation_fn): NewGELUActivation()
              (fc1): lora.Linear(
                (base_layer): Linear(in_features=2560, out_features=10240, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=10240, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (fc2): lora.Linear(
                (base_layer): Linear(in_features=10240, out_features=2560, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=10240, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
            (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
            (resid_dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (final_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=2560, out_features=51200, bias=True)
    )
  )
)
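
To confirm this programmatically instead of eyeballing the module tree, one can list which layers actually received adapters (a minimal sketch, assuming a freshly loaded model and the default LoraConfig from above; the name parsing is just for illustration):

from peft.tuners.lora import LoraLayer

peft_model = get_peft_model(model, lora_config)

# Collect the base module names that were wrapped with LoRA adapters.
wrapped = {
    name.split(".")[-1]
    for name, module in peft_model.named_modules()
    if isinstance(module, LoraLayer)
}
print(sorted(wrapped))   # ['fc1', 'fc2'] -- Wqkv/out_proj matched nothing
peft_model.print_trainable_parameters()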

However, with the proposed changes, this is what the output looks like:

>>> lora_config.target_modules = ["q_proj", "v_proj", "fc1", "fc2"]
>>> get_peft_model(model, lora_config)
PeftModel(
  (base_model): LoraModel(
    (model): PhiForCausalLM(
      (model): PhiModel(
        (embed_tokens): Embedding(51200, 2560)
        (embed_dropout): Dropout(p=0.0, inplace=False)
        (layers): ModuleList(
          (0-31): 32 x PhiDecoderLayer(
            (self_attn): PhiAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=2560, out_features=2560, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=2560, out_features=2560, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear(in_features=2560, out_features=2560, bias=True)
              (rotary_emb): PhiRotaryEmbedding()
            )
            (mlp): PhiMLP(
              (activation_fn): NewGELUActivation()
              (fc1): lora.Linear(
                (base_layer): Linear(in_features=2560, out_features=10240, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=10240, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (fc2): lora.Linear(
                (base_layer): Linear(in_features=10240, out_features=2560, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=10240, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
            (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
            (resid_dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (final_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=2560, out_features=51200, bias=True)
    )
  )
)
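
Until a PEFT release ships the updated defaults, the same result can be obtained by passing the target modules explicitly (a usage sketch; the r and lora_alpha values are illustrative, not recommendations):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    # Match the new PhiAttention/PhiMLP module names instead of Wqkv/out_proj.
    target_modules=["q_proj", "v_proj", "fc1", "fc2"],
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()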

younesbelkada (Contributor) left a comment

Makes sense thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

younesbelkada merged commit 1c1c7fd into huggingface:main on Jan 24, 2024
BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Mar 14, 2024
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025