This is just my observation: loading weights from safetensors (or grabbing them via state_dict()) gives you two separate tensors, model.embed_tokens and lm_head, even when the model uses tied embeddings. If we use both of those tensors in the forward call, won't gradient updates change two independent copies instead of the single shared weight?
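A minimal framework-free sketch of the concern (the class and attribute names here are hypothetical, chosen to mirror embed_tokens/lm_head; no claim about what any particular loader actually does): when the weights are tied, both names point at the same object, so an update through one is visible through the other; if a checkpoint serialized two copies and they are loaded as separate objects without re-tying, updates make them drift apart.

```python
class TiedModel:
    """Hypothetical model where lm_head is tied to embed_tokens."""
    def __init__(self):
        self.embed_tokens = [1.0, 2.0, 3.0]  # one shared weight vector
        self.lm_head = self.embed_tokens     # tied: same underlying object

class UntiedModel:
    """As if loaded from a checkpoint that stored two independent copies."""
    def __init__(self):
        self.embed_tokens = [1.0, 2.0, 3.0]
        self.lm_head = [1.0, 2.0, 3.0]       # separate copy, not tied

def update_lm_head(model, step=0.1):
    # a gradient-like update that only touches lm_head in place
    for i in range(len(model.lm_head)):
        model.lm_head[i] -= step

tied, untied = TiedModel(), UntiedModel()
update_lm_head(tied)
update_lm_head(untied)

# tied: embed_tokens moved along with lm_head (same object)
print(tied.embed_tokens == tied.lm_head)      # True
# untied: only lm_head moved, the two copies have drifted apart
print(untied.embed_tokens == untied.lm_head)  # False
```

So the question reduces to whether the loader re-establishes the aliasing (e.g. by pointing lm_head back at the embedding matrix) after reading the file; if it doesn't, the two tensors really are independent parameters from then on.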