Fix LoRA module mapping for Phi models #1375
Merged
Microsoft updated the `modeling_phi.py` file 1 day ago: https://huggingface.co/microsoft/phi-2/blob/main/modeling_phi.py. They did this for Phi-1, Phi-1_5, and Phi-2. The net effect of this change is that the models no longer share the architecture of the originally released checkpoints; they use GQA now: huggingface/transformers#28163. This means that `Wqkv` and `out_proj` are no longer valid target modules. Because of this, the current default LoRA target modules for Phi are incompatible with the latest versions of these models, and the failure is silent, leading to poor LoRA fine-tuning performance. This PR updates the default target modules to match the new model architecture.
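For illustration, here is a minimal sketch of targeting the new projection names explicitly when configuring LoRA for the updated Phi checkpoints. The module list and hyperparameters below are assumptions for demonstration purposes and are not necessarily the exact defaults this PR adopts:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# The updated Phi checkpoints expose separate q/k/v projections (GQA) plus
# `dense`, replacing the fused `Wqkv` and `out_proj` of the original release.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

lora_config = LoraConfig(
    r=16,            # illustrative rank
    lora_alpha=32,   # illustrative scaling
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```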
Using Transformers 4.36.2
Using Transformers Master
With the existing defaults, you can see that the LoRA target modules only get applied to `fc1` and `fc2`, while they get skipped (silently) for `Wqkv` and `out_proj`. An illustrative way to verify this is shown below.
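As a hypothetical sanity check (not part of this PR), you could list which modules actually received LoRA weights after calling `get_peft_model`, which makes the silent mismatch easy to spot:

```python
# Collect the leaf module names that received LoRA adapters. With the old
# defaults on the updated Phi checkpoints this set contains only fc1/fc2;
# with the proposed defaults the attention projections show up as well.
lora_targets = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if hasattr(module, "lora_A")
}
print(sorted(lora_targets))
```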
However, with the proposed changes, this is what the output looks like: