Convenience method for learning rate factor#2888

Merged
alanakbik merged 3 commits into master from learning_rate_factor on Aug 6, 2022

Conversation

@alanakbik
Copy link
Collaborator

This PR adds a parameter that sets a factor on the learning rate of the decoder when fine-tuning a model.

Usage:

trainer.fine_tune(f"path/to/output/folder",
                  mini_batch_size=4,
                  learning_rate=5e-5,
                  decoder_lr_factor=10,
                  )
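To illustrate the idea, here is a minimal sketch of how such a factor could translate into optimizer parameter groups. This is not Flair's actual implementation; the `ToyClassifier` model and the group construction are illustrative assumptions.

```python
# Hedged sketch (not Flair's internals): splitting a model's parameters
# into two optimizer groups so the randomly initialized decoder trains
# with a higher learning rate than the pretrained encoder.
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(16, 8)  # stands in for pretrained embeddings
        self.decoder = nn.Linear(8, 3)   # randomly initialized decoder head

model = ToyClassifier()
learning_rate = 5e-5
decoder_lr_factor = 10

optimizer = torch.optim.AdamW([
    {"params": model.encoder.parameters(), "lr": learning_rate},
    # the decoder group trains at decoder_lr_factor times the base rate
    {"params": model.decoder.parameters(), "lr": learning_rate * decoder_lr_factor},
])
```

With `decoder_lr_factor=10`, the second parameter group steps ten times faster than the encoder, which matches the intuition that randomly initialized parts need larger updates than pretrained ones.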

@helpmefindaname
Copy link
Member

Hi @alanakbik,
if I understand this PR correctly, this sets a different LR for pretrained weights and randomly initialized weights on all DefaultClassifiers?

I suppose this could be extended even further, for example to the SequenceTagger, by filtering parameters by embedding, e.g.:

embedding_parameters = [param for name, param in self.model.named_parameters() if "embedding" in name]
model_parameters = [param for name, param in self.model.named_parameters() if "embedding" not in name]

Such that we could train a Transformer-BERT model with a higher LR for the CRF part.
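Applying that name-based filter, the two parameter lists could then feed separate optimizer groups. A hedged sketch, assuming a toy stand-in model (`ToySequenceTagger` and the LR values are illustrative, not Flair's API):

```python
# Sketch: give non-embedding parameters (e.g. an LSTM-CRF head) a
# higher learning rate than the embedding parameters.
import torch
import torch.nn as nn

class ToySequenceTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(100, 16)  # pretrained-style embeddings
        self.lstm = nn.LSTM(16, 8)              # randomly initialized
        self.linear = nn.Linear(8, 5)           # randomly initialized

model = ToySequenceTagger()

# the filtering proposed above: split parameters by name
embedding_parameters = [p for n, p in model.named_parameters() if "embedding" in n]
other_parameters = [p for n, p in model.named_parameters() if "embedding" not in n]

optimizer = torch.optim.AdamW([
    {"params": embedding_parameters, "lr": 5e-5},
    {"params": other_parameters, "lr": 5e-4},  # higher LR for the head
])
```

The caveat raised below still applies: filtering by the substring "embedding" treats all embeddings as pretrained, which is wrong for randomly initialized ones like CharacterEmbeddings.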

@alanakbik
Copy link
Collaborator Author

alanakbik commented Aug 6, 2022

Yes, it is for training the non-pretrained (i.e. randomly initialized) parts with a higher LR. Since the decoder is always randomly initialized, it is handled here. Extending this to the LSTM-CRF of the SequenceTagger would be great, but some embeddings (like CharacterEmbeddings) are randomly initialized, while others are not. So I think it's not easy to come up with a good heuristic to identify those parts.

Edit: I'll merge this now for experimentation, but any ideas to improve are welcome!

@alanakbik alanakbik merged commit a927b30 into master Aug 6, 2022
alanakbik added a commit that referenced this pull request Aug 6, 2022
@alanakbik alanakbik deleted the learning_rate_factor branch August 6, 2022 20:22
alanakbik added a commit that referenced this pull request Aug 10, 2022
GH-2888: Experiment with alternative heuristic
