Convenience method for learning rate factor #2888
Conversation
Hi @alanakbik, I suppose this could be extended even further, for example such that we could train a Transformer-BERT model with a higher LR for the CRF part.
Yes, it is for training the non-pretrained (i.e. randomly initialized) parts with a higher LR. Since the decoder is always randomly initialized, it is handled here. Extending this to the LSTM-CRF of the SequenceTagger would be great, but some embeddings (like CharacterEmbeddings) are randomly initialized while others are not, so it's not easy to come up with a good heuristic to identify those parts. Edit: I'll merge this now for experimentation, but any ideas for improvement are welcome!
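For context, the underlying mechanism here is plain PyTorch parameter groups. Below is a minimal sketch of what such a heuristic could look like; the `build_param_groups` helper and its name-matching rule are hypothetical illustrations, not the PR's actual code:

```python
import torch

def build_param_groups(model: torch.nn.Module, base_lr: float, decoder_lr_factor: float):
    """Put randomly initialized parts into a higher-LR group (hypothetical helper)."""
    decoder_params, pretrained_params = [], []
    for name, param in model.named_parameters():
        # Hypothetical heuristic: treat linear/CRF layers as randomly initialized;
        # everything else is assumed to be pretrained (e.g. transformer weights).
        if "linear" in name or "crf" in name:
            decoder_params.append(param)
        else:
            pretrained_params.append(param)
    return [
        {"params": pretrained_params, "lr": base_lr},
        {"params": decoder_params, "lr": base_lr * decoder_lr_factor},
    ]

# Usage: the optimizer applies 5e-6 to pretrained weights, 2.5e-5 to the decoder.
# optimizer = torch.optim.AdamW(build_param_groups(tagger, 5e-6, 5.0))
```

As the comment above notes, the hard part is the heuristic itself: name matching works for the decoder, but randomly initialized embeddings (like CharacterEmbeddings) are not distinguishable by name alone.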
GH-2888: Experiment with alternative heuristic
This PR adds a parameter to set a factor on the learning rate of the decoder when fine-tuning a model.
Usage:
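A minimal usage sketch, assuming the new parameter is named `decoder_lr_factor` on `ModelTrainer.fine_tune`, with a CoNLL-03 NER setup chosen purely for illustration:

```python
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Corpus and label dictionary for an NER fine-tuning run.
corpus = CONLL_03()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Pretrained transformer embeddings, set to be fine-tuned.
embeddings = TransformerWordEmbeddings(model="bert-base-uncased", fine_tune=True)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
)

trainer = ModelTrainer(tagger, corpus)

# Small LR for the pretrained transformer; the randomly initialized
# decoder trains at 5x that rate (assumed parameter: decoder LR = 5e-6 * 5.0).
trainer.fine_tune(
    "resources/taggers/ner-example",
    learning_rate=5.0e-6,
    decoder_lr_factor=5.0,
)
```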