In this project, I finetune Qwen3-1.7B model using the Group Relative Policy Optimization (GRPO) algorithm. I also use LORA for finetuning. The dataset was taken from dataset
Both the installation and training are done in one script.
bash train.shAfter training, we merge the weights of the LORA adapter with the base model.
python3 post_train_merge.pyNow, we can evaluate the model on the test set.
bash eval.sh