Thank you for your valuable work. I am very interested in your project.
In your paper, you mentioned that you had released all of your code. However, I was unable to locate the code for training and evaluating the Fin-o1 model.
Would it be possible for you to share the code? If it is not feasible to share the full code, could you please provide your implementation of the GRPO RL training part?
Thank you very much for your consideration.