Have you tested whether the model's rate of 'correcting wrong answers to right ones' changes after SFT? Or is the model just learning to be more confident in its answers?
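To make the question concrete, here is a rough sketch of the comparison I have in mind. It assumes you have, for each question, a boolean for whether the first answer was correct and whether the revised answer was correct; the function name `transition_rates` and the toy data are hypothetical, not taken from the repo. If SFT only makes the model more confident, I would expect both the wrong-to-right and right-to-wrong rates to drop together, rather than wrong-to-right going up.

```python
from collections import Counter

def transition_rates(first_correct, second_correct):
    """Rates of wrong->right and right->wrong between a first and revised answer.

    Both arguments are equal-length lists of booleans (hypothetical input format).
    """
    if len(first_correct) != len(second_correct):
        raise ValueError("both lists must have the same length")
    counts = Counter(zip(first_correct, second_correct))
    n_wrong = counts[(False, True)] + counts[(False, False)]  # initially wrong
    n_right = counts[(True, True)] + counts[(True, False)]    # initially right
    return {
        "wrong_to_right": counts[(False, True)] / n_wrong if n_wrong else 0.0,
        "right_to_wrong": counts[(True, False)] / n_right if n_right else 0.0,
    }

# Toy data for illustration only, not real results.
base = transition_rates([False, False, True, True], [True, False, True, False])
sft = transition_rates([False, False, True, True], [False, False, True, True])
print("base:", base)
print("sft: ", sft)
```

Reporting the two rates separately, before and after SFT, would show whether the model revises less overall or actually corrects more of its mistakes.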