Has anyone tried evaluating Qwen3 (both 8B and 235B-A22B) on LiveCodeBench? I’m trying to reproduce the reported results, but the model tends to overthink indefinitely and never emits the closing </think> token before hitting the generation limit.
Do we need to manually insert </think> after a certain thinking budget, or is there a recommended workaround for this behavior?
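
To make the question concrete, here is a minimal sketch of the kind of manual budget forcing I have in mind. This is my own assumption, not a documented Qwen3 procedure: `generate` is a hypothetical callable standing in for whatever backend is used (it takes a prompt and a token limit and returns the generated text plus a flag saying whether it stopped because of the limit), and the exact text inserted around </think> is a guess.

```python
from typing import Callable, Tuple

THINK_CLOSE = "</think>"

# Hypothetical backend signature: (prompt, max_new_tokens) -> (text, hit_limit)
GenerateFn = Callable[[str, int], Tuple[str, bool]]


def generate_with_thinking_budget(
    generate: GenerateFn,
    prompt: str,
    thinking_budget: int,
    answer_budget: int,
) -> str:
    """Cap the thinking phase, force-close it if needed, then continue."""
    # Phase 1: let the model generate for at most `thinking_budget` tokens.
    first, hit_limit = generate(prompt, thinking_budget)

    # The model stopped on its own, so the output is already complete.
    if not hit_limit:
        return first

    # The budget ran out while the model was still thinking:
    # close the block manually (assumed format, not confirmed by the docs).
    if THINK_CLOSE not in first:
        first = first.rstrip() + "\n" + THINK_CLOSE + "\n\n"

    # Phase 2: continue from the (now closed) thinking block to get the answer.
    rest, _ = generate(prompt + first, answer_budget)
    return first + rest
```

With a real backend, `generate` would wrap something like a completion call with a max-token cap. The open question is whether this matches how the reported numbers were produced.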