SWE-smith
Scaling Data for Software Engineering Agents
April 30, 2025
Creating training data for software engineering agents has been difficult. Until now.
Introducing SWE-smith: generate hundreds to thousands of task instances for any GitHub repository.
We've generated 50k+ task instances for 128 popular GitHub repositories, then trained our own LM for SWE-agent.
The result? SWE-agent-LM-32B achieves 40% pass@1 on SWE-bench Verified.
Now, we've open-sourced everything, and we're excited to see what you build with it!
Check out the tutorial below to generate 100 task instances for any GitHub repository in 10 minutes.
Click here for an extended discussion.