Reinforcement learning for formal-math reasoning with Lean 4. We train a conjecturer to propose useful theorem statements for a prover, then use those generated samples to improve the prover via SFT and RL.
Formal proofs are machine-checkable: the Lean compiler verifies every step, removing ambiguity compared to informal math. This also enables synthetic data generation—you don’t need to hand-annotate ground truth when the compiler can validate it.
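As a small illustration (not taken from this project; the theorem names are hypothetical), the first Lean 4 theorem below carries a proof that the compiler checks. The second states the same claim with the proof left as `sorry`, so the compiler can validate the statement on its own.

```lean
-- Hypothetical example: Lean accepts this only if the proof really establishes the statement.
theorem add_comm_checked (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- Same statement with the proof left open: the statement still has to type-check,
-- and Lean reports a warning that `sorry` was used.
theorem add_comm_stub (a b : Nat) : a + b = b + a := by
  sorry
```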
- Goal. Train a conjecturer that proposes conjectures neither too easy nor too hard for the prover—i.e., maximally useful for training.
- Approach. We use PPO with a reward shaped by pass rate, novelty/relatedness, and batch diversity.
We define a base term. The reward components are:
- Complexity: encourages statements that are non-trivial but not too difficult for the LLM.
- Novelty: avoids near-duplicate statements.
- Relatedness: ensures a meaningful connection between the statement and its conjecturer.
- Syntactic correctness: ensures that statements are syntactically valid and compile in Lean with `by sorry` in place of the proof.
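The components above could be combined along the following lines. This is a minimal sketch under stated assumptions, not the project's actual reward: the `ConjectureStats` fields, the thresholds, the target pass rate of 0.5, and the triangular shape of the complexity term are all hypothetical, and the batch-diversity term is left out.

```python
from dataclasses import dataclass


@dataclass
class ConjectureStats:
    """Per-conjecture measurements (all field names are hypothetical)."""
    compiles: bool      # statement compiles in Lean with `by sorry`
    pass_rate: float    # fraction of prover attempts that succeed, in [0, 1]
    novelty: float      # 1.0 = unlike earlier statements, 0.0 = duplicate
    relatedness: float  # relatedness score in [0, 1]; how it is measured is not specified here


def conjecture_reward(
    stats: ConjectureStats,
    target_pass_rate: float = 0.5,  # assumed "neither too easy nor too hard" point
    min_novelty: float = 0.2,       # assumed threshold
    min_relatedness: float = 0.2,   # assumed threshold
) -> float:
    """Return a reward in [0, 1] for one conjecture."""
    # Syntactic correctness acts as a hard gate.
    if not stats.compiles:
        return 0.0
    # Novelty and relatedness act as filters in this sketch.
    if stats.novelty < min_novelty or stats.relatedness < min_relatedness:
        return 0.0
    # Statements that are never or always proved carry no training signal.
    if stats.pass_rate <= 0.0 or stats.pass_rate >= 1.0:
        return 0.0
    # Complexity: highest at the target pass rate, decreasing linearly away from it.
    distance = abs(stats.pass_rate - target_pass_rate)
    widest_gap = max(target_pass_rate, 1.0 - target_pass_rate)
    return 1.0 - distance / widest_gap


if __name__ == "__main__":
    stats = ConjectureStats(compiles=True, pass_rate=0.25, novelty=0.9, relatedness=0.8)
    print(conjecture_reward(stats))
```

Treating syntax, novelty, and relatedness as gates rather than weighted terms keeps the sketch simple; a weighted sum would be an equally plausible design.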
- Conjecturer RL: Train with PPO using the reward above.
- Data collection: Generate conjectures and proofs, then deduplicate (~40k samples remain).
- Prover SFT: Start from DeepSeek-Prover-v1.5-SFT and run one epoch of SFT on the collected data.
- Prover RL: Further fine-tune on LeanWorkbook using RL.
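The deduplication step in the data-collection stage might look like the sketch below. The exact criterion used for the dataset is not stated, so this assumes the simplest one: two statements are duplicates if they match after stripping Lean line comments and collapsing whitespace. The function names are hypothetical.

```python
import re


def normalize_statement(statement: str) -> str:
    """Strip Lean line comments (`-- ...`) and collapse all whitespace."""
    without_comments = re.sub(r"--.*", "", statement)
    return " ".join(without_comments.split())


def deduplicate(statements: list[str]) -> list[str]:
    """Keep the first occurrence of each statement, comparing normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for statement in statements:
        key = normalize_statement(statement)
        if key and key not in seen:
            seen.add(key)
            unique.append(statement)
    return unique


if __name__ == "__main__":
    samples = [
        "theorem a : 1 = 1 := by sorry",
        "theorem a :  1 = 1 :=\n  by sorry  -- same statement, different layout",
        "theorem b : 2 = 2 := by sorry",
    ]
    print(len(deduplicate(samples)))
```

Exact-match deduplication like this will not catch statements that differ only in theorem or variable names; catching those would need a stronger normalization step.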
| Model | Pass@1 | Pass@32 |
|---|---|---|
| DeepSeek-Prover-v1.5-SFT | 30.74 | 47.95 |
| + SFT on conjecturer data | 32.79 | 49.18 |
| + SFT + RL (LeanWorkbook) | 39.75 | 49.18 |
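Takeaway: Pass@1 improves by 9.01 points absolute over the base (30.74 → 39.75). Pass@32 improves by 1.23 points (47.95 → 49.18), and that gain comes entirely from the SFT stage.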
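How the Pass@1 and Pass@32 figures were computed is not stated here. One common choice is the unbiased pass@k estimator, sketched below as an assumption: for each problem with `n` sampled proofs of which `c` verify, it computes `1 - C(n-c, k) / C(n, k)`, then averages over problems. The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn from n with c correct, verifies."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k samples include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, as a percentage. Each result is (n, c)."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two toy problems with 4 samples each: one has a single verified proof, one has none.
    print(benchmark_pass_at_k([(4, 1), (4, 0)], k=1))
```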
- RL framework: VERL
- SFT framework: Levanter
- Lean compiler service: kimina-server
- Base prover: DeepSeek-Prover-v1.5-SFT
- Conjecturer-generated dataset: ~40k samples (after deduplication) — Slim205/Lean_conjecturer_data_v01
- Base model: DeepSeek-Prover-v1.5-SFT
- Artifacts:
  - Conjecturer: Slim205/Lean-conjecturer
  - Prover: Slim205/Lean_prover_v1
Built on and inspired by GodelLM, kimina-server, VERL, STP, Levanter, Lean Dojo, and DeepSeek-Prover-v1.5. Thanks to the authors and maintainers of these projects.
For more details, see the slides: https://github.com/Slim205/RL-Lean/blob/master/Harvard%20Work.pdf
