R-Zero: Self-Evolving Reasoning LLM from Zero Data
Published in ICLR 2026, 2026
R-Zero trains LLMs entirely without human-curated data by pitting two copies of the base model against each other in a self-evolving curriculum.
Recommended citation: Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu. (2026). "R-Zero: Self-Evolving Reasoning LLM from Zero Data." ICLR.
