R-Zero: Self-Evolving Reasoning LLM from Zero Data

Published in ICLR 2026

R-Zero trains LLMs entirely without human-curated data by pitting two copies of the base model against each other in a self-evolving curriculum.

arXiv

Code

Recommended citation: Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu. (2026). "R-Zero: Self-Evolving Reasoning LLM from Zero Data." ICLR.
