VisPlay: Self-Evolving Vision-Language Models from Images
Published in CVPR 2026
VisPlay is a self-evolving reinforcement learning (RL) framework that enables vision-language models (VLMs) to autonomously improve their reasoning abilities using large amounts of unlabeled image data.
| arXiv | Project Page |
Recommended citation: Yicheng He*, Chengsong Huang*, Zongxia Li*, Jiaxin Huang, Yonghui Yang. (2026). "VisPlay: Self-Evolving Vision-Language Models from Images." CVPR.
