Self-Rewarding Vision-Language Model via Reasoning Decomposition
Published in ICLR 2026
Vision-SR1 trains vision-language models (VLMs) to reward themselves by decomposing reasoning into visual perception and language reasoning, so they can improve without human labels or external rewards.
Recommended citation: Zongxia Li*, Wenhao Yu*, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Chen, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu. (2026). "Self-Rewarding Vision-Language Model via Reasoning Decomposition." ICLR.
