Self-Rewarding Vision-Language Model via Reasoning Decomposition

Published in ICLR 2026

Vision-SR1 trains vision-language models (VLMs) to reward themselves by decomposing reasoning into two stages, visual perception and language reasoning, so that the model can improve without human labels or external rewards.

arXiv | Code

Recommended citation: Zongxia Li*, Wenhao Yu*, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Chen, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu. (2026). "Self-Rewarding Vision-Language Model via Reasoning Decomposition." ICLR.
Download Paper