Publications

You can also find my articles on my Google Scholar profile.
MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
Xiyang Wu, Zongxia Li, Jihui Jin, Guangyao Shi, Gouthaman KV, Vishnu Raj, Nilotpal Sinha, Jingxi Chen, Fan Du, Dinesh Manocha
CVPR 2026
VisPlay: Self-Evolving Vision-Language Models from Images
Yicheng He*, Chengsong Huang*, Zongxia Li*, Jiaxin Huang, Yonghui Yang
CVPR 2026
First Frame Is the Place to Go for Video Content Customization
Jingxi Chen*, Zongxia Li*, Zhichao Liu, Guangyao Shi, Xiyang Wu, Fuxiao Liu, Cornelia Fermüller, Brandon Y. Feng, Yiannis Aloimonos
CVPR 2026
An Agentic Harness for Skill-Evolving Image Generation Workflows
Zongxia Li*, Dawei Liu*, Jingxi Chen, Xiyang Wu, Lichao Sun, et al.
Preprint
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha
Preprint
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu
ICLR 2026
Self-Rewarding Vision-Language Model via Reasoning Decomposition
Zongxia Li*, Wenhao Yu*, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Chen, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu
ICLR 2026
Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
Dawei Liu*, Zongxia Li*, Hongyang Du, Xiyang Wu, Lichao Sun
Preprint
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
Zongxia Li*, Hongyang Du*, Chengsong Huang*, Xiyang Wu, Lantao Yu, Yicheng He, Jing Xie, et al.
Preprint
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
Zongxia Li*, Xiyang Wu*, Yubin Qin, Hongyang Du, Guangyao Shi, Dinesh Manocha, Tianyi Zhou, Jordan Lee Boyd-Graber
NeurIPS 2025
Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs
Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Paiheng Xu, Daniel Stephens, Alden Dima, Juan Francisco Fung, Jordan Lee Boyd-Graber
ACL 2025
Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
Zongxia Li, Yapei Chang, Yuhang Zhou, Xiyang Wu, Zichao Liang, Yoo Yeon Sung, Jordan Lee Boyd-Graber
Preprint
A Survey of State of the Art Large Vision Language Models: Benchmark Evaluations and Challenges
Zongxia Li*, Xiyang Wu*, Hongyang Du, Fuxiao Liu, Huy Nghiem, Guangyao Shi
CVPR Workshop 2025 (Oral)
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence
Zongxia Li, Ishani Mondal, et al.
EMNLP 2024