Publications

You can also find my articles on my Google Scholar profile.

MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

Xiyang Wu, Zongxia Li, Jihui Jin, Guangyao Shi, Gouthaman KV, Vishnu Raj, Nilotpal Sinha, Jingxi Chen, Fan Du, Dinesh Manocha

CVPR 2026

[paper]

VisPlay: Self-Evolving Vision-Language Models from Images

Yicheng He*, Chengsong Huang*, Zongxia Li*, Jiaxin Huang, Yonghui Yang

CVPR 2026

[paper] [webpage]

First Frame Is the Place to Go for Video Content Customization

Jingxi Chen*, Zongxia Li*, Zhichao Liu, Guangyao Shi, Xiyang Wu, Fuxiao Liu, Cornelia Fermüller, Brandon Y. Feng, Yiannis Aloimonos

CVPR 2026

[paper] [webpage]

An Agentic Harness for Skill-Evolving Image Generation Workflows

Zongxia Li*, Dawei Liu*, Jingxi Chen, Xiyang Wu, Lichao Sun, et al.

Preprint

[paper] [code]

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha

Preprint

[paper] [webpage]

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu

ICLR 2026

[paper] [code]

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Zongxia Li*, Wenhao Yu*, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Chen, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu

ICLR 2026

[paper] [code]

Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

Dawei Liu*, Zongxia Li*, Hongyang Du, Xiyang Wu, Lichao Sun

Preprint

[paper] [code]

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

Zongxia Li*, Hongyang Du*, Chengsong Huang*, Xiyang Wu, Lantao Yu, Yicheng He, Jing Xie, et al.

Preprint

[paper] [code]

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos

Zongxia Li*, Xiyang Wu*, Yubin Qin, Hongyang Du, Guangyao Shi, Dinesh Manocha, Tianyi Zhou, Jordan Lee Boyd-Graber

NeurIPS 2025

[paper] [webpage] [code]

Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs

Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Paiheng Xu, Daniel Stephens, Alden Dima, Juan Francisco Fung, Jordan Lee Boyd-Graber

ACL 2025

[paper]

Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation

Zongxia Li, Yapei Chang, Yuhang Zhou, Xiyang Wu, Zichao Liang, Yoo Yeon Sung, Jordan Lee Boyd-Graber

Preprint

[paper] [code]

A Survey of State of the Art Large Vision Language Models: Benchmark Evaluations and Challenges

Zongxia Li*, Xiyang Wu*, Hongyang Du, Fuxiao Liu, Huy Nghiem, Guangyao Shi

CVPR Workshop 2025 (Oral)

[paper] [code]

PEDANTS: Cheap but Effective and Interpretable Answer Equivalence

Zongxia Li, Ishani Mondal, et al.

EMNLP 2024

SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement

Ishani Mondal, Zongxia Li, et al.

EMNLP Findings 2024

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Tianrui Guan, Fuxiao Liu, Xiyang Wu, Zongxia Li, et al.

CVPR 2024

Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

Zongxia Li, et al.

EACL 2024