About me

I am a fourth-year PhD candidate in Computer Science at the University of Maryland, College Park, in the CLIP Lab, advised by Jordan Boyd-Graber and co-advised by Lichao Sun. My research develops methods that make vision-language models and agents more capable through post-training, more autonomous through self-evolution, and more useful through tighter collaboration with humans.

Research Focus

Self-Evolving Agents: Autonomous agents that bootstrap their own capabilities from minimal or zero human-curated data, spanning self-evolving multimodal reasoning, co-evolving decision-and-skill agents for long-horizon tasks, and exploration-guided visual reasoning.
Model Post-Training: Reinforcement learning and reward design for vision-language models, including self-rewarding via reasoning decomposition, semantically aware open-ended rewards, and hallucination-targeted alignment for image and video understanding.
Human-AI Collaboration: How humans and AI systems collaborate to optimize real-world workflows, from LLM-assisted annotation and interactive topic exploration to robust automatic evaluation metrics that faithfully reflect human judgment.

News

Jul 14, 2026	Harness Handbook is released! Making evolving agent harnesses readable, navigable, and editable.
Jul 9, 2026	Long-Horizon-Terminal-Bench (LHTB) is released! Testing the limits of agents on long-horizon terminal tasks with dense reward-based grading.
May 1, 2026	ComfyClaw is released! An agentic harness for skill-evolving image generation workflows.
Apr 29, 2026	COS-PLAY is released! Co-evolving LLM decision and skill bank agents for long-horizon tasks.
Apr 7, 2026	Graph-of-Skills is released! Dependency-aware structural retrieval for massive agent skills.
Mar 10, 2026	MM-Zero is released! Self-evolving VLMs from zero data using multi-role RL training.
Feb 20, 2026	FFGO, VisPlay, and MASS are accepted to CVPR 2026.
Feb 8, 2026	R-Zero and Vision-SR1 are accepted to ICLR 2026.
Nov 20, 2025	FFGO is released! Customize your video with our FFGO LoRA adapters.
Nov 19, 2025	VisPlay is released! Learn how to evolve VLMs with just images.
Sep 20, 2025	VideoHallu is accepted to NeurIPS 2025.
Aug 22, 2025	I finished my internship at Tencent AI Lab, Bellevue, mentored by Wenhao Yu, working on self-evolving VLMs and LLMs.
Jul 22, 2025	I received a research compute grant from Lambda Labs.
Aug 22, 2024	I finished my internship at Adobe Document Intelligence Lab, focusing on improving LLM automatic evaluations for downstream training.

Selected Publications

Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

Zongxia Li*, Zhongzhi Li*, Yucheng Shi*, Ruhan Wang, Junyao Yang, Zhichao Liu, Xiyang Wu, Anhao Li, Yue Yu, Ninghao Liu, Lichao Sun, Haitao Mi, Leowei Liang

Preprint

[paper] [webpage] [code]

Harness Handbook: Making Evolving Agent Harnesses Readable, Navigable, and Editable

Ruhan Wang, Yucheng Shi, Zongxia Li, Zhongzhi Li, Yue Yu, Junyao Yang, Kishan Panaganti, Haitao Mi, Dongruo Zhou, Leowei Liang

Preprint

[paper] [webpage] [code]

An Agentic Harness for Skill-Evolving Image Generation Workflows

Zongxia Li*, Dawei Liu*, Jingxi Chen, Xiyang Wu, Lichao Sun, et al.

Preprint

[paper] [code]

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Zongxia Li*, Wenhao Yu*, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Chen, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu

ICLR 2026

[paper] [code]

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

Zongxia Li*, Hongyang Du*, Chengsong Huang*, Xiyang Wu, Lantao Yu, Yicheng He, Jing Xie, et al.

Preprint

[paper] [code]

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos

Zongxia Li*, Xiyang Wu*, Yubin Qin, Hongyang Du, Guangyao Shi, Dinesh Manocha, Tianyi Zhou, Jordan Lee Boyd-Graber

NeurIPS 2025

[paper] [webpage] [code]

First Frame Is the Place to Go for Video Content Customization

Jingxi Chen*, Zongxia Li*, Zhichao Liu, Guangyao Shi, Xiyang Wu, Fuxiao Liu, Cornelia Fermüller, Brandon Y. Feng, Yiannis Aloimonos

CVPR 2026

[paper] [webpage]

Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

Dawei Liu*, Zongxia Li*, Hongyang Du, Xiyang Wu, Lichao Sun

Preprint

[paper] [code]

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha

Preprint

[paper] [webpage]

Popular Community Projects

Dr. Claw: Your AI Research Assistant

A super AI lab with massive AI doctors as assistants — the best IDE for research powered by AI.

Large Vision-Language Models: A Survey & Benchmark Collection

A comprehensive, continuously updated collection of VLM benchmarks, RL alignment methods, and applications.