About me

I am a fourth-year PhD candidate in Computer Science at the University of Maryland, College Park, in the CLIP Lab, advised by Jordan Boyd-Graber and co-advised by Lichao Sun. My research develops methods that make vision-language models and agents more capable through post-training, more autonomous through self-evolution, and more useful through tighter collaboration with humans.

Research Focus

  1. Model Post-Training — Reinforcement learning and reward design for vision-language models, including self-rewarding via reasoning decomposition, semantically-aware open-ended rewards, and hallucination-targeted alignment for image and video understanding.

  2. Self-Evolving Agents — Autonomous agents that bootstrap their own capabilities from minimal or zero human-curated data, spanning self-evolving multimodal reasoning, co-evolving decision-and-skill agents for long-horizon tasks, and exploration-guided visual reasoning.

  3. Human-AI Collaboration — How humans and AI systems collaborate to optimize real-world workflows, from LLM-assisted annotation and interactive topic exploration to robust automatic evaluation metrics that faithfully reflect human judgment.

News

May 1, 2026: ComfyClaw is released! An agentic harness for skill-evolving image generation workflows.
Apr 29, 2026: COS-PLAY is released! Co-evolving LLM decision and skill bank agents for long-horizon tasks.
Apr 7, 2026: Graph-of-Skills is released! Dependency-aware structural retrieval for massive agent skills.
Mar 10, 2026: MM-Zero is released! Self-evolving VLMs from zero data using multi-role RL training.
Feb 20, 2026: FFGO, VisPlay, and MASS are accepted to CVPR 2026.
Feb 8, 2026: R-Zero and Vision-SR1 are accepted to ICLR 2026.
Nov 20, 2025: FFGO is released! Customize your video with our FFGO LoRA adapters.
Nov 19, 2025: VisPlay is released! Learn how to evolve VLMs with just images.
Sep 20, 2025: VideoHallu is accepted to NeurIPS 2025.
Aug 22, 2025: I finished my internship at Tencent AI Lab, Bellevue, mentored by Wenhao Yu, working on self-evolving VLMs and LLMs.
Jul 22, 2025: I received a research compute grant from Lambda Labs.
Aug 22, 2024: I finished my internship at Adobe Document Intelligence Lab, focusing on improving LLM automatic evaluations for downstream training.

Selected Publications

An Agentic Harness for Skill-Evolving Image Generation Workflows
Zongxia Li*, Dawei Liu*, Jingxi Chen, Xiyang Wu, Lichao Sun, et al.
Preprint

Self-Rewarding Vision-Language Model via Reasoning Decomposition
Zongxia Li*, Wenhao Yu*, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Chen, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu
ICLR 2026

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
Zongxia Li*, Hongyang Du*, Chengsong Huang*, Xiyang Wu, Lantao Yu, Yicheng He, Jing Xie, et al.
Preprint

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
Zongxia Li*, Xiyang Wu*, Yubin Qin, Hongyang Du, Guangyao Shi, Dinesh Manocha, Tianyi Zhou, Jordan Lee Boyd-Graber
NeurIPS 2025

First Frame Is the Place to Go for Video Content Customization
Jingxi Chen*, Zongxia Li*, Zhichao Liu, Guangyao Shi, Xiyang Wu, Fuxiao Liu, Cornelia Fermüller, Brandon Y. Feng, Yiannis Aloimonos
CVPR 2026

Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
Dawei Liu*, Zongxia Li*, Hongyang Du, Xiyang Wu, Lichao Sun
Preprint

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
Xiyang Wu, Zongxia Li, Guangyao Shi, Alexander Duffy, Tyler Marques, Matthew Lyle Olson, Tianyi Zhou, Dinesh Manocha
Preprint


Popular Community Projects

Dr. Claw: Your AI Research Assistant

An AI research lab staffed by a large team of AI "doctors" acting as assistants, built as an AI-powered IDE for research.

Large Vision-Language Models: A Survey & Benchmark Collection

A comprehensive, continuously updated collection of VLM benchmarks, RL alignment methods, and applications.