Zongxia Li

Department of Computer Science

University of Maryland

Hi, I’m Zongxia

I am a third-year Ph.D. candidate in the Department of Computer Science at the University of Maryland, College Park. I am fortunate to be advised by Jordan Boyd-Graber. I received a B.S. in Computer Science and Mathematics from the University of Maryland. My current research lies in Human-Centered NLP, Multimodal Models, and Evaluation.


Current Research Focus

My research aims to develop AI systems and better evaluations that align closely with human needs. In the text-only domain, I aim to develop interactive systems that help humans explore and understand abstract concepts in large amounts of data, and to improve the robustness of current automatic evaluation metrics. In the multimodal domain, I aim to analyze and evaluate multimodal models, including question answering, hallucination, video generation, and reasoning.

  1. Human-Centered AI: Creating interactive systems and evaluation frameworks to assess AI reliability.

  2. Evaluation: Improving trustworthiness and robustness of current evaluation metrics.

  3. Multimodality: Analyzing and evaluating multimodal models, including question answering, hallucination, video generation, and reasoning.


Research Vision

Rapid advances in LLMs, LVLMs, and their applications are changing the relationship between humans and AI and how humans use AI. I particularly value AI that serves humans rather than replacing them, through interactive systems and better evaluation frameworks.

Papers

VLM Demo
(2025). Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs. arXiv.

PDF Data

VLM Demo
(2025). Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey. arXiv.

PDF Github

PEDANTS Demo
(2024). PEDANTS: Cheap but Effective and Interpretable Answer Equivalence. Empirical Methods in Natural Language Processing. Findings.

PDF Code

(2024). SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement. Empirical Methods in Natural Language Processing. Findings.

PDF

(2024). Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis. European Chapter of the Association for Computational Linguistics. Main.

PDF Dataset

(2024). HallusionBench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models. Conference on Computer Vision and Pattern Recognition.

PDF Dataset

, Colin Wang, Zongxia Li, Rachel Rudinger (2024). Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender? Association for Computational Linguistics.

PDF

, Jieyu Zhao, Rachel Rudinger (2023). SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models. European Chapter of the Association for Computational Linguistics. Main.

PDF