PhD student, Purdue University
2 papers at NeurIPS 2025
This paper presents TAI3, a stress-testing framework that uses targeted input mutations to expose errors in which LLM agents deviate from user intent.
CoRe: a high-quality, multilingual benchmark for evaluating LLMs’ Code Reasoning capabilities on fundamental static analysis tasks.