Full Professor, Department of Computer Science, Tsinghua University
1 paper at NeurIPS 2025
We propose a benchmark for evaluating the instruction-following ability of large language models in agentic scenarios.