MS student, University of Science and Technology of China
1 paper at NeurIPS 2025
HawkBench is a human-labeled, multi-domain benchmark of 1,600 samples for evaluating RAG systems on diverse queries. It reveals limits in how well these systems generalize and the need for adaptive strategies.