MS student, The Hong Kong University of Science and Technology
1 paper at NeurIPS 2025
HawkBench is a human-labeled, multi-domain benchmark of 1,600 samples for evaluating RAG systems on diverse query types. It reveals limits in how well current RAG methods generalize and points to the need for adaptive strategies.