Researcher, Research, Microsoft
4 papers at NeurIPS 2025
We introduce InForage, a reinforcement learning (RL) framework that enables LLMs to perform adaptive, multi-step retrieval by rewarding informative intermediate steps.
HawkBench is a human-labeled, multi-domain benchmark of 1,600 samples for evaluating RAG systems on diverse queries. It reveals limits in the generalizability of these systems and the need for adaptive strategies.