PhD student, University of California, Santa Cruz
3 papers at NeurIPS 2025
We introduce RAGuard, the first benchmark for evaluating the robustness of RAG systems against naturally misleading evidence, and show that even strong LLMs underperform when exposed to real-world retrieval noise.
We propose GenIR, a method that uses generative visual feedback for multi-round image retrieval, and release an accompanying dataset.