Researcher, Allen Institute for Artificial Intelligence
1 paper at NeurIPS 2025
Measuring and improving the signal-to-noise ratio in language model benchmarks.