Researcher, Allen Institute for Artificial Intelligence
3 papers at NeurIPS 2025
Measuring and improving the signal-to-noise ratio in language model benchmarks
ML conferences should establish a "refutations and critiques" track