Full Professor, Eberhard-Karls-Universität Tübingen
1 paper at NeurIPS 2025
Current AI benchmarks suffer from systematic flaws such as data leakage and selective reporting. We propose PeerBench, a community-run evaluation platform with secret and live tests and reputation-weighted scoring, to restore trust in AI performance claims.