Carsten Eickhoff

Full Professor, Eberhard-Karls-Universität Tübingen

1 paper at NeurIPS 2025

Homepage · OpenReview · Semantic Scholar · Google Scholar

Poster Session 1

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

Position: Benchmarking is Broken - Don't Let AI be Its Own Judge

#5500 · Zerui Cheng, Stella Wohnig, Ruchika Gupta, Samiul Alam, Tassallah Abdullahi, João Alves Ribeiro, Christian Nielsen-Garcia, Saif Mir, Siran Li, Jason Orender, Seyed Ali Bahrainian, Daniel Kirste, Aaron Gokaslan, Carsten Eickhoff, Ruben Wolff

Current AI benchmarks suffer from systematic flaws such as data leakage and selective reporting. We propose PeerBench, a community-run evaluation platform that combines secret and live tests with reputation-weighted scoring to restore trust in AI performance claims.