PhD student, , Michigan State University
1 paper at NeurIPS 2025
Current AI benchmarks suffer from systematic flaws like data leakage and selective reporting. We propose PeerBench, a community-run eval platform with secret and live tests and reputation-weighted scoring to restore trust in AI performance claims.