In this position paper, we investigate the validity and reliability of LLMs as judges, and we highlight challenges inherent to their use and to existing practices in NLG evaluation.
We argue that conclusions about relative system safety or attack-method efficacy drawn via AI red teaming are often not supported by the evidence that attack success rate (ASR) comparisons provide.