LLM-as-Judge

2 papers across 2 sessions

Poster Session 5

1 paper

Friday, December 5, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C, D, E

Multi-Agent Debate for LLM Judges with Adaptive Stability Detection

#3412 · Tianyu Hu, Zhen Tan, Song Wang, Huaizhi Qu, Tianlong Chen

This paper applies a multi-agent debate process to LLM-as-a-judge evaluation and uses an adaptive stopping mechanism that detects when the debate has stabilized.

Poster Session 6

1 paper

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C, D, E

Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming

#1110 · Alex Chouldechova, A. Feder Cooper, Solon Barocas, Abhinav Palia, Dan Vann, Hanna Wallach

We argue that conclusions about relative system safety or attack-method efficacy drawn from AI red teaming are often not supported by the evidence that attack success rate (ASR) comparisons provide.