2 papers across 2 sessions
This paper uses a multi-agent debate process for LLM-as-a-judge evaluation and employs an adaptive stopping mechanism.
We argue that conclusions drawn about relative system safety or attack method efficacy via AI red teaming are often not supported by evidence provided by attack success rate (ASR) comparisons.