This paper presents TAI3, a stress-testing framework that applies targeted input mutations to expose LLM agent errors that deviate from user intent.
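The sketch below illustrates the general idea of mutation-based stress testing. It is not TAI3's method: the paper's actual mutation operators, oracle, and agent interface are not described here. The sketch applies intent-preserving mutations to a user instruction and flags every case where an agent's action diverges from the action the user intended. The `Task` type, the three mutation functions, `toy_agent`, and `stress_test` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    # Hypothetical task: a user instruction plus the action that satisfies the user's intent.
    instruction: str
    expected_action: str


# Hypothetical intent-preserving mutations. Each rewords the instruction
# without changing what the user wants.
def swap_synonym(text: str) -> str:
    return text.replace("delete", "remove")


def add_distractor(text: str) -> str:
    return text + " Also, the weather is nice today."


def add_politeness(text: str) -> str:
    return "Please " + text[0].lower() + text[1:]


MUTATIONS: List[Callable[[str], str]] = [swap_synonym, add_distractor, add_politeness]


def toy_agent(instruction: str) -> str:
    # Deliberately brittle stand-in agent: it only recognizes the literal word "delete".
    if "delete" in instruction.lower():
        return "delete_file"
    return "no_op"


def stress_test(agent: Callable[[str], str], task: Task) -> List[Tuple[str, str, str]]:
    """Run the agent on each mutated instruction and record intent violations."""
    failures = []
    for mutate in MUTATIONS:
        mutated = mutate(task.instruction)
        action = agent(mutated)
        # The intent is unchanged by the mutation, so any other action is an error.
        if action != task.expected_action:
            failures.append((mutate.__name__, mutated, action))
    return failures


task = Task("delete the file report.txt", "delete_file")
for name, text, action in stress_test(toy_agent, task):
    print(f"{name}: {text!r} -> {action}")
# prints: swap_synonym: 'remove the file report.txt' -> no_op
```

Only the synonym mutation exposes the toy agent's brittleness. The distractor and politeness variants leave its keyword match intact. A real framework would need a far richer mutation space and an oracle for user intent that does not rely on a single hard-coded expected action.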