1 paper across 1 session
This paper presents TAI3, a stress testing framework that uses targeted input mutations to expose LLM agent errors that deviate from user intent