1 paper across 1 session
A dataset of multi-agent system traces and a systematic analysis of failures in multi-agent LLM systems, featuring a structured failure taxonomy and an automated evaluation pipeline.