3 papers across 3 sessions
GSO: SWE Agents Struggle at Reasoning and Engineering for Software Optimization
We automatically collect software engineering tasks from GitHub at scale, build a decontaminated SWE-agent benchmark from them, and discover contamination in some well-known LLMs.
A model-free speculative decoding method that uses suffix trees to accelerate agentic AI workloads, achieving a 5.3x speedup on multi-agent tasks.
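The summary does not spell out the mechanism. The following is a rough illustration of the general idea of model-free drafting: past token sequences are indexed by their suffixes, the longest match for the current context proposes draft tokens without any draft model, and the target model then verifies them. It is a minimal Python sketch under stated assumptions, not the paper's implementation. A depth-bounded suffix trie stands in for a true (compressed) suffix tree. `SuffixTrie`, `propose`, and `accept_draft` are hypothetical names, and `target_tokens` is a hypothetical stand-in for the target model's greedy predictions at each draft position.

```python
class SuffixTrie:
    """Depth-bounded suffix trie over token sequences.

    A simplified, hypothetical stand-in for a suffix tree: every suffix of
    each inserted sequence is stored up to max_depth tokens, with counts.
    """

    def __init__(self, max_depth=8):
        self.max_depth = max_depth
        # Each node maps token -> [count, child_node].
        self.root = {}

    def add(self, tokens):
        # Insert every suffix of `tokens`, truncated to max_depth tokens.
        for start in range(len(tokens)):
            node = self.root
            for tok in tokens[start:start + self.max_depth]:
                entry = node.setdefault(tok, [0, {}])
                entry[0] += 1
                node = entry[1]

    def propose(self, context, num_draft=4):
        # Try the longest suffix of the context first, then shorter ones.
        for length in range(min(len(context), self.max_depth - 1), 0, -1):
            node = self.root
            matched = True
            for tok in context[-length:]:
                if tok not in node:
                    matched = False
                    break
                node = node[tok][1]
            if matched and node:
                # Greedily follow the most frequent continuation.
                draft = []
                while node and len(draft) < num_draft:
                    tok = max(node, key=lambda t: node[t][0])
                    draft.append(tok)
                    node = node[tok][1]
                return draft
        # No usable match: propose nothing and fall back to normal decoding.
        return []


def accept_draft(draft, target_tokens):
    """Keep the longest prefix of the draft that agrees with the target model."""
    accepted = []
    for d, t in zip(draft, target_tokens):
        if d != t:
            break
        accepted.append(d)
    return accepted


trie = SuffixTrie(max_depth=6)
trie.add(["call", "search", "(", "query", ")", "->", "result"])
draft = trie.propose(["then", "call", "search"], num_draft=3)
print(draft)  # ['(', 'query', ')']
print(accept_draft(draft, ["(", "query", "]"]))  # ['(', 'query']
```

Agentic workloads often repeat tool calls and output formats, which is the kind of repetition a suffix index can exploit. In this sketch, every accepted draft token saves one sequential target-model step. A mismatch costs only the rejected tail.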