14 papers across 3 sessions
We introduce a new benchmark for long-term spatial-temporal memory in 3D embodied agents. We propose a novel model with a memory fusion technique for enhanced memory capabilities.
To address the long video understanding task, we propose the Deep Video Discovery agent, which reasons iteratively and gathers information from video content via an agentic search and tool-use strategy.
We introduce GUI Exploration Lab, a flexible simulator for GUI agent navigation. Experiments show a staged SFT + RL approach (especially multi-turn RL) significantly boosts navigation and exploration capabilities.
We present MedChain, a dataset of 12,163 clinical cases that mimics real-world medical practice through personalization, interactivity, and sequentiality, and MedChain-Agent, an AI system with case-based learning and feedback mechanisms.
Factorio Learning Environment is an evaluation for frontier models that offers exponentially scaling challenges.
Quantifying the Potential of Control Algorithms through LLM Agents
We present the first agent system for super-resolution that is capable of upscaling any image with arbitrary degradation to high-quality 4K resolution.
We propose a benchmark to evaluate the instruction-following ability of large language models in agentic scenarios.