7 papers across 3 sessions
This paper introduces the LLM-PySC2 environment, an LLM decision-making environment with the complete PySC2 action space and a multi-agent system.
Transformer-based language models learn low-dimensional task manifolds across layers; similar trends in intrinsic dimension across models suggest shared compression strategies despite differing architectures and sizes.
A generative model imputes missing outcomes to drive Thompson Sampling decisions, yielding a flexible algorithm with regret tied to offline prediction quality.
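The imputation-then-sample idea in this summary can be sketched in a minimal Beta-Bernoulli form (the function name, pseudo-count interface, and the imputed counts supplied by the caller are all illustrative stand-ins, not the paper's actual algorithm):

```python
import numpy as np

def imputed_thompson_step(obs_success, obs_fail,
                          imp_success, imp_fail, rng):
    """One Thompson Sampling decision over Bernoulli arms.

    obs_success / obs_fail: observed per-arm outcome counts.
    imp_success / imp_fail: pseudo-counts for missing outcomes,
    imputed by a generative model (hypothetical interface here).
    """
    # Posterior per arm: Beta over observed plus imputed counts
    # (uniform Beta(1, 1) prior). Better imputations concentrate
    # the posterior closer to the true reward rates.
    samples = rng.beta(1 + obs_success + imp_success,
                       1 + obs_fail + imp_fail)
    # Play the arm whose sampled mean is largest.
    return int(np.argmax(samples))

rng = np.random.default_rng(0)
arm = imputed_thompson_step(np.array([5.0, 1.0]), np.array([1.0, 5.0]),
                            np.zeros(2), np.zeros(2), rng)
```

Under this reading, the quality of the generative imputations directly shapes the posterior, which is consistent with the summary's claim that regret is tied to offline prediction quality.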
We present the first long-term rehearsal learning approach, which demonstrates favorable properties such as variance reduction and optimality.