1 paper across 1 session
We provide a statistically rigorous guidelines for training interactive, multi-step LLM agents, exploring optimal compute allocation, generalization, and hyperparameter settings.