1 paper across 1 session
Staggered resets fix harmful nonstationarity in massively parallel RL's short synchronous rollouts by varying environment start times, improving state coverage and boosting sample efficiency, performance, and scalability.