1 paper across 1 session
We prove an ensemble of policies distilled on a diverse dataset improves generalisation in reinforcement learning.