We find that online multi-task RL with high-capacity value models achieves state-of-the-art sample efficiency and performance.