2 papers across 2 sessions
We find that online multi-task RL with high-capacity value models leads to state-of-the-art sample efficiency and performance.