PhD student, University of Warsaw
2 papers at NeurIPS 2025
We find that online multi-task RL with high-capacity value models yields state-of-the-art sample efficiency and performance.