4 papers across 2 sessions
While most RL methods use shallow MLPs (~2–5 layers), we show that scaling contrastive RL (CRL) to 1000 layers can significantly boost performance, from roughly 2× up to 50×, across a diverse suite of robotic tasks.
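Networks this deep typically need residual connections and normalization to remain trainable; the sketch below is purely illustrative of that idea (the function name, widths, and normalization scheme are assumptions, not the paper's exact architecture):

```python
import numpy as np

def deep_residual_mlp(x, depth, width, rng):
    """Hypothetical sketch: residual blocks with normalization are one
    common way to keep very deep (e.g. 1000-layer) MLPs stable; the
    actual CRL architecture may differ."""
    # Input projection to the hidden width.
    h = x @ rng.standard_normal((x.shape[-1], width)) / np.sqrt(x.shape[-1])
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = h + 0.1 * np.maximum(h @ w, 0.0)  # scaled residual branch (ReLU)
        # Normalize activations so magnitudes stay bounded across 1000 blocks.
        h = h / (np.linalg.norm(h, axis=-1, keepdims=True) + 1e-6)
    return h

rng = np.random.default_rng(0)
out = deep_residual_mlp(np.ones((4, 17)), depth=1000, width=64, rng=rng)
print(out.shape)  # (4, 64)
```

Without the residual skip and normalization, a plain 1000-layer stack would saturate or explode numerically, which is why shallow MLPs have been the default.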
C-MCTD enables diffusion planners to generate plans 10× longer than training examples by systematically stitching together shorter plans through tree search.
We propose a novel value-function learning scheme for hierarchical policies in offline goal-conditioned RL (GCRL).
Fast Monte Carlo Tree Diffusion (Fast-MCTD) achieves up to 100× speedup over MCTD through parallel rollouts and sparse trajectory planning, while maintaining strong performance on complex long-horizon tasks.