We propose BOOM, a model-based RL method that uses a soft value-weighted, likelihood-free alignment loss to bootstrap the policy from a non-parametric planner with a world model, achieving state-of-the-art performance.
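
To make the phrase "soft value-weighted likelihood-free alignment loss" more concrete, here is a minimal sketch of one way such a loss could look. The summary above does not give the formula, so everything here is an assumption made for illustration: the softmax weighting over Q-values, the `temperature` parameter, the squared-distance measure, and the function names `soft_value_weights` and `alignment_loss` are all hypothetical and may differ from what BOOM actually does. The idea illustrated is that the policy's action is pulled toward actions proposed by the planner, with higher-value actions weighted more, and without ever evaluating a likelihood of the planner's action distribution.

```python
import math


def soft_value_weights(q_values, temperature=1.0):
    """Softmax over Q-values (assumed weighting scheme, not taken from the source).

    Higher-value planner actions receive larger weights; the weights sum to 1.
    """
    q_max = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(policy_action, planner_actions, q_values, temperature=1.0):
    """Value-weighted squared distance between the policy action and planner actions.

    Only action samples from the planner are needed, not the planner's
    action density, which is the sense in which this sketch is likelihood-free.
    """
    weights = soft_value_weights(q_values, temperature)
    loss = 0.0
    for weight, action in zip(weights, planner_actions):
        sq_dist = sum((p - a) ** 2 for p, a in zip(policy_action, action))
        loss += weight * sq_dist
    return loss


# Hypothetical example: two planner-proposed actions with different estimated values.
planner_actions = [[0.0, 0.0], [1.0, 1.0]]
q_values = [0.0, 5.0]

loss_near_best = alignment_loss([1.0, 1.0], planner_actions, q_values)
loss_near_worst = alignment_loss([0.0, 0.0], planner_actions, q_values)
```

Under these assumptions, a policy action close to the high-value planner action incurs a smaller loss than one close to the low-value action, so minimizing the loss moves the policy toward what the planner prefers.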