Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#313
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
Abstract
Exploration is crucial in Reinforcement Learning (RL), as it enables the agent to build an understanding of the environment that supports better decision-making. Existing exploration methods fall into two paradigms: active exploration, which injects stochasticity into the policy but struggles in high-dimensional environments, and passive exploration, which manages the replay buffer to prioritize under-explored regions but is constrained by the diversity of already-collected samples.
To address this limitation of passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration by generating under-explored critical states and synthesizing dynamics-consistent experiences. MoGE consists of two components (sketched in code below):
- a diffusion generator that synthesizes critical states under the guidance of policy entropy and TD error, and
- a one-step imagination world model that completes these states into critical transitions for agent learning.
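The abstract gives no implementation details, so the following is a minimal sketch under stated assumptions, not the authors' code: it assumes PyTorch, hypothetical `policy`/`critic` interfaces, and a pretrained `denoiser` network, and reads "under the guidance of entropy and TD error" as classifier-guidance-style sampling toward a criticality score computed with the one-step world model.

```python
# Minimal sketch of the two MoGE components; all names and interfaces
# here are hypothetical assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class OneStepWorldModel(nn.Module):
    """Hypothetical one-step imagination model: (s, a) -> (reward, s')."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + state_dim),  # predicted reward + next state
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :1], out[..., 1:]      # reward, next_state


def criticality_score(states, policy, critic, world_model, gamma=0.99):
    """Assumed guidance signal: policy entropy plus one-step TD error.

    `policy(states)` is assumed to return a torch.distributions object and
    `critic(s, a)` a Q-value tensor; both interfaces are illustrative.
    """
    dist = policy(states)
    action = dist.sample()
    # Works whether entropy() is per-sample or per-action-dimension.
    entropy = dist.entropy().reshape(states.shape[0], -1).sum(-1)
    reward, next_state = world_model(states, action)
    next_action = policy(next_state).sample()
    td_target = reward + gamma * critic(next_state, next_action)
    td_error = (td_target - critic(states, action)).abs().squeeze(-1)
    return entropy + td_error                  # higher = more "critical"


def guided_sample(denoiser, score_fn, n, state_dim, steps=50, guide_scale=0.1):
    """Schematic guided reverse diffusion over states: each denoising step
    is nudged up the gradient of the criticality score. A real sampler
    would also follow a proper noise schedule, omitted here for brevity."""
    x = torch.randn(n, state_dim)
    for t in reversed(range(steps)):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(score_fn(x).sum(), x)[0]
        with torch.no_grad():
            x = denoiser(x, t) + guide_scale * grad  # denoise, then guide
    return x.detach()
```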
Our method is simple to implement and integrates seamlessly with mainstream off-policy RL algorithms without structural modifications. Experiments on OpenAI Gym and the DeepMind Control Suite demonstrate that MoGE, used as an exploration augmentation, significantly improves sample efficiency and performance on complex tasks.
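Since the abstract states that MoGE integrates with off-policy learners without structural changes, one plausible reading is that generated transitions are simply mixed into each replay batch. The sketch below assumes hypothetical `agent`, `buffer`, and `world_model` interfaces plus the `guided_sample` routine above; it is an illustration of that reading, not the paper's procedure.

```python
import torch


def moge_augmented_update(agent, buffer, sample_critical_states, world_model,
                          batch_size=256, synth_ratio=0.25):
    """Hypothetical drop-in augmentation for an off-policy learner (e.g.
    SAC or TD3): mix a fraction of model-generated critical transitions
    into each replay batch, leaving the agent's update rule untouched."""
    n_synth = int(batch_size * synth_ratio)
    # Real replay data; tensor shapes are assumed, e.g. rewards are [n, 1].
    s, a, r, s2, done = buffer.sample(batch_size - n_synth)

    # Complete generated critical states into full, dynamics-consistent
    # transitions with the one-step world model.
    s_gen = sample_critical_states(n_synth)      # e.g. guided_sample(...)
    with torch.no_grad():
        a_gen = agent.act(s_gen)                 # current policy's action
        r_gen, s2_gen = world_model(s_gen, a_gen)
    done_gen = torch.zeros(n_synth, 1)           # assume non-terminal

    batch = (torch.cat([s, s_gen]), torch.cat([a, a_gen]),
             torch.cat([r, r_gen]), torch.cat([s2, s2_gen]),
             torch.cat([done, done_gen]))
    agent.update(batch)                          # unchanged RL update
```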