Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#313
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
Abstract
Exploration is crucial in Reinforcement Learning (RL), as it enables the agent to build an understanding of the environment that supports better decision-making. Existing exploration methods fall into two paradigms: active exploration, which injects stochasticity into the policy but struggles in high-dimensional environments, and passive exploration, which manages the replay buffer to prioritize under-explored regions but is constrained by the diversity of already-collected samples.
To address this limitation of passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration by generating under-explored critical states and synthesizing dynamics-consistent experiences. MoGE consists of two components (sketched in code below):
- a diffusion generator that synthesizes critical states under the guidance of policy entropy and TD error, and
- a one-step imagination world model that completes these states into critical transitions for agent learning.
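The abstract gives no implementation details, so the following is a minimal sketch under stated assumptions, not the authors' code: it assumes PyTorch, hypothetical `policy`/`critic` interfaces, and a pretrained `denoiser` network, and reads "under the guidance of entropy and TD error" as classifier-guidance-style sampling toward a criticality score computed with the one-step world model.

```python
# Minimal sketch of the two MoGE components; all names and interfaces
# here are hypothetical assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class OneStepWorldModel(nn.Module):
    """Hypothetical one-step imagination model: (s, a) -> (reward, s')."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + state_dim),  # predicted reward + next state
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :1], out[..., 1:]      # reward, next_state


def criticality_score(states, policy, critic, world_model, gamma=0.99):
    """Assumed guidance signal: policy entropy plus one-step TD error.

    `policy(states)` is assumed to return a torch.distributions object and
    `critic(s, a)` a Q-value tensor; both interfaces are illustrative.
    """
    dist = policy(states)
    action = dist.sample()
    # Works whether entropy() is per-sample or per-action-dimension.
    entropy = dist.entropy().reshape(states.shape[0], -1).sum(-1)
    reward, next_state = world_model(states, action)
    next_action = policy(next_state).sample()
    td_target = reward + gamma * critic(next_state, next_action)
    td_error = (td_target - critic(states, action)).abs().squeeze(-1)
    return entropy + td_error                  # higher = more "critical"


def guided_sample(denoiser, score_fn, n, state_dim, steps=50, guide_scale=0.1):
    """Schematic guided reverse diffusion over states: each denoising step
    is nudged up the gradient of the criticality score. A real sampler
    would also follow a proper noise schedule, omitted here for brevity."""
    x = torch.randn(n, state_dim)
    for t in reversed(range(steps)):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(score_fn(x).sum(), x)[0]
        with torch.no_grad():
            x = denoiser(x, t) + guide_scale * grad  # denoise, then guide
    return x.detach()
```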
Our method is simple to implement and integrates seamlessly with mainstream off-policy RL algorithms without structural modifications. Experiments on OpenAI Gym and the DeepMind Control Suite demonstrate that MoGE, used as an exploration augmentation, significantly improves sample efficiency and performance on complex tasks.
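Since the abstract states that MoGE integrates with off-policy learners without structural changes, one plausible reading is that generated transitions are simply mixed into each replay batch. The sketch below assumes hypothetical `agent`, `buffer`, and `world_model` interfaces plus the `guided_sample` routine above; it is an illustration of that reading, not the paper's procedure.

```python
import torch


def moge_augmented_update(agent, buffer, sample_critical_states, world_model,
                          batch_size=256, synth_ratio=0.25):
    """Hypothetical drop-in augmentation for an off-policy learner (e.g.
    SAC or TD3): mix a fraction of model-generated critical transitions
    into each replay batch, leaving the agent's update rule untouched."""
    n_synth = int(batch_size * synth_ratio)
    # Real replay data; tensor shapes are assumed, e.g. rewards are [n, 1].
    s, a, r, s2, done = buffer.sample(batch_size - n_synth)

    # Complete generated critical states into full, dynamics-consistent
    # transitions with the one-step world model.
    s_gen = sample_critical_states(n_synth)      # e.g. guided_sample(...)
    with torch.no_grad():
        a_gen = agent.act(s_gen)                 # current policy's action
        r_gen, s2_gen = world_model(s_gen, a_gen)
    done_gen = torch.zeros(n_synth, 1)           # assume non-terminal

    batch = (torch.cat([s, s_gen]), torch.cat([a, a_gen]),
             torch.cat([r, r_gen]), torch.cat([s2, s2_gen]),
             torch.cat([done, done_gen]))
    agent.update(batch)                          # unchanged RL update
```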