2 papers across 2 sessions
In this paper, we propose Value-Guided Decision Transformer (VDT), which employs progressively optimized value functions to guide the Decision Transformer (DT) in making optimal decisions.
We show how return coverage affects the performance of conditional sequence modeling policies in offline RL, and we propose an algorithm that achieves new state-of-the-art results on D4RL.