3 papers across 2 sessions
A method for constructing an optimal behavior basis for the Option Keyboard, enabling zero-shot identification of optimal solutions for any linear-reward task.
Undocumented versions of Meta-World have clouded comparisons of algorithmic performance. This work strives to disambiguate Meta-World results from the literature, while also providing insights into benchmark design.
This paper introduces CoPDT, a method that uses a single unified, adaptable Decision Transformer (DT) model for multi-task (multi-budget or multi-constraint) offline safe RL.