3 papers across 2 sessions
For multi-agent offline safe reinforcement learning (MOSRL), we propose MOSDT, the first algorithm, and MOSDB, the first dataset and benchmark.
In this paper, we propose Value-Guided Decision Transformer (VDT), which employs progressively optimized value functions to guide the Decision Transformer (DT) in making optimal decisions.
We provide a scalable bandit architecture for prompt tuning of decision transformers to improve downstream performance.