We present a scalable bandit-based architecture for prompt tuning of decision transformers to improve downstream performance.
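As a rough illustration of the general idea, the sketch below treats prompt selection as a multi-armed bandit over a fixed set of candidate prompts. It is a minimal sketch under stated assumptions, not the architecture described here: UCB1 is swapped in as a generic bandit rule, and `tune_prompt`, `ucb1_select`, and the `evaluate` callback (which would roll out a frozen decision transformer conditioned on a prompt and return a scalar reward) are hypothetical names introduced only for this example.

```python
import math
import random


def ucb1_select(counts, values, t, c=2.0):
    """Return the index of the arm with the highest UCB1 score."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        values[arm] + math.sqrt(c * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def tune_prompt(candidate_prompts, evaluate, rounds=200):
    """Pick the prompt with the best estimated reward using UCB1.

    `evaluate(prompt)` is a hypothetical callback (an assumption of this
    sketch): it should run the frozen decision transformer with `prompt`
    and return a scalar reward such as a normalized episode return.
    """
    counts = [0] * len(candidate_prompts)
    values = [0.0] * len(candidate_prompts)  # running mean reward per arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = evaluate(candidate_prompts[arm])
        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]
    best = max(range(len(values)), key=values.__getitem__)
    return candidate_prompts[best], values


if __name__ == "__main__":
    # Stand-in for real rollouts: noisy rewards around made-up means.
    rng = random.Random(0)
    true_means = {"prompt_a": 0.2, "prompt_b": 0.8, "prompt_c": 0.5}

    def fake_evaluate(prompt):
        return true_means[prompt] + rng.gauss(0.0, 0.1)

    best, estimates = tune_prompt(list(true_means), fake_evaluate)
    print(best, estimates)
```

In this sketch the transformer weights stay fixed and only the choice of prompt is optimized, so each bandit round costs one evaluation rollout. A practical system would likely need a richer arm space, for example prompt segments or contextual features, which this example does not attempt.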