1 paper across 1 session
We propose the first provably efficient and episode-wise safe reinforcement learning (RL) algorithm for linear constrained Markov decision processes (CMDPs).