4 papers across 2 sessions
Creating safe, reward-maximizing policies from offline data via a min-max optimization formulation solved with no-regret algorithms