1 paper across 1 session
Creating safe, reward-maximizing policies from offline data by formulating the problem as a min-max optimization and solving it with no-regret algorithms
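
To make the min-max idea concrete, here is a minimal sketch on a toy problem, not the paper's actual method. It assumes a one-step (bandit) setting in which per-action reward and cost estimates have already been computed from offline data, and a safety constraint of the form "expected cost at most `tau`". The Lagrangian `L(p, lam) = p·r - lam * (p·c - tau)` is maximized over the policy `p` and minimized over the multiplier `lam >= 0`. The two no-regret learners used here (Hedge for the policy, projected gradient descent for the multiplier), the function name `solve_constrained_bandit`, and all parameters are illustrative assumptions.

```python
import numpy as np


def solve_constrained_bandit(rewards, costs, tau, n_iters=20000, lam_max=10.0):
    """Approximately solve max_p min_{0<=lam<=lam_max} p.r - lam * (p.c - tau).

    Hypothetical toy setup: `rewards` and `costs` are per-action estimates
    assumed to come from offline data. Returns the averaged policy.
    """
    rewards = np.asarray(rewards, dtype=float)
    costs = np.asarray(costs, dtype=float)
    k = len(rewards)

    eta_p = np.sqrt(np.log(k) / n_iters)   # Hedge step size (policy player)
    eta_l = lam_max / np.sqrt(n_iters)     # gradient step size (multiplier player)

    log_w = np.zeros(k)   # log-weights of the Hedge learner
    lam = 0.0             # Lagrange multiplier
    avg_p = np.zeros(k)   # running sum of policies, averaged at the end

    for _ in range(n_iters):
        # Current policy: softmax of the log-weights (shifted for stability).
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        avg_p += p

        # Policy player (maximizer): Hedge update on the Lagrangian payoff.
        log_w += eta_p * (rewards - lam * costs)

        # Multiplier player (minimizer): projected gradient step.
        # dL/dlam = -(p.c - tau), so descent raises lam when the constraint is violated.
        lam = float(np.clip(lam + eta_l * (p @ costs - tau), 0.0, lam_max))

    return avg_p / n_iters


if __name__ == "__main__":
    r = [1.0, 0.5, 0.0]   # assumed reward estimates
    c = [1.0, 0.2, 0.0]   # assumed cost estimates
    policy = solve_constrained_bandit(r, c, tau=0.5)
    print("policy:", policy)
    print("reward:", policy @ np.array(r), "cost:", policy @ np.array(c))
```

The averaged policy, rather than the last iterate, is returned because no-regret guarantees apply to the time average of play; individual iterates may oscillate around the saddle point.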