Poster Session 1 West
Wednesday, December 11, 2024 11:00 AM → 2:00 PM
Poster #6307

Worst-Case Offline Reinforcement Learning with Arbitrary Data Support

Kohei Miyaguchi
Poster

Abstract

We propose a method for offline reinforcement learning (RL) with a performance guarantee that requires no assumptions on the data support. Under such conditions, estimating or optimizing the conventional performance metric is generally infeasible because of the distributional discrepancy between the data and the target policy distributions. To address this issue, we employ a worst-case policy value as a new metric and constructively show that a sample complexity bound of $O(\epsilon^{-2})$ is attainable without any data-support conditions, where $\epsilon>0$ is the policy suboptimality in the new metric. Moreover, because the new metric generalizes the conventional one, the algorithm can address standard offline RL tasks without modification. In this context, our sample complexity bound can be seen as a strict improvement on previous bounds under single-policy concentrability and single-policy realizability.
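
As a rough illustration of what the $\epsilon^{-2}$ scaling implies, suppose the number of samples that suffices takes the form $n(\epsilon) = C\epsilon^{-2}$. Here $C$ is a hypothetical constant standing in for all problem-dependent factors, which the abstract does not specify. Under that assumption, halving the target suboptimality quadruples the sample requirement:

$$
n(\epsilon/2) = C\left(\frac{\epsilon}{2}\right)^{-2} = 4C\epsilon^{-2} = 4\,n(\epsilon).
$$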