This work presents the first finite-time analysis of average-reward Q-learning with asynchronous updates based on a single trajectory of Markovian samples.
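To make the setting concrete, here is a minimal sketch of one well-known average-reward Q-learning variant, relative value iteration (RVI) Q-learning, run asynchronously along a single trajectory. It is not taken from the paper, and the paper's exact algorithm may differ. The toy two-state MDP, the uniform behavior policy, the reference entry `Q[0][0]`, and the step-size schedule `visits ** -0.75` are all illustrative assumptions.

```python
import random

def rvi_q_learning(P, R, num_steps=100_000, seed=0):
    """Asynchronous RVI Q-learning on one Markovian trajectory (a sketch).

    P[s][a] is a list of next-state probabilities and R[s][a] a reward.
    Only the visited (state, action) entry is updated at each step.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0  # the trajectory starts in state 0 and is never reset
    for _ in range(num_steps):
        # Assumption: a uniformly random behavior policy, so every pair is visited.
        a = rng.randrange(n_actions)
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # Assumption: polynomially decaying step size on a per-pair local clock.
        visits[s][a] += 1
        alpha = visits[s][a] ** -0.75
        # Q[0][0] serves as the reference value f(Q) that tracks the gain.
        td_error = R[s][a] + max(Q[s_next]) - Q[0][0] - Q[s][a]
        Q[s][a] += alpha * td_error
        s = s_next
    return Q, Q[0][0]

# Hypothetical toy MDP: action 0 mostly stays, action 1 mostly switches;
# reward 1 is earned in state 1. Its optimal average reward is 0.9.
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # from state 0: stay / switch
    [[0.1, 0.9], [0.9, 0.1]],  # from state 1: stay / switch
]
R = [[0.0, 0.0], [1.0, 1.0]]

Q, gain = rvi_q_learning(P, R)
print("estimated average reward:", round(gain, 3))
```

In this sketch the samples are neither independent nor drawn from a generator: each update uses the next transition of the same trajectory, and all other entries of `Q` stay untouched at that step. That is the asynchronous, Markovian-sample regime the sentence above refers to.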