This work presents the first finite-time analysis of average-reward Q-learning with asynchronous updates based on a single trajectory of Markovian samples.
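
For readers unfamiliar with the setting, the following is a minimal sketch of one common form of average-reward Q-learning: relative value iteration (RVI) Q-learning. The abstract does not say which variant the paper analyzes, so the update rule, the reference function f(Q) = Q[0][0], the constant step size, the epsilon-greedy behaviour policy, and the toy two-state MDP are all illustrative assumptions rather than the authors' method. The sketch does show the two features the abstract names. Updates are asynchronous, since only the visited state-action pair changes at each step, and all samples come from a single Markovian trajectory with no resets.

```python
import random


def rvi_q_learning(P, R, steps=20_000, alpha=0.1, epsilon=0.2, seed=0):
    """Asynchronous RVI Q-learning along one trajectory (illustrative sketch).

    P[s][a] is a list of next-state probabilities and R[s][a] is the reward.
    Returns the Q-table and f(Q) = Q[0][0], used here as the gain estimate.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0  # the trajectory starts here and is never reset
    for _ in range(steps):
        # Epsilon-greedy behaviour policy (an assumption of this sketch).
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[s][b])
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # Asynchronous update: only the visited pair (s, a) is modified.
        # Subtracting the reference value Q[0][0] replaces discounting.
        target = R[s][a] + max(Q[s_next]) - Q[0][0]
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next  # the next sample depends on the current one
    return Q, Q[0][0]


# Hypothetical toy MDP: action 0 stays put, action 1 switches state,
# and only staying in state 1 pays a reward of 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]

Q, gain_estimate = rvi_q_learning(P, R)
print(Q, gain_estimate)
```

In this toy problem the best long-run average reward is 1, obtained by moving to state 1 and staying there, so that is the value the gain estimate should approach if the iteration converges. How quickly such an estimate approaches its limit is the kind of question a finite-time analysis addresses.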