logo
today local_bar
Poster Session 6 · Friday, December 5, 2025 4:30 PM → 7:30 PM
#5109

Optimal Regret Bounds via Low-Rank Structured Variation in Non-Stationary Reinforcement Learning

NeurIPS Poster OpenReview

Abstract

We study reinforcement learning in non-stationary communicating MDPs whosetransition drift admits a low-rank plus sparse structure. We proposeSVUCRL (Structured Variation UCRL) and prove the dynamic-regret bound where is the number of states, the number of actions, the horizon, the MDP diameter, / the total reward/transition variationbudgets, and the rank of the structured drift.
The first term is the statistical price of learning in stationary problems; the second is the non-stationarity price, which scales with rather than when drift is low-rank. This matches the rate (uptologs) and improves on prior -type guarantees.
SVUCRL combines:
  1. online low-rank tracking with explicit Frobenius guarantees,
  2. incremental RPCA to separate structured drift from sparse shocks,
  3. adaptive confidence widening via a bias-corrected local-variationestimator, and
  4. factor forecasting with an optimal shrinkage center.
Poster