Poster Session 6 · Friday, December 5, 2025 4:30 PM → 7:30 PM
#5109
Optimal Regret Bounds via Low-Rank Structured Variation in Non-Stationary Reinforcement Learning
Abstract
We study reinforcement learning in non-stationary communicating MDPs whosetransition drift admits a low-rank plus sparse structure. We proposeSVUCRL (Structured Variation UCRL) and prove the dynamic-regret bound where is the number of states, the number of actions, the horizon, the MDP diameter, / the total reward/transition variationbudgets, and the rank of the structured drift.
The first term is the statistical price of learning in stationary problems; the second is the non-stationarity price, which scales with rather than when drift is low-rank. This matches the rate (uptologs) and improves on prior -type guarantees.
SVUCRL combines:
- online low-rank tracking with explicit Frobenius guarantees,
- incremental RPCA to separate structured drift from sparse shocks,
- adaptive confidence widening via a bias-corrected local-variationestimator, and
- factor forecasting with an optimal shrinkage center.