Tightening Regret Lower and Upper Bounds in Restless Rising Bandits

Cristiano Migali, Marco Mussi, Gianmarco Genalti, Alberto Maria Metelli

Restless Bandits Rising Bandits Regret Lower Bounds

Abstract

Restless Multi-Armed Bandits (MABs) are a general framework designed to handle real-world decision-making problems where the expected rewards evolve over time, such as in recommender systems and dynamic pricing. In this work, we investigate from a theoretical standpoint two well-known structured subclasses of restless MABs: the rising and the rising concave settings, where the expected reward of each arm evolves over time following an unknown non-decreasing and a non-decreasing concave function, respectively.

By providing a novel methodology of independent interest for general restless bandits, we establish new lower bounds on the expected cumulative regret for both settings. In the rising case, we prove a lower bound of order

Ω (T^{2/3})

, matching known upper bounds for restless bandits; whereas, in the rising concave case, we derive a lower bound of order

Ω (T^{3/5})

, proving for the first time that this setting is provably more challenging than stationary MABs.

Then, we introduce Rising Concave Budgeted Exploration (RC-BE(

α

)), a new regret minimization algorithm designed for the rising concave MABs. By devising a novel proof technique, we show that the expected cumulative regret of RC-BE(

α

) is in the order of

\overset{http://www.w3.org/2000/svg" width="100%" height="0.26em" viewBox="0 0 600 260" preserveAspectRatio="none">}{O} (T^{7/11})

These results collectively make a step towards closing the gap in rising concave MABs, positioning them between stationary and general restless bandit settings in terms of statistical complexity.