Improved Confidence Regions and Optimal Algorithms for Online and Offline Linear MNL Bandits

Yuxuan Han, Jose Blanchet, Zhengyuan Zhou

Keywords: MNL Bandits, Assortment Optimization, SupCB

Abstract

In this work, we consider the data-driven assortment optimization problem under the linear multinomial logit (MNL) choice model. We first establish an improved confidence region for the maximum likelihood estimator (MLE) of the $d$-dimensional linear MNL likelihood function. This region removes the explicit dependency on the problem-dependent parameter $\kappa^{-1}$ that appears in the previous result (Oh and Iyengar, 2021); $\kappa^{-1}$ scales exponentially with the radius of the parameter set.

Building on this confidence region, we investigate the data-driven assortment optimization problem in both offline and online settings. In the offline setting, the previously best-known result scales as $\tilde{O}\left(\sqrt{\frac{d}{\kappa n_{S^\star}}}\right)$, where $n_{S^\star}$ is the number of times that the optimal assortment $S^\star$ is observed (Dong et al., 2023). We propose a new pessimism-based algorithm that, under a burn-in condition, removes the dependency on $d$ and $\kappa^{-1}$ in the leading-order bound and works under a more relaxed coverage condition, without requiring the exact observation of $S^\star$.

In the online setting, we propose the first algorithm to achieve $\tilde{O}(\sqrt{dT})$ regret without a multiplicative dependency on $\kappa^{-1}$. In both settings, our results nearly achieve the corresponding lower bound when reduced to the canonical $N$-item MNL problem, demonstrating their optimality.
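
For readers unfamiliar with the linear MNL choice model referred to above, the following is a minimal sketch of its standard form, in which each offered item has a $d$-dimensional feature vector and the no-purchase option has utility zero. It is an illustration under stated assumptions, not the algorithm proposed in this work: the function names `mnl_choice_probs` and `expected_revenue`, the array layout, and the toy numbers are all hypothetical.

```python
import numpy as np


def mnl_choice_probs(features, theta):
    """Choice probabilities for one offered assortment under a linear MNL model.

    features: (K, d) array, one row per offered item (assumed layout).
    theta:    (d,) parameter vector.
    Returns (p_items, p_no_purchase); the no-purchase option has utility 0.
    """
    utilities = features @ theta
    # Shift by the largest utility (including the outside option's 0)
    # so that the exponentials cannot overflow.
    shift = max(0.0, float(utilities.max()))
    weights = np.exp(utilities - shift)
    outside = np.exp(-shift)
    denom = outside + weights.sum()
    return weights / denom, outside / denom


def expected_revenue(features, revenues, theta):
    """Expected revenue of the assortment: sum of revenue times choice probability."""
    p_items, _ = mnl_choice_probs(features, theta)
    return float(p_items @ revenues)


# Toy example (made-up numbers): two items, d = 2, theta = 0,
# so both items and the no-purchase option are equally likely.
toy_features = np.eye(2)
toy_theta = np.zeros(2)
toy_revenues = np.array([3.0, 6.0])
p_items, p_none = mnl_choice_probs(toy_features, toy_theta)
revenue = expected_revenue(toy_features, toy_revenues, toy_theta)
```

In this toy case each of the three outcomes has probability 1/3, so the expected revenue is (3 + 6) / 3 = 3. Assortment optimization then amounts to choosing the offered set that maximizes this quantity when `theta` must be estimated from data.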