Poster Session 2 · Wednesday, December 3, 2025 4:30 PM → 7:30 PM
#3013

Improved Confidence Regions and Optimal Algorithms for Online and Offline Linear MNL Bandits

NeurIPS OpenReview

Abstract

In this work, we consider the data-driven assortment optimization problem under the linear multinomial logit (MNL) choice model. We first establish an improved confidence region for the maximum likelihood estimator (MLE) of the $d$-dimensional linear MNL likelihood function. It removes the explicit dependency, present in the previous result (Oh and Iyengar, 2021), on a problem-dependent parameter that scales exponentially with the radius of the parameter set.
Building on this confidence region, we investigate the data-driven assortment optimization problem in both offline and online settings. In the offline setting, the previously best-known result scales with the number of times that the optimal assortment is observed (Dong et al., 2023). We propose a new pessimism-based algorithm that, under a burn-in condition, removes the dependency on the problem-dependent parameter in the leading-order bound and works under a more relaxed coverage condition, without requiring exact observation of the optimal assortment.
In the online setting, we propose the first algorithm whose regret bound has no multiplicative dependency on the problem-dependent parameter. In both settings, our results nearly achieve the corresponding lower bounds when reduced to the canonical $N$-item MNL problem, demonstrating their optimality.
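
For readers new to the setting, the linear MNL choice model named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's algorithm. It assumes the standard formulation in which item i has utility theta · x_i and the no-purchase option has utility 0. The names `choice_probs`, `expected_revenue`, and `best_assortment` are hypothetical, and the brute-force search only stands in for the assortment optimization step.

```python
import math
from itertools import combinations


def choice_probs(theta, features, assortment):
    """Linear MNL choice probabilities for an offered assortment.

    Assumes utility of item i is the inner product theta . features[i],
    and that the no-purchase option (key None) has utility 0.
    """
    utils = {i: sum(t * x for t, x in zip(theta, features[i])) for i in assortment}
    denom = 1.0 + sum(math.exp(u) for u in utils.values())
    probs = {i: math.exp(u) / denom for i, u in utils.items()}
    probs[None] = 1.0 / denom  # probability that the customer buys nothing
    return probs


def expected_revenue(theta, features, revenues, assortment):
    """Expected revenue of offering `assortment` under the model above."""
    probs = choice_probs(theta, features, assortment)
    return sum(revenues[i] * probs[i] for i in assortment)


def best_assortment(theta, features, revenues, max_size):
    """Brute-force search over all assortments of size 1..max_size.

    Illustrative only: it enumerates every subset, so it is practical
    only for a handful of items.
    """
    items = range(len(features))
    candidates = (s for k in range(1, max_size + 1) for s in combinations(items, k))
    return max(candidates, key=lambda s: expected_revenue(theta, features, revenues, s))
```

In the data-driven problem the parameter `theta` is unknown and has to be estimated from observed choices, which is where a confidence region for the MLE enters. The sketch treats `theta` as given.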