Postdoc, University of Michigan - Ann Arbor
1 paper at NeurIPS 2025
We propose a Stochastic-Programming-based (SP-based) policy for finite-horizon restless multi-armed bandits (RMABs) that achieves an optimality gap of $\tilde{\mathcal{O}}(1/N)$, addressing the limitations of Linear-Programming-based (LP-based) policies in degenerate settings.