4 papers across 2 sessions
We propose a Stochastic-Programming-based (SP-based) policy for finite-horizon RMABs that achieves an optimality gap of $\tilde{\mathcal{O}}(1/N)$, addressing the limitations of Linear-Programming-based (LP-based) policies in degenerate settings.
We prove quantitative estimates for the convergence of single-layer neural networks in the NTK regime to Gaussian processes at positive training time.