2 papers across 2 sessions
We propose an algorithm for estimating the best mean reward in a multi-armed bandit with asymptotically optimal, instance-adaptive sample complexity.
FraPPE is a computationally efficient algorithm for identifying the exact Pareto optimal set with fixed confidence and any preference cone in a vector-valued bandit. It is provably asymptotically optimal and, in numerical experiments, achieves the lowest sample complexity.