5 papers across 3 sessions
We improve the sample complexity of single-timescale actor-critic from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ for obtaining an $\epsilon$-close globally optimal policy.
A general recipe for constructing tuning-free, asymptotically exact variational flows from general involutive MCMC kernels.
Convergence analysis of, and experiments with, a new label model.