PhD student, Columbia University
1 paper at NeurIPS 2025
We improve the sample complexity of single-timescale actor-critic from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ for obtaining an $\epsilon$-close globally optimal policy.