We develop policy gradient algorithms for reinforcement learning with PID-parameterized control policies and establish global optimality and convergence guarantees for them.