Assistant Professor, Indian Institute of Technology, Kanpur
2 papers at NeurIPS 2025
We present the first finite-sample analysis for policy evaluation in robust average-reward reinforcement learning using semi-norm contractions.
Global Convergence with Order-Optimal Rate for Average Reward Constrained MDPs with Primal-Dual Natural Actor Critic Algorithm