PhD student, Institute of Automation, Chinese Academy of Sciences
1 paper at NeurIPS 2025
An efficient preference-based reinforcement learning (PbRL) method that mitigates overfitting and overestimation via dual regularization, improving feedback efficiency in both online and offline settings