Principal Researcher, Tencent Robotics X
2 papers at NeurIPS 2025
An efficient preference-based reinforcement learning (PbRL) method that mitigates overfitting and overestimation via dual regularization, improving feedback efficiency in both online and offline settings