1 paper across 1 session
An efficient PbRL method that mitigates overfitting and overestimation via dual regularization, enhancing feedback efficiency in both online and offline settings