Wenya Wei

Researcher, Tencent

1 paper at NeurIPS 2025

OpenReview · Semantic Scholar · Google Scholar

Poster Session 4

1 paper

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning

#409 · Yiwen Zhu, Jinyi Liu, Pengjie Gu, Yifu Yuan, Zhenxing Ge, Wenya Wei, Zhou Fang, Yujing Hu, Bo An

To make the reward model more reliable for improving the current policy, we develop the Proximal Policy Exploration (PPE) algorithm, which increases the preference buffer's coverage of the near-policy distribution.