Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning

Sep 26, 2025
Yiwen Zhu, Jinyi Liu, Pengjie Gu, Yifu Yuan, Zhenxing Ge, Wenya Wei, Zhou Fang, Yujing Hu, Bo An
Type: Publication
NeurIPS 2025