Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning
Sep 26, 2025
Yiwen Zhu
*Jinyi Liu
*Pengjie Gu
Yifu Yuan
Zhenxing Ge
Wenya Wei
Zhou Fang
Yujing Hu
Bo An
Overview
A reward-modeling approach for preference-based reinforcement learning built on proximal policy exploration.
Venue. NeurIPS 2025