Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning

Sep 26, 2025 · Yiwen Zhu*, Jinyi Liu*, Pengjie Gu, Yifu Yuan, Zhenxing Ge, Wenya Wei, Zhou Fang, Yujing Hu, Bo An
Type
Publication
NeurIPS 2025

Overview

A reward-modeling approach for preference-based reinforcement learning built on proximal policy exploration.

Venue. NeurIPS 2025