Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning
Jul 1, 2024
Yiwen Zhu
Jinyi Liu
Yifu Yuan
Wenya Wei
Zhenxing Ge
Zhou Fang
Yujing Hu
Bo An
Others
Overview
A preference-based reinforcement learning study on improving reward models with proximal policy exploration.
Venue. NeurIPS 2024 Workshop on Behavioral Machine Learning