Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning

Jul 1, 2024
Yiwen Zhu, Jinyi Liu, Yifu Yuan, Wenya Wei, Zhenxing Ge, Zhou Fang, Yujing Hu, Bo An, Others
Publication: NeurIPS 2024 Workshop on Behavioral Machine Learning