Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning

Sep 26, 2025 · Yiwen Zhu, Jinyi Liu, Pengjie Gu, Yifu Yuan, Zhenxing Ge, Wenya Wei, Zhou Fang, Yujing Hu, Bo An

Type: Conference paper
Publication: NeurIPS 2025
Last updated on Sep 26, 2025
Tags: DRL, Preference Learning