Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning
Jul 1, 2024 · Yiwen Zhu, Jinyi Liu, Yifu Yuan, Wenya Wei, Zhenxing Ge, Zhou Fang, Yujing Hu, Bo An, and others
Type: Conference paper
Publication: NeurIPS 2024 Workshop on Behavioral Machine Learning
Last updated on Jul 1, 2024
Tags: DRL, PbRL