Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning

Jul 1, 2024 · Yiwen Zhu, Jinyi Liu, Yifu Yuan, Wenya Wei, Zhenxing Ge, Zhou Fang, Yujing Hu, Bo An, Others



Type: Conference paper

Publication: NeurIPS 2024 Workshop on Behavioral Machine Learning

Overview

A preference-based reinforcement learning study on improving reward models with proximal policy exploration.


Last updated on Apr 26, 2025

Tags: DRL, PbRL, Exploration

Authors

Jinyi Liu, Ph.D. Candidate, Reinforcement Learning and LLM Systems
