Jinyi Liu (刘金毅)

Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning

Sep 26, 2025 · Yiwen Zhu, Jinyi Liu, Pengjie Gu, Yifu Yuan, Zhenxing Ge, Wenya Wei, Zhou Fang, Yujing Hu, Bo An
Type: Conference paper
Publication: NeurIPS 2025
Last updated on Sep 26, 2025
Tags: DRL, Preference Learning
Author: Jinyi Liu, Ph.D. Candidate


© 2025 Jinyi Liu. This work is licensed under CC BY-NC-ND 4.0.
