A reward-modeling approach for preference-based reinforcement learning built on proximal policy exploration.
Sep 26, 2025
A preference-based reinforcement learning study on improving reward models with proximal policy exploration.
Jul 1, 2024

We propose the Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control.
May 1, 2024
Organizing samples by trajectory can improve the learning efficiency of offline RL algorithms.
May 1, 2024
A study of how multimodal LLM feedback can improve robotic manipulation planning and execution.
Feb 1, 2024
An experience resampling method that uses gradient-direction uncertainty for more stable policy improvement.
Jan 1, 2024
A unified platform and benchmark suite for reinforcement learning with diverse human feedback.
Jan 1, 2024
An offline-to-online reinforcement learning method that improves transition efficiency with Q-ensembles.
Jan 1, 2024
An offline reinforcement learning method that stays conservative on out-of-distribution states for sequential decision-making.
Jan 1, 2023
A survey of exploration methods in deep reinforcement learning, spanning single-agent and multi-agent settings.
Jan 1, 2023