A reward-modeling approach for preference-based reinforcement learning built on proximal policy exploration.
Sep 26, 2025
A preference-based reinforcement learning study on improving reward models with proximal policy exploration.
Jul 1, 2024

We propose the Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control.
May 1, 2024
A survey of exploration methods in deep reinforcement learning, spanning single-agent and multi-agent settings.
Jan 1, 2023