A reward-modeling approach for preference-based reinforcement learning built on proximal policy exploration.
Sep 26, 2025