A study in preference-based reinforcement learning on improving reward models through proximal policy exploration.
Jul 1, 2024
A unified platform and benchmark suite for reinforcement learning with diverse human feedback.
Jan 1, 2024

Public platform page for Uni-RLHF, emphasizing the benchmark, interface, and reproducible workflow for RLHF experimentation.
Jan 1, 2024