PbRL

Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning

A study in preference-based reinforcement learning (PbRL) on improving reward models through proximal policy exploration.

Jul 1, 2024

Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback

A unified platform and benchmark suite for reinforcement learning with diverse human feedback.

Jan 1, 2024

Uni-RLHF

Public platform page for Uni-RLHF, highlighting its benchmark suite, interface, and reproducible workflow for RLHF experiments.

Jan 1, 2024