Sep 26, 2025
This paper introduces an efficient method for fine-tuning Large Language Models (LLMs) with off-policy reinforcement learning, aiming to improve performance while reducing computational cost.
Jul 9, 2025
Jul 1, 2024

We propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control.
May 1, 2024
Organizing samples trajectory-wise can improve the learning efficiency of offline RL algorithms.
May 1, 2024
Feb 1, 2024
Jan 1, 2024
Jan 1, 2024
Jan 1, 2024
Jan 1, 2023