Paper-Conference

Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning

Sep 26, 2025

Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?

Analyzes mediator roles and decisive voices within multi-agent debate frameworks, revealing how influence shifts throughout deliberation.

Feb 15, 2025

SheetAgent: towards a generalist agent for spreadsheet reasoning and manipulation via large language models

SheetAgent is a novel autonomous agent that uses LLMs for spreadsheet reasoning and manipulation.

Jan 8, 2025

Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning

Jul 1, 2024

OVD-Explorer: Optimism should not be the sole pursuit of exploration in noisy environments

We propose the Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control.

May 1, 2024

A trajectory perspective on the role of data sampling techniques in offline reinforcement learning

Organizing samples by trajectory can improve the learning efficiency of offline RL algorithms.

May 1, 2024

vMFER: Von Mises-Fisher experience resampling based on uncertainty of gradient directions for policy improvement

Jan 1, 2024

Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback

Jan 1, 2024

KISA: A unified keyframe identifier and skill annotator for long-horizon robotics demonstrations

Jan 1, 2024

ENOTO: improving offline-to-online reinforcement learning with Q-ensembles

Jan 1, 2024