ReMix brings off-policy reinforcement finetuning to LLM post-training by reusing rollout data from past policies, dramatically reducing training cost while staying competitive on math reasoning benchmarks.
Jan 5, 2026

An LLM-driven multi-agent system for end-to-end single-cell and spatial transcriptomics analysis through natural language, combining hierarchical planning, expert tools, and self-reflective optimization.
Jan 4, 2026
FSD connects spatial visual reasoning with robotic action by generating structured intermediate representations that improve generalization on unseen manipulation tasks.
Jan 3, 2026
AgentMemoryBench is a unified benchmark for continual agent memory that measures improvement, retention, forgetting, transfer, and conflict resolution over time, together with a multi-memory baseline called MEMs.
Jan 2, 2026
DistRLVR is a distributional RL framework for LLM post-training with verifiable rewards that models token-level return distributions and uses tail-aware advantages to improve sample efficiency and reasoning performance.
Jan 1, 2026
A reward-modeling approach for preference-based reinforcement learning built on proximal policy exploration.
Sep 26, 2025
Analyzes mediator roles and decisive voices within multi-agent debate frameworks, revealing how influence shifts throughout deliberation.
Feb 15, 2025

SheetAgent is a generalist LLM agent for spreadsheet reasoning and manipulation across realistic multi-step spreadsheet tasks.
Jan 8, 2025
A preference-based reinforcement learning study on improving reward models with proximal policy exploration.
Jul 1, 2024

We propose the Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control.
May 1, 2024