Publications

28 papers · * denotes equal contribution

2026

SPOT@ICLR 2026

Beyond Scalar Critics: A Distributional Perspective on Reinforcement Learning with Verifiable Rewards for LLMs

Jinyi Liu*, Yiboyun Chen, Hongyao Tang, Yi Ma, Shuyue Hu, Yang Chen, Fei Ni, Qiaosheng Zhang, Lei Bai, Yan Zheng, Jianye Hao

DistRLVR is a distributional RL framework for LLM post-training with verifiable rewards that models token-level return distributions and uses tail-aware advantages to improve sample efficiency and reasoning performance.

arXiv / OpenReview

LLM Post-training (RL Tuning) · RLVR

ICLR 2026

CellAgent: LLM-Driven Multi-Agent Framework for Natural Language-Based Single-Cell Analysis

Yihang Xiao*, Jinyi Liu*, Yan Zheng*, Shaoqing Jiao*, Jianye Hao, Xiaohan Xie, Mingzhi Li, Ruitao Wang, Fei Ni, Yuxiao Li, Zhen Wang, Xuequn Shang, Zhijie Bao, Changxiao Yang, Jiajie Peng

An LLM-driven multi-agent system for end-to-end single-cell and spatial transcriptomics analysis through natural language, combining hierarchical planning, expert tools, and self-reflective optimization.

PDF

LLM Agent · AI4S

ICLR 2026

Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Models

Jing Liang*, Jinyi Liu*, Yi Ma*, Hongyao Tang, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao

ReMix brings off-policy reinforcement finetuning to LLM post-training by reusing rollout data from past policies, substantially reducing training cost while remaining competitive on math reasoning benchmarks.

PDF arXiv / OpenReview

LLM Post-training (RL Tuning) · RLVR

ICLR 2026

From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation

Yifu Yuan, Haiqin Cui, Yibin Chen, Zibin Dong, Fei Ni, Longxin Kou, Jinyi Liu, Pengyi Li, Yan Zheng, Jianye Hao

FSD connects spatial visual reasoning with robotic action by generating structured intermediate representations that improve generalization on unseen manipulation tasks.

PDF arXiv / OpenReview

Embodied AI · LLM Agent

LLA@ICLR 2026

Benchmarking Continual Agent Memory for Online Learning, Transfer, and Forgetting

Zihang Ma*, Jinyi Liu*, Hongyao Tang, Yi Ma, Ruitao Wang, Yifu Yuan, Yan Zheng, Jianye Hao

AgentMemoryBench is a unified benchmark for continual agent memory that measures improvement, retention, forgetting, transfer, and conflict resolution over time, together with a multi-memory baseline called MEMs.

LLM Agent · Memory

WWW 2026

AFE-Master: Enhancing LLM-Driven Autonomous Feature Engineering with Domain-Specific Language Parsing and Guided Local Search

Hebin Liang, Jianye Hao, Jinyi Liu, Yi Ma, Zilin Cao, Jing Liang, Kun Shao, Zhaocheng Du, Fei Ni, Yifu Yuan, Yan Zheng

AFE-Master combines domain-specific language parsing with guided local search to strengthen LLM-driven autonomous feature engineering.

LLM Agent

KDD 2026

PACE: Unleashing the Power of Code Embeddings to Boost AutoML Agents

Gangyi Zhao, Hebin Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Zhaocheng Du, Yan Zheng, Chenjun Xiao, Jianye Hao

PACE leverages code embeddings to enhance LLM-driven AutoML agents, improving feature engineering and model selection through structured code representations.

LLM Agent

2025

NeurIPS 2025

Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning

Yiwen Zhu*, Jinyi Liu*, Pengjie Gu, Yifu Yuan, Zhenxing Ge, Wenya Wei, Zhou Fang, Yujing Hu, Bo An

A reward-modeling approach for preference-based reinforcement learning built on proximal policy exploration.

DRL · Preference Learning

ICCV 2025

RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration

Longxin Kou, Fei Ni, Jianye Hao, Peilong Han, Jinyi Liu, Haiqin Cui, Rui Liu, Yan Zheng

RoboAnnotatorX is a comprehensive, universal framework for annotating long-horizon robot demonstrations so that they can be understood accurately.

Embodied AI

TechRxiv 2025

Hands-on LLM-based Agents: A Tutorial for General Audiences

Shuyue Hu, Siyue Ren, Yang Chen, Chunjiang Mu, Jinyi Liu, Zhiyao Cui, Yiqun Zhang, Hao Li, Dongzhan Zhou, Jia Xu, et al.

A beginner-friendly tutorial on LLM-based agents covering foundations, nine hands-on examples, practical tips, and a frontier roadmap.

PDF arXiv / OpenReview

LLM Agent

SCALR@COLM 2025

From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models

Jinyi Liu, Yan Zheng, Rong Cheng, Qiyu Wu, Wei Guo, Fei Ni, Hebin Liang, Yifu Yuan, Hangyu Mao, Fuzheng Zhang, et al.

A lightweight LLM inference framework that performs structured and fine-grained natural language reasoning without complex search or external tools.

PDF arXiv / OpenReview

LLM Post-training (TTS) · LLM Agent

ACL 2025

DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-hop Question Answering

Rong Cheng, Jinyi Liu, Yan Zheng, Fei Ni, Jiazhen Du, Hangyu Mao, Fuzheng Zhang, Bo Wang, Jianye Hao

A dual-process framework that combines retrieval and reasoning to improve multi-hop question answering.

LLM Agent · LLM Post-training (TTS)

ACL 2025

War of Thoughts: Competition Stimulates Stronger Reasoning in Large Language Models

Yibin Chen*, Jinyi Liu*, Yan Zheng, Yifu Yuan, Jianye Hao

Investigates how competitive mechanisms enhance the reasoning capabilities of LLMs, leading to improved performance on complex tasks.

LLM Post-training (TTS)

AAAI 2025

Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?

Qian Zhang, Jinyi Liu, Yan Zheng, Hebin Liang, Lanjun Wang

Analyzes mediator roles and decisive voices within multi-agent debate frameworks, revealing how influence shifts throughout deliberation.

LLM Post-training (TTS) · LLM Agent

WWW 2025

SheetAgent: Towards a Generalist Agent for Spreadsheet Reasoning and Manipulation

Yibin Chen, Jinyi Liu, Yan Zheng, Jianye Hao, et al.

SheetAgent is a generalist LLM agent for spreadsheet reasoning and manipulation across realistic multi-step tasks.

LLM Agent

2024

IJCAI 2024

vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Optimization

Yiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie Fan

An experience resampling method that uses gradient-direction uncertainty for more stable policy improvement.

DRL

IJCAI 2024

ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

Kai Zhao, Jianye Hao, Yi Ma, Jinyi Liu, Yan Zheng, Zhaopeng Meng

An offline-to-online reinforcement learning method that uses Q-ensembles to make the transition from offline pretraining to online fine-tuning more efficient.

DRL · Offline RL

AAMAS 2024

A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning

Jinyi Liu, Yi Ma, Jianye Hao, Yujing Hu, Yan Zheng, Tangjie Lv, Changjie Fan

Organizing samples by trajectory can substantially improve the learning efficiency of offline RL algorithms.

PDF

DRL · Offline RL

AAAI 2024

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Chenjia Bai, Junjie Ye, Zhen Wang, Haiyin Piao, Yang Sun

OVD-Explorer achieves noise-aware optimistic exploration by separating useful uncertainty from stochastic noise for continuous control.

PDF

DRL · Exploration

NeurIPS 2024

PERIA: Perceive, Reason, Imagine, Act via Holistic Language and Vision Planning for Manipulation

Fei Ni, Jianye Hao, Shiguang Wu, Longxin Kou, Yifu Yuan, Zibin Dong, Jinyi Liu, Mingzhi Li, Yuzheng Zhuang, Yan Zheng

A holistic language-and-vision planning framework that unifies perception, reasoning, imagination, and action for manipulation.

Embodied AI · LLM Agent

ICML 2024

KISA: A Unified Keyframe Identifier and Skill Annotator for Long-horizon Robotics Demonstrations

Longxin Kou, Fei Ni, Yan Zheng, Jinyi Liu, Yifu Yuan, Zibin Dong, Jianye Hao

A unified keyframe identification and skill annotation method for long-horizon robot demonstrations.

Embodied AI

NeurIPS 2024 Workshop

Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning

Yiwen Zhu, Jinyi Liu, Yifu Yuan, Wenya Wei, Zhenxing Ge, Zhou Fang, Yujing Hu, Bo An, et al.

A preference-based reinforcement learning study on improving reward models with proximal policy exploration.

PDF arXiv / OpenReview

DRL · Preference Learning

arXiv 2024

Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models

Jinyi Liu, Yifu Yuan, Jianye Hao, Fei Ni, Lingzhi Fu, Yibin Chen, Yan Zheng

A study of how multimodal LLM feedback can improve robotic manipulation planning and execution.

PDF arXiv / OpenReview

Embodied AI · LLM Agent

ICLR 2024

Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback

Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, Yan Zheng

A unified platform and benchmark suite for reinforcement learning with diverse human feedback.

DRL · Preference Learning

2023

TNNLS 2023

Exploration in Deep Reinforcement Learning: From Single-Agent to Multi-Agent Domain

Jianye Hao, Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Zhaopeng Meng, Peng Liu, Zhen Wang

A survey of exploration methods in deep reinforcement learning, spanning single-agent and multi-agent settings.

DRL · Exploration

CAAI AIR 2023

OSCAR: OOD State-Conservative Offline Reinforcement Learning for Sequential Decision Making

Yi Ma, Chao Wang, Chen Chen, Jinyi Liu, Zhaopeng Meng, Yan Zheng, Jianye Hao

An offline reinforcement learning method that stays conservative on out-of-distribution states for sequential decision-making.

DRL · Offline RL

ICLR 2023

EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model

Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Jinyi Liu, Yingfeng Chen, Changjie Fan

An unsupervised reinforcement learning method that improves efficiency with a multi-choice dynamics model.

DRL

2021

ASE 2021

FIGCPS: Effective Failure-Inducing Input Generation for Cyber-Physical Systems with Deep Reinforcement Learning

Shaohua Zhang, Shuang Liu, Jun Sun, Yuqi Chen, Wenzhi Huang, Jinyi Liu, Jian Liu, Jianye Hao

A deep reinforcement learning approach for generating failure-inducing inputs in cyber-physical systems testing.

DRL