Publications

Publications below are tagged by focus area: DRL, LLM Agent, LLM Post-training (RL Tuning), LLM Post-training (TTS), and Embodied AI. Tags are listed beneath each entry.

RoboAnnotatorX: A Comprehensive and Universal Annotation Framework for Accurate Understanding of Long-horizon Robot Demonstration

Longxin Kou, Fei Ni, Jianye Hao, Peilong Han, Jinyi Liu, Haiqin Cui, Rui Liu, Yan Zheng

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2025 · Jul 2025

This paper presents RoboAnnotatorX, a comprehensive and universal annotation framework designed to enable accurate understanding of long-horizon robot demonstrations (a generic annotation-schema sketch follows below).

Embodied AI · Robotics · Computer Vision · Annotation · Robot Demonstration
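
As a rough illustration of what segment-level annotations for a long-horizon demonstration might look like, here is a minimal data-structure sketch. The schema and field names are assumptions chosen for illustration and are not the format used by RoboAnnotatorX.

```python
# Minimal sketch of time-segmented annotations for a long-horizon robot
# demonstration. Field names are illustrative assumptions, not the
# RoboAnnotatorX schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SegmentAnnotation:
    start_frame: int          # first frame of the sub-task segment
    end_frame: int            # last frame of the sub-task segment
    caption: str              # natural-language description of the sub-task
    objects: List[str] = field(default_factory=list)  # objects involved

@dataclass
class DemonstrationAnnotation:
    demo_id: str
    task_goal: str                                # overall goal of the demonstration
    segments: List[SegmentAnnotation] = field(default_factory=list)

# Toy usage: a two-segment kitchen demonstration.
demo = DemonstrationAnnotation(
    demo_id="demo_0001",
    task_goal="make a cup of tea",
    segments=[
        SegmentAnnotation(0, 120, "pick up the kettle", ["kettle"]),
        SegmentAnnotation(121, 300, "pour water into the cup", ["kettle", "cup"]),
    ],
)
print(len(demo.segments))
```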

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao

arXiv preprint arXiv:2507.06892 · Jul 2025

This paper introduces an efficient off-policy reinforcement learning method for finetuning Large Language Models (LLMs), aiming to improve performance while minimizing computational cost (a generic off-policy update sketch follows below).

LLM Post-training (RL Tuning) · LLM Post-training · DRL · Fine-tuning · Off-policy
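
For readers unfamiliar with off-policy reinforcement finetuning, the sketch below shows a generic importance-weighted, clipped policy-gradient loss of the kind commonly used when updating a policy on data generated by a stale behavior policy. The function name, clipping scheme, and hyperparameters are illustrative assumptions and do not reproduce the paper's algorithm.

```python
# Generic clipped importance-sampling policy-gradient loss (PPO-style),
# shown only as background for off-policy finetuning; not the paper's method.
import torch

def off_policy_pg_loss(logp_new: torch.Tensor,
                       logp_old: torch.Tensor,
                       advantages: torch.Tensor,
                       clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped importance-weighted policy-gradient loss over sampled tokens.

    logp_new:   log-probs of sampled tokens under the current policy.
    logp_old:   log-probs of the same tokens under the (stale) behavior policy.
    advantages: per-token (or per-sequence) advantage estimates.
    """
    # Importance ratio corrects for the mismatch between the behavior policy
    # that generated the data and the policy being updated.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the pessimistic (minimum) objective, as in PPO-style updates.
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Toy usage with random tensors standing in for token-level statistics.
logp_new = torch.randn(8, requires_grad=True)
logp_old = logp_new.detach() + 0.1 * torch.randn(8)
advantages = torch.randn(8)
loss = off_policy_pg_loss(logp_new, logp_old, advantages)
loss.backward()
```

The importance ratio and clipping are what allow reuse of previously generated samples instead of discarding them after a single on-policy update.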

War of Thoughts: Competition Stimulates Stronger Reasoning in Large Language Models

Yibin Chen, Jinyi Liu, Yan Zheng, Yifu Yuan, Jianye Hao

Findings of the Association for Computational Linguistics: ACL 2025 · May 2025

This paper investigates how competitive mechanisms can enhance the reasoning capabilities of Large Language Models (LLMs), improving performance on complex tasks (a generic tournament-selection sketch follows below).

LLM Post-training (TTS) · ACL 2025 · ACL Findings · Competition
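
As a generic illustration of competition-style selection at test time, the sketch below runs a single-elimination tournament over several candidate answers. The `generate` and `judge` callables are hypothetical stand-ins for model calls; this is not the mechanism proposed in the paper.

```python
# Generic tournament over candidate answers: pairwise comparisons decide
# which candidate survives each round. Illustrative only.
import random
from typing import Callable, List

def tournament_select(question: str,
                      generate: Callable[[str], str],
                      judge: Callable[[str, str, str], int],
                      n_candidates: int = 4) -> str:
    """Return the candidate that survives a single-elimination tournament.

    judge(question, a, b) returns 0 if answer `a` wins the head-to-head, else 1.
    """
    candidates: List[str] = [generate(question) for _ in range(n_candidates)]
    while len(candidates) > 1:
        next_round = []
        for i in range(0, len(candidates) - 1, 2):
            a, b = candidates[i], candidates[i + 1]
            next_round.append(a if judge(question, a, b) == 0 else b)
        if len(candidates) % 2 == 1:      # odd candidate out gets a bye
            next_round.append(candidates[-1])
        candidates = next_round
    return candidates[0]

# Toy usage with stub functions in place of real LLM calls.
answer = tournament_select(
    "What is 17 * 24?",
    generate=lambda q: f"guess-{random.randint(0, 999)}",
    judge=lambda q, a, b: random.randint(0, 1),
)
print(answer)
```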

Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?

Qian Zhang, Yan Zheng, Jinyi Liu, Hebin Liang, Lanjun Wang

AAAI Conference on Artificial Intelligence, 2025 (Poster) · Feb 2025

This paper analyzes the roles of mediators and decisive voices within multi-agent debate frameworks, revealing how influence shifts throughout deliberation (a generic debate-loop sketch follows below).

LLM Post-training (TTS) · LLM Agent
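
The sketch below illustrates a generic multi-agent debate loop in which a mediator reads the full transcript and issues the final decision. The agent and mediator callables are hypothetical stand-ins for LLM calls; it does not reproduce the paper's framework or its analysis of decision-making power.

```python
# Generic multi-agent debate loop with a mediator. Illustrative only;
# agents and mediator are stubs standing in for LLM calls.
from typing import Callable, Dict, List

def run_debate(question: str,
               debaters: Dict[str, Callable[[str, List[str]], str]],
               mediator: Callable[[str, List[str]], str],
               rounds: int = 2) -> str:
    """Each round, every debater sees the transcript so far and replies;
    the mediator then reads the full transcript and returns the final answer."""
    transcript: List[str] = []
    for _ in range(rounds):
        for name, speak in debaters.items():
            reply = speak(question, transcript)
            transcript.append(f"{name}: {reply}")
    return mediator(question, transcript)

# Toy usage with stub agents that restate fixed positions.
decision = run_debate(
    "Should the agents cooperate?",
    debaters={
        "optimist": lambda q, t: "Yes, cooperation maximizes joint reward.",
        "skeptic": lambda q, t: "Only if defection is penalized.",
    },
    mediator=lambda q, t: t[-1].split(": ", 1)[1],  # naively side with the last speaker
)
print(decision)
```

In such setups, the mediator is the obvious place where decision power concentrates, which is the kind of role the paper examines.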