Reliable reasoning and decision-making with LLM post-training, reinforcement learning, and agents.

I am a Ph.D. candidate at Tianjin University and a member of the TJU DRL Lab. I work with Jianye Hao, Yan Zheng, and Hongyao Tang on reinforcement learning, LLM post-training, and agentic systems that make language models reason more reliably, act more effectively, and support scientific discovery.

Current focus

Scaling post-training for language models, building reliable reasoning frameworks, and designing agentic systems that connect decision-making with real-world scientific and practical workflows.

Open to

Collaborations, research internships, and conversations about LLM post-training, RL, agents, and AI for science. The TJU DRL Lab also welcomes interns and prospective MS/PhD students.

Signature Themes

Research Pillars

Three directions that define how I think about reliable language-model systems and their real-world use.

Reliable reasoning

I design reasoning frameworks that make language models more consistent, more controllable, and more trustworthy on complex tasks.

Fine-grained reasoning, Reliability, Decision-making

LLM post-training

I study reinforcement-learning-based post-training methods that improve capability while reducing cost and instability.

RL tuning, Efficiency, Verifiable rewards

Agentic systems for science

I build agent systems that translate language-model reasoning into useful workflows for research, analysis, and discovery.

AI for science, Multi-agent systems, Real-world tools

Systems

Selected Projects

Selected systems and learning artifacts that turn research ideas into tools people can actually use.

LLM Agent Tutorial

Project page for the LLM Agent Tutorial website, highlighting the public learning resource, companion materials, and entry points for readers.

LLM Agent Tutorial

CellAgent

An LLM-driven multi-agent system that lets researchers run end-to-end single-cell analysis through natural language while preserving high-quality scientific outputs.

LLM Agent, CellAgent

Uni-RLHF

An integrated RLHF platform that makes it easier to study, compare, and operationalize reinforcement learning from diverse human and synthetic feedback.

PbRL, RLHF

Updates

Recent News

Recent milestones across papers, systems, tutorials, and community work.

Trajectory

Experience Snapshot

A brief view of research training, industry collaboration, and selected recognition.

Experience

Aug 2025 - Present

Algorithm Research Intern

Shanghai AI Lab (advised by Shuyue Hu)

Oct 2024 - Aug 2025

Algorithm Research Intern (Project Collaboration)

Kuaishou (advised by Hangyu Mao)

Jun 2022 - Mar 2024

Algorithm Research Intern

NetEase (advised by Yujing Hu)

Selected Honors

2025

CSIG Science and Technology Progress Award, First Prize (2025年度CSIG科技进步奖一等奖)

CSIG

2025

Distinguished PC Member, AAMAS 2025

AAMAS 2025

2024

Academic First-class Scholarship (Top 10%)

Tianjin University