Publications

(2025). Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning. NeurIPS 2025.
(2025). From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models. SCALR@COLM 2025.
(2025). Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model. arXiv preprint arXiv:2507.06892.
(2025). Unlocking Multi-Agent Debate Potential: Enhancing Effective Scaling through Role Allocation Strategies. ICML 2025 Workshop on Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures.
(2025). War of Thoughts: Competition Stimulates Stronger Reasoning in Large Language Models. Findings of the Association for Computational Linguistics: ACL 2025.
(2025). DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering. Proceedings of the Association for Computational Linguistics: ACL 2025.
(2025). From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation. arXiv preprint arXiv:2505.08548.
(2025). SheetAgent: Towards a Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models. Proceedings of the ACM on Web Conference 2025.
(2024). CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-Cell Data Analysis. bioRxiv.
(2024). Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning. NeurIPS 2024 Workshop on Behavioral Machine Learning.