Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Jul 9, 2025 · Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao

Type: Journal article
Publication: arXiv preprint arXiv:2507.06892
Last updated on Jul 9, 2025

LLM Post-Training Reinforcement Learning Fine-Tuning Off-Policy

Authors: Jinyi Liu, Ph.D. Candidate