Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Jul 9, 2025 · Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao

Type: Journal article
Publication: arXiv preprint arXiv:2507.06892
Last updated on Jul 9, 2025

LLM Post-Training (RL Tuning), LLM Post-Training, DRL Fine-Tuning, Off-Policy

Authors: Jinyi Liu, Ph.D. Candidate