Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Jul 9, 2025
Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao
Publication: arXiv preprint arXiv:2507.06892