Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model
Jan 5, 2026
Jing Liang
*Jinyi Liu
*Yi Ma
*Hongyao Tang
Yan Zheng
Shuyue Hu
Lei Bai
Jianye Hao
Type
Publication
The Fourteenth International Conference on Learning Representations (ICLR 2026 Poster)
Overview
ReMix brings off-policy reinforcement fine-tuning (RFT) to LLM post-training by reusing rollout data from past policies, substantially reducing training cost while remaining competitive on math reasoning benchmarks.
Venue. The Fourteenth International Conference on Learning Representations (ICLR 2026 Poster)