Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model

Jan 5, 2026 · Jing Liang*, Jinyi Liu*, Yi Ma*, Hongyao Tang, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao
Type: Publication
The Fourteenth International Conference on Learning Representations (ICLR 2026 Poster)

Overview

ReMix brings off-policy reinforcement finetuning (RFT) to LLM post-training. It reuses rollout data generated by past policies, substantially reducing training cost while remaining competitive on math reasoning benchmarks.
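The core idea of reusing rollouts from earlier policies can be illustrated with a generic importance-weighted, PPO-style clipped objective. This is a minimal sketch under stated assumptions, not ReMix itself: the names `Sample`, `ReplayBuffer`, `clipped_term`, and `surrogate` are hypothetical, each rollout is reduced to a single scalar log-probability, and the clip range `eps=0.2` is an arbitrary choice. Each stored sample keeps its log-probability under the policy that generated it, so the importance ratio corrects for the gap between that older policy and the current one.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    """One stored rollout (hypothetical, simplified to one scalar log-prob)."""
    logp_behavior: float  # log-prob under the (possibly older) policy that generated it
    advantage: float      # advantage estimate computed when the rollout was collected


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO store of rollouts from past policies."""

    def __init__(self, capacity: int) -> None:
        self.data: deque = deque(maxlen=capacity)  # oldest samples are evicted first

    def add(self, samples: list) -> None:
        self.data.extend(samples)

    def recent(self, k: int) -> list:
        # Return up to k of the most recently stored samples.
        return list(self.data)[-k:] if k > 0 else []


def clipped_term(logp_new: float, sample: Sample, eps: float = 0.2) -> float:
    # Importance ratio between the current policy and the behavior policy.
    ratio = math.exp(logp_new - sample.logp_behavior)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Pessimistic (minimum) of the unclipped and clipped weighted advantages.
    return min(ratio * sample.advantage, clipped * sample.advantage)


def surrogate(new_logps: list, samples: list, eps: float = 0.2) -> float:
    """Mean clipped objective over a batch mixing fresh and replayed samples."""
    if len(new_logps) != len(samples) or not samples:
        raise ValueError("need one current log-prob per sample and a non-empty batch")
    terms = [clipped_term(lp, s, eps) for lp, s in zip(new_logps, samples)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=4)
    # Rollouts collected earlier by an older policy (illustrative numbers).
    buf.add([Sample(math.log(0.5), 1.0), Sample(math.log(0.4), -0.5)])
    fresh = [Sample(math.log(0.6), 0.8)]  # rollout from the current policy
    batch = fresh + buf.recent(2)
    # Current-policy log-probs for each sample in `batch` (illustrative numbers).
    new_logps = [math.log(0.6), math.log(0.7), math.log(0.3)]
    print(surrogate(new_logps, batch))
```

Clipping the ratio bounds how far any single stale sample can push an update, which is one common way to keep reuse of older data stable; the specific mechanisms ReMix uses to handle off-policy data are described in the paper and are not reproduced here.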
