Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model

Jan 5, 2026

Jing Liang

Jinyi Liu

Yi Ma

Hongyao Tang

Yan Zheng

Shuyue Hu

Lei Bai

Jianye Hao

Type

Conference paper

Publication

The Fourteenth International Conference on Learning Representations (ICLR 2026 Poster)

Overview

ReMix brings off-policy reinforcement fine-tuning (RFT) to LLM post-training: by reusing rollout data generated by past policies, it substantially reduces training cost while remaining competitive on math reasoning benchmarks.

Last updated on Jan 5, 2026