Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
This paper introduces an efficient method for finetuning large language models (LLMs) with off-policy reinforcement learning, aiming to improve performance while reducing computational cost.
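Off-policy finetuning saves compute by reusing samples generated by an older (behavior) policy instead of regenerating them for every update. A common way to do this is to reweight each sample by the importance ratio between the current and the behavior policy and to clip that ratio, as in PPO. The sketch below illustrates only this generic mechanism. It is not the paper's algorithm. The function name `clipped_surrogate` and the default `eps=0.2` are illustrative assumptions.

```python
import math
from typing import List


def clipped_surrogate(logp_new: List[float], logp_old: List[float],
                      advantages: List[float], eps: float = 0.2) -> float:
    """Generic PPO-style clipped objective on samples from an older policy.

    Hypothetical helper for illustration only (not the paper's method).
    logp_new:   log-probs of the sampled actions/tokens under the current policy
    logp_old:   log-probs of the same actions under the behavior (older) policy
    advantages: per-sample advantage estimates
    Returns the mean surrogate objective, which training would maximize.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have equal length")
    if not advantages:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance weight pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # keep ratio in [1-eps, 1+eps]
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * a, clipped * a)
    return total / len(advantages)
```

When the current and behavior policies agree, every ratio is 1 and the objective reduces to the mean advantage. As the policies drift apart, clipping limits how much a single stale sample can push the update, which is what makes reusing older data tolerable.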
Jul 9, 2025