Beyond Scalar Critics: A Distributional Perspective on Reinforcement Learning with Verifiable Rewards for LLMs
Jan 1, 2026
Jinyi Liu
Yiboyun Chen
Hongyao Tang
Yi Ma
Shuyue Hu
Yang Chen
Fei Ni
Qiaosheng Zhang
Lei Bai
Yan Zheng
Jianye Hao
Overview
DistRLVR is a distributional RL framework for post-training LLMs with verifiable rewards. It models token-level return distributions and uses tail-aware advantages to improve sample efficiency and reasoning performance.
Venue. SPOT@ICLR 2026