Beyond Scalar Critics: A Distributional Perspective on Reinforcement Learning with Verifiable Rewards for LLMs

Jan 1, 2026
Jinyi Liu, Yiboyun Chen, Hongyao Tang, Yi Ma, Shuyue Hu, Yang Chen, Fei Ni, Qiaosheng Zhang, Lei Bai, Yan Zheng, Jianye Hao
Type: Publication
Venue: SPOT@ICLR 2026

Overview

DistRLVR is a distributional reinforcement learning (RL) framework for post-training large language models (LLMs) with verifiable rewards. Rather than relying on a scalar critic, it models token-level return distributions and uses tail-aware advantages to improve sample efficiency and reasoning performance.
