Beyond Scalar Critics: A Distributional Perspective on Reinforcement Learning with Verifiable Rewards for LLMs

Jan 1, 2026

Jinyi Liu, Yiboyun Chen, Hongyao Tang, Yi Ma, Shuyue Hu, Yang Chen, Fei Ni, Qiaosheng Zhang, Lei Bai, Yan Zheng, Jianye Hao



Type

Conference paper

Publication

SPOT@ICLR 2026

Overview

DistRLVR is a distributional reinforcement learning (RL) framework for post-training LLMs with verifiable rewards. It models token-level return distributions and uses tail-aware advantages to improve sample efficiency and reasoning performance.

Venue. SPOT@ICLR 2026

Last updated on Jan 1, 2026