DistRLVR is a distributional RL framework for LLM post-training with verifiable rewards that models token-level return distributions and uses tail-aware advantages to improve sample efficiency and reasoning performance.
Jan 1, 2026

A lightweight large language model inference framework that performs structured, fine-grained natural language reasoning without complex search or external tools.
Aug 1, 2025

A dual-process framework that combines retrieval and reasoning to improve multi-hop question answering.
May 31, 2025