Reasoning

Beyond Scalar Critics: A Distributional Perspective on Reinforcement Learning with Verifiable Rewards for LLMs

DistRLVR is a distributional RL framework for LLM post-training with verifiable rewards that models token-level return distributions and uses tail-aware advantages to improve sample efficiency and reasoning performance.

Jan 1, 2026

From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models

A lightweight inference framework for large language models that performs structured, fine-grained natural language reasoning without requiring complex search or external tools.

Aug 1, 2025

DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering

A dual-process framework that combines retrieval and reasoning to improve multi-hop question answering.

May 31, 2025