DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

Daya Guo et al. · 2025 · DOI 10.1038/s41586-025-09422-

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

cs.AI · 2026-05-31 · conditional · novelty 6.0

LRMs show a large production-evaluation gap on the VAIR dataset with valid answers but invalid reasoning, driven by answer confirmation bias as evidenced by CoT analysis, linear probes, and causal patching.

citing papers explorer

Showing 1 of 1 citing paper.

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

cs.AI · 2026-05-31 · conditional · none · ref 15
