pith.

Mixed citations

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

Mixed citation behavior. The most common role is background (68%).

232 Pith papers citing it
539 external citations · Crossref
Background 68% of classified citations

citation-role summary

background 26 · method 5 · dataset 3 · baseline 2 · other 1

representative citing papers

Efficient Training on Multiple Consumer GPUs with RoundPipe

cs.DC · 2026-04-29 · conditional · novelty 8.0

RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling fine-tuning of a 235B-parameter model on eight RTX 4090 GPUs.

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, which enables generalization; those without recall do not. A new internal-recall variant performs competitively or better.

Spurious Rewards: Rethinking Training Signals in RLVR

cs.AI · 2025-06-12 · accept · novelty 8.0

Spurious rewards in RLVR can produce large gains in mathematical reasoning for certain language models, because GRPO's clipping bias amplifies behaviors acquired in pretraining, such as code reasoning.

Fork-Think with Confidence

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

Fork-think with confidence uses model confidence along a single reasoning path to identify forking points before sampling continuations, cutting token usage by up to 30% and runtime by up to 57% on reasoning benchmarks while matching or exceeding the performance of parallel thinking.

Predictable GRPO: A Closed-Form Model of Training Dynamics

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

A closed-form inertial model of GRPO dynamics that subsumes single-exponential saturation as its overdamped limit and predicts group-size invariance, stability thresholds, and overdamped-to-oscillatory transitions.

What Drives Interactive Improvement from Feedback?

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

Controlled student-teacher experiments across four benchmarks show interactive gains are driven more by the student's ability to use feedback than by teacher quality, with self-feedback adding little beyond unguided retries.

citing papers explorer

Showing 50 of 232 citing papers.