LLMs Can Easily Learn to Reason from Demonstrations: Structure, Not Content, Is What Matters!

11 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2025 (7), 2026 (4)

roles: background (1)

polarities: background (1)

representative citing papers

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

cs.CL · 2025-07-21 · unverdicted · novelty 6.0

Archer introduces response-level entropy normalization and differentiated clipping/KL regularization in RLVR to encourage exploration on reasoning tokens while stabilizing knowledge tokens, yielding gains in pass@1 and pass@K on reasoning benchmarks.