StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.
Advances in Neural Information Processing Systems , volume=
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.
DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.
citing papers explorer
-
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.