StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.
Advances in Neural Information Processing Systems , volume=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.
citing papers explorer
-
StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning
StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.
-
Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition
Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.