Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.
arXiv preprint arXiv:2602.13517 , year=
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 9verdicts
UNVERDICTED 9roles
background 2polarities
background 2representative citing papers
GRASP is a large-scale dataset and benchmark for social reasoning grounded in gaze and gesture events in multi-person videos, with Social Grounding Reward (SGR) proposed to improve model performance on GRASP-Bench.
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
The Tutoring Effectiveness Index (TEI) uses four signals from LLM conversations to select math tutoring responses, raising student improvement rates from 59.0% to 81.9% at N=8 on a frozen DeepSeek-R1-8B model without training or judges.
InsightReplay improves long CoT reasoning by extracting critical insights from the trace and replaying them near the active frontier, delivering +1.65 average accuracy gain across 24 model-benchmark settings.
Large reasoning models show measurable hidden-state dynamics that a new statistic can use to distinguish correct reasoning trajectories without labels.
Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.
Proposes IAR framework using MIP token isolation, DTR overlap analysis, and Jaccard stability to interpret reasoning patterns in Qwen and Llama models across math, code, logic, and commonsense domains.
Reasoning budget in LRMs functions as a generation ceiling rather than a real-time dial, leaving cognitive cost alignment with humans invariant across effort levels and supporting a training-time compiled account.
citing papers explorer
-
GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions
GRASP is a large-scale dataset and benchmark for social reasoning grounded in gaze and gesture events in multi-person videos, with Social Grounding Reward (SGR) proposed to improve model performance on GRASP-Bench.