pith. machine review for the scientific record.

arxiv: 2604.17614 · v1 · submitted 2026-04-19 · 💻 cs.AI · cs.CL · cs.LG

Recognition: unknown

Characterizing Model-Native Skills

Feiyang Kang, Mahavir Dabas, Myeongseob Ko, Ruoxi Jia

Pith reviewed 2026-05-10 05:39 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.LG
keywords model-native skills · activation space · orthogonal basis · data selection · inference steering · reasoning post-training · safety alignment · language model interventions

The pith

Recovering an orthogonal basis from model activations identifies skill directions that improve data selection and steering more effectively than human taxonomies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that skill descriptions for language models should derive from the model's internal representations rather than external human-defined categories. It extracts a compact orthogonal basis directly from sequence-level activations to reveal axes along which the model varies its behavior. These directions then guide both the selection of post-training data and interventions at inference time. Experiments show that this method raises accuracy on math reasoning tasks beyond what human skill labels achieve, and it supports safety alignment by focusing training data on model-native coverage. A sympathetic reader would see this as evidence that aligning interventions with the model's own structure yields more reliable control over its outputs.

Core claim

A compact orthogonal basis recovered from sequence-level activations captures axes of behavioral variation that the model organizes around, independent of any human ontology. When these directions inform SFT data selection for reasoning, they produce Pass@1 gains of up to 20 percent on MATH and 41 percent on AMC, exceeding results from human-characterized skills. The same directions function as steering vectors during inference, lifting Pass@8 by up to 4.8 percent on MATH. The approach also improves sample efficiency in safety alignment by selecting adversarial data according to model-native skill coverage instead of textual diversity.

What carries the argument

A compact orthogonal basis extracted from sequence-level activations that identifies model-native axes of behavioral variation for data selection and inference steering.
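Mechanically, this extraction is ordinary PCA over pooled hidden states. A minimal sketch, assuming mean-pooled sequence activations from a single layer; the function name and the default k are illustrative, not the paper's settings:

```python
import numpy as np

def extract_skill_basis(activations, k=16):
    """Recover a compact orthogonal basis from sequence-level activations.

    activations: (n_sequences, d_model) array of pooled per-sequence
    hidden states (e.g. mean-pooled over tokens at one layer).
    Returns a (k, d_model) array of orthonormal principal directions.
    """
    # Center the data, then take the SVD; the right singular vectors
    # are the orthonormal principal directions (equivalent to PCA).
    X = activations - activations.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]
```

Because the directions are rows of an orthonormal matrix, projecting any activation onto them decomposes it into independent "skill" coordinates, which is what makes the same basis reusable for both selection and steering.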

If this is right

  • Post-training data chosen along the recovered directions raises Pass@1 accuracy on math benchmarks more than selection based on human skill descriptions.
  • The same activation-space directions serve as steering vectors that improve inference-time performance on reasoning tasks where human skill labels offer no equivalent mechanism.
  • Prioritizing coverage of model-native skills when choosing adversarial examples leads to more sample-efficient gains during safety alignment.
  • Interventions grounded in the model's internal representations provide a unified foundation for both training data curation and runtime control.
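The dual intervention the bullets describe can be sketched in a few lines. The additive steering form and the greedy coverage heuristic below are this review's assumptions about one plausible implementation, not the paper's exact procedures:

```python
import numpy as np

def steer(hidden, direction, alpha=1.0):
    """Nudge a hidden state along a unit-norm skill direction at inference."""
    d = direction / np.linalg.norm(direction)
    return hidden + alpha * d

def select_by_coverage(activations, basis, budget):
    """Greedily pick examples whose projections onto the skill basis
    cover the principal directions most evenly (illustrative heuristic)."""
    proj = np.abs(activations @ basis.T)  # (n, k) per-direction scores
    covered = np.zeros(basis.shape[0])
    chosen = []
    for _ in range(budget):
        remaining = [i for i in range(len(proj)) if i not in chosen]
        # Pick the example that adds the most new coverage overall.
        gains = [np.maximum(covered, proj[i]).sum() for i in remaining]
        best = remaining[int(np.argmax(gains))]
        covered = np.maximum(covered, proj[best])
        chosen.append(best)
    return chosen
```

Human skill labels support only the second function; the first is available precisely because the basis lives in activation space.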

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could extend to other domains such as code generation or multimodal tasks by repeating the activation-basis extraction on appropriate datasets.
  • These directions might offer a starting point for building more stable internal maps of model capabilities that persist across different scales or continued training.
  • Combining the basis with other activation-analysis techniques could produce finer-grained skill decompositions for targeted editing.
  • One testable extension is whether the recovered directions remain effective when transferred to models of substantially different architecture or training history.

Load-bearing premise

The orthogonal basis recovered from activations genuinely reflects axes of variation the model uses to organize its behavior rather than arising as an artifact of the extraction procedure or chosen dataset.

What would settle it

If selecting data or applying steering vectors along the recovered directions produces no accuracy improvement or even lower scores than random selection or human-skill baselines on the same reasoning and alignment tasks, the central claim would be refuted.

Figures

Figures reproduced from arXiv: 2604.17614 by Feiyang Kang, Mahavir Dabas, Myeongseob Ko, Ruoxi Jia.

Figure 1
Figure 1. Comparison between human-defined skills and model-native skills. Existing skill-based approaches manually define skills based on task knowledge. The definitions are often subjective, ad hoc, and overlapping, leading to information loss during translations between model capabilities and human concepts. With Principal Component Analysis (PCA), AUTOSKILL extracts skills as prominent modes in the model's repre… view at source ↗
Figure 2
Figure 2. 2-dim PCA projection of layer-15 activations for 1,494 jailbreak prompts across 32 attack families, showing substantial overlap in activation space despite textual differences. Latent overlap across jailbreak families. To examine the mismatch between textual diversity and latent coverage, we curate 32 representative jailbreak papers spanning a two-year period. This yields a set of 1,494 jailbreak prompt… view at source ↗
Figure 3
Figure 3. Vector norm at each layer for each principal direction extracted by AUTOSKILL. The vector norm gradually increases from the upper to the lower layers, suggesting that the lower layers take larger weights in the principal directions. view at source ↗
Figure 4
Figure 4. Correlation map between principal directions extracted from all layers and principal directions from top/middle/bottom layers. The brighter colors around the primary diagonal line are clearly visible, suggesting a high correlation between the same principal direction extracted from different layers. Principal directions from the top layers, which have the lowest vector norm, still show high correlation to … view at source ↗
read the original abstract

Skills are a natural unit for describing what a language model can do and how its behavior can be changed. However, existing characterizations rely on human-written taxonomies, textual descriptions, or manual profiling pipelines--all external hypotheses about what matters that need not align with the model's internal representations. We argue that when the goal is to intervene on model behavior, skill characterization should be *model-native*: grounded in the model's own representations rather than imposed through external ontologies. We instantiate this view by recovering a compact orthogonal basis from sequence-level activations. The resulting basis is semantically interpretable but need not correspond to any predefined human ontology; instead, it captures axes of behavioral variation that the model itself organizes around. We validate this characterization on reasoning post-training, using the recovered basis for both SFT data selection and inference-time steering. We develop lightweight proxy interventions to identify which directions are most useful for a given model. Across Llama3-8B and Qwen2.5-3B, selecting data along those directions improves Pass@1 by up to 20% on MATH and 41% on AMC, outperforming data selection based on human-characterized skills. Because the basis lives in activation space, the same directions also serve as steering vectors at inference time, improving Pass@8 by up to 4.8% on MATH--an intervention that human-characterized skills cannot support. We further validate the characterization on safety alignment, where selecting adversarial training data for model-native skill coverage rather than textual diversity yields more sample-efficient learning. These results suggest that recovering skills from the model's own representations, rather than imposing them externally, provides a more effective foundation for intervening on model behavior. Codes are open-sourced.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proposes recovering a compact orthogonal basis from sequence-level activations to characterize model-native skills, arguing this is preferable to human-imposed taxonomies. The basis is applied to select SFT data for reasoning post-training and as steering vectors at inference, with reported gains of up to 20% Pass@1 on MATH and 41% on AMC across Llama3-8B and Qwen2.5-3B (outperforming human skill baselines), plus up to 4.8% Pass@8 improvement via steering; a similar approach is shown for safety alignment data selection.

Significance. If the empirical claims hold under stronger controls, the work offers a concrete, representation-grounded alternative to external skill ontologies, with the dual utility for data selection and steering as a clear strength. Open-sourcing the code is a positive for reproducibility. The results could influence post-training and alignment practices by prioritizing internal model structure, though the significance depends on confirming the basis reflects genuine behavioral axes rather than extraction artifacts.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (empirical results): the central outperformance claims (up to 20% Pass@1 on MATH, 41% on AMC) are presented without statistical significance tests, variance across seeds, or error bars, which is load-bearing for the comparison to human-characterized baselines.
  2. [§3] §3 (basis extraction): no ablation or sensitivity analysis is reported on key choices such as layer selection, sequence pooling method, or the number of retained directions, leaving open whether the gains depend on specific hyperparameters rather than intrinsic model properties.
  3. [§4.2 and §5] §4.2 and §5 (proxy interventions and validation): the tests for direction utility do not include stability checks of the recovered basis across alternative datasets, layers, or extraction variants, so the results do not yet rule out that the directions are corpus-specific correlations rather than model-organized axes.
minor comments (3)
  1. [Abstract] Abstract: the models (Llama3-8B, Qwen2.5-3B) and exact benchmark splits could be named earlier for immediate clarity.
  2. [§3] Notation: the definition of the orthogonal basis would benefit from an explicit equation showing the extraction objective and orthogonality constraint.
  3. [Figures 3-5] Figures: axis labels and legend text in the steering and data-selection plots are small; increasing font size would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for strengthening the empirical rigor of our work. We address each major comment below and commit to revisions that incorporate additional statistical analyses and sensitivity checks.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (empirical results): the central outperformance claims (up to 20% Pass@1 on MATH, 41% on AMC) are presented without statistical significance tests, variance across seeds, or error bars, which is load-bearing for the comparison to human-characterized baselines.

    Authors: We agree that the absence of statistical significance testing and variance reporting weakens the comparison to baselines. In the revised manuscript, we will rerun the key experiments across multiple random seeds (e.g., 5 seeds), report mean and standard deviation with error bars in the figures and tables of §4, and include p-values from appropriate statistical tests (such as paired t-tests) to confirm the significance of the outperformance over human-characterized skill baselines. The abstract will be updated to note these controls if space permits. revision: yes

  2. Referee: [§3] §3 (basis extraction): no ablation or sensitivity analysis is reported on key choices such as layer selection, sequence pooling method, or the number of retained directions, leaving open whether the gains depend on specific hyperparameters rather than intrinsic model properties.

    Authors: We acknowledge that additional ablations would better demonstrate robustness. While our layer and pooling choices were selected based on initial validation for semantic interpretability and computational efficiency, we will include a new subsection in §3 with sensitivity analyses. Specifically, we will vary the extraction layer (comparing middle vs. late layers), pooling strategies (mean pooling vs. last-token), and the number of retained orthogonal directions (e.g., 8, 16, 32), reporting the impact on downstream data selection performance to show that gains are not overly sensitive to these choices. revision: yes

  3. Referee: [§4.2 and §5] §4.2 and §5 (proxy interventions and validation): the tests for direction utility do not include stability checks of the recovered basis across alternative datasets, layers, or extraction variants, so the results do not yet rule out that the directions are corpus-specific correlations rather than model-organized axes.

    Authors: We agree that stability across variations is crucial to support the model-native claim. In the revision, we will add experiments in §4.2 and §5 that extract bases from alternative datasets (such as a held-out reasoning corpus and general web text), different layers, and minor variants of the orthogonalization procedure. We will quantify stability via metrics like cosine similarity between direction sets and re-evaluate the proxy interventions and downstream gains to confirm consistency, thereby addressing the concern that directions may be corpus-specific. revision: yes
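The cosine-similarity stability metric proposed in this response could take a shape like the following (a per-direction best-match score; the name is illustrative, and principal angles between subspaces would be an equally natural choice):

```python
import numpy as np

def basis_similarity(B1, B2):
    """For each direction in B1 (rows), return the maximum absolute
    cosine similarity to any direction in B2. Values near 1 indicate
    the direction is recovered under the alternative extraction."""
    B1 = B1 / np.linalg.norm(B1, axis=1, keepdims=True)
    B2 = B2 / np.linalg.norm(B2, axis=1, keepdims=True)
    sims = np.abs(B1 @ B2.T)  # (k1, k2) pairwise cosine similarities
    return sims.max(axis=1)   # best match per B1 direction
```

The absolute value matters: PCA directions are sign-ambiguous, so a direction and its negation should count as the same axis.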

Circularity Check

0 steps flagged

No significant circularity; empirical gains measured on external benchmarks.

full rationale

The derivation extracts an orthogonal basis from sequence-level activations via standard dimensionality reduction, then applies the directions to data selection and steering. Reported improvements (up to 20% Pass@1 on MATH, 41% on AMC, 4.8% Pass@8 via steering) are quantified on independent test sets and compared against a separate human-characterized baseline. No equation or step reduces these gains to quantities defined by the basis construction itself; the proxy interventions for direction utility are lightweight and their success is evaluated externally. No self-citations, ansatzes, or uniqueness claims are load-bearing in a way that collapses the argument to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method assumes that a low-dimensional orthogonal basis extracted from activations meaningfully organizes behavioral variation and that this basis transfers to data selection and steering without additional fitting.

axioms (1)
  • domain assumption Sequence-level activations contain linearly independent directions that correspond to axes of behavioral variation organized by the model
    Invoked when recovering the compact orthogonal basis from activations.

pith-pipeline@v0.9.0 · 5617 in / 1340 out tokens · 46601 ms · 2026-05-10T05:39:07.932943+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

104 extracted references · 65 canonical work pages · 22 internal anchors
