Characterizing Model-Native Skills
Pith reviewed 2026-05-10 05:39 UTC · model grok-4.3
The pith
Recovering an orthogonal basis from model activations identifies skill directions that improve data selection and steering more effectively than human taxonomies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A compact orthogonal basis recovered from sequence-level activations captures axes of behavioral variation that the model organizes around, independent of any human ontology. When these directions inform SFT data selection for reasoning, they produce Pass@1 gains of up to 20 percent on MATH and 41 percent on AMC, exceeding results from human-characterized skills. The same directions function as steering vectors during inference, lifting Pass@8 by up to 4.8 percent on MATH. The approach also improves sample efficiency in safety alignment by selecting adversarial data according to model-native skill coverage instead of textual diversity.
What carries the argument
A compact orthogonal basis extracted from sequence-level activations that identifies model-native axes of behavioral variation for data selection and inference steering.
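The review's circularity check notes the basis is recovered "via standard dimensionality reduction." One minimal way to realize that (a sketch under that assumption, not the paper's exact procedure) is an SVD over mean-pooled hidden states: the top right singular vectors form a compact orthonormal basis of sequence-level activation space.

```python
import numpy as np

def recover_basis(activations: np.ndarray, k: int = 16) -> np.ndarray:
    """Recover a compact orthogonal basis from sequence-level activations.

    activations: (n_sequences, hidden_dim) matrix, e.g. mean-pooled
    hidden states from one layer, one row per sequence.
    Returns a (k, hidden_dim) orthonormal basis (top right singular vectors,
    ordered by the variance they explain).
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Rows of Vt are orthonormal directions in activation space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

# Toy usage: 100 "sequences" with 64-dim pooled activations.
rng = np.random.default_rng(0)
basis = recover_basis(rng.normal(size=(100, 64)), k=8)
print(basis.shape)                               # (8, 64)
print(np.allclose(basis @ basis.T, np.eye(8)))   # True: rows are orthonormal
```

The layer choice, pooling method, and `k` are exactly the hyperparameters the referee flags as needing ablation.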
If this is right
- Post-training data chosen along the recovered directions raises Pass@1 accuracy on math benchmarks more than selection based on human skill descriptions.
- The same activation-space directions serve as steering vectors that improve inference-time performance on reasoning tasks where human skill labels offer no equivalent mechanism.
- Prioritizing coverage of model-native skills when choosing adversarial examples leads to more sample-efficient gains during safety alignment.
- Interventions grounded in the model's internal representations provide a unified foundation for both training data curation and runtime control.
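The runtime-control claim in the last bullet reduces to activation addition: add a scaled, unit-norm skill direction to the hidden states at a chosen layer during the forward pass. A minimal numpy sketch (hook mechanics and the choice of layer are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Add a scaled, unit-norm skill direction to every token's hidden state.

    hidden: (seq_len, hidden_dim) activations at one layer.
    direction: (hidden_dim,) recovered skill direction.
    alpha: steering strength (sign selects toward/away from the skill).
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit  # broadcasts over the sequence dimension

# Toy usage: steer a (5, 64) hidden-state block along one direction.
rng = np.random.default_rng(1)
h = rng.normal(size=(5, 64))
d = rng.normal(size=64)
out = steer(h, d, alpha=2.0)
# Every token's projection onto the direction increases by exactly alpha.
unit = d / np.linalg.norm(d)
print(np.allclose((out - h) @ unit, 2.0))  # True
```

Because the same directions drive both data selection and this intervention, a single basis extraction supports training-time and inference-time control.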
Where Pith is reading between the lines
- The method could extend to other domains such as code generation or multimodal tasks by repeating the activation-basis extraction on appropriate datasets.
- These directions might offer a starting point for building more stable internal maps of model capabilities that persist across different scales or continued training.
- Combining the basis with other activation-analysis techniques could produce finer-grained skill decompositions for targeted editing.
- One testable extension is whether the recovered directions remain effective when transferred to models of substantially different architecture or training history.
Load-bearing premise
The orthogonal basis recovered from activations genuinely reflects axes of variation the model uses to organize its behavior rather than arising as an artifact of the extraction procedure or chosen dataset.
What would settle it
If selecting data or applying steering vectors along the recovered directions produces no accuracy improvement or even lower scores than random selection or human-skill baselines on the same reasoning and alignment tasks, the central claim would be refuted.
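One concrete way to run that settling comparison is a per-seed paired bootstrap of the method against a baseline (a sketch with made-up numbers, not results from the paper; the rebuttal's paired t-test would be an alternative):

```python
import numpy as np

def paired_bootstrap_p(ours: np.ndarray, baseline: np.ndarray,
                       n_boot: int = 10_000, seed: int = 0) -> float:
    """One-sided paired bootstrap: estimated probability that the mean
    per-seed gain of `ours` over `baseline` is <= 0."""
    diffs = np.asarray(ours) - np.asarray(baseline)
    rng = np.random.default_rng(seed)
    # Resample seed-level differences with replacement.
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    return float((boot_means <= 0).mean())

# Toy usage: hypothetical Pass@1 over 5 seeds, model-native vs. human-skill
# selection. A small p-value supports the central claim; p near 1 refutes it.
ours = np.array([0.42, 0.44, 0.41, 0.43, 0.45])
base = np.array([0.35, 0.36, 0.34, 0.37, 0.35])
print(paired_bootstrap_p(ours, base) < 0.05)  # True
```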
Original abstract
Skills are a natural unit for describing what a language model can do and how its behavior can be changed. However, existing characterizations rely on human-written taxonomies, textual descriptions, or manual profiling pipelines--all external hypotheses about what matters that need not align with the model's internal representations. We argue that when the goal is to intervene on model behavior, skill characterization should be *model-native*: grounded in the model's own representations rather than imposed through external ontologies. We instantiate this view by recovering a compact orthogonal basis from sequence-level activations. The resulting basis is semantically interpretable but need not correspond to any predefined human ontology; instead, it captures axes of behavioral variation that the model itself organizes around. We validate this characterization on reasoning post-training, using the recovered basis for both SFT data selection and inference-time steering. We develop lightweight proxy interventions to identify which directions are most useful for a given model. Across Llama3-8B and Qwen2.5-3B, selecting data along those directions improves Pass@1 by up to 20% on MATH and 41% on AMC, outperforming data selection based on human-characterized skills. Because the basis lives in activation space, the same directions also serve as steering vectors at inference time, improving Pass@8 by up to 4.8% on MATH--an intervention that human-characterized skills cannot support. We further validate the characterization on safety alignment, where selecting adversarial training data for model-native skill coverage rather than textual diversity yields more sample-efficient learning. These results suggest that recovering skills from the model's own representations, rather than imposing them externally, provides a more effective foundation for intervening on model behavior. Codes are open-sourced.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes recovering a compact orthogonal basis from sequence-level activations to characterize model-native skills, arguing this is preferable to human-imposed taxonomies. The basis is applied to select SFT data for reasoning post-training and as steering vectors at inference, with reported gains of up to 20% Pass@1 on MATH and 41% on AMC across Llama3-8B and Qwen2.5-3B (outperforming human skill baselines), plus up to 4.8% Pass@8 improvement via steering; a similar approach is shown for safety alignment data selection.
Significance. If the empirical claims hold under stronger controls, the work offers a concrete, representation-grounded alternative to external skill ontologies, with the dual utility for data selection and steering as a clear strength. Open-sourcing the code is a positive for reproducibility. The results could influence post-training and alignment practices by prioritizing internal model structure, though the significance depends on confirming the basis reflects genuine behavioral axes rather than extraction artifacts.
major comments (3)
- [Abstract and §4] Abstract and §4 (empirical results): the central outperformance claims (up to 20% Pass@1 on MATH, 41% on AMC) are presented without statistical significance tests, variance across seeds, or error bars, which is load-bearing for the comparison to human-characterized baselines.
- [§3] §3 (basis extraction): no ablation or sensitivity analysis is reported on key choices such as layer selection, sequence pooling method, or the number of retained directions, leaving open whether the gains depend on specific hyperparameters rather than intrinsic model properties.
- [§4.2 and §5] §4.2 and §5 (proxy interventions and validation): the tests for direction utility do not include stability checks of the recovered basis across alternative datasets, layers, or extraction variants, so the results do not yet rule out that the directions are corpus-specific correlations rather than model-organized axes.
minor comments (3)
- [Abstract] Abstract: the models (Llama3-8B, Qwen2.5-3B) and exact benchmark splits could be named earlier for immediate clarity.
- [§3] Notation: the definition of the orthogonal basis would benefit from an explicit equation showing the extraction objective and orthogonality constraint.
- [Figures 3-5] Figures: axis labels and legend text in the steering and data-selection plots are small; increasing font size would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important areas for strengthening the empirical rigor of our work. We address each major comment below and commit to revisions that incorporate additional statistical analyses and sensitivity checks.
Point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (empirical results): the central outperformance claims (up to 20% Pass@1 on MATH, 41% on AMC) are presented without statistical significance tests, variance across seeds, or error bars, which is load-bearing for the comparison to human-characterized baselines.
Authors: We agree that the absence of statistical significance testing and variance reporting weakens the comparison to baselines. In the revised manuscript, we will rerun the key experiments across multiple random seeds (e.g., 5 seeds), report mean and standard deviation with error bars in the figures and tables of §4, and include p-values from appropriate statistical tests (such as paired t-tests) to confirm the significance of the outperformance over human-characterized skill baselines. The abstract will be updated to note these controls if space permits. revision: yes
-
Referee: [§3] §3 (basis extraction): no ablation or sensitivity analysis is reported on key choices such as layer selection, sequence pooling method, or the number of retained directions, leaving open whether the gains depend on specific hyperparameters rather than intrinsic model properties.
Authors: We acknowledge that additional ablations would better demonstrate robustness. While our layer and pooling choices were selected based on initial validation for semantic interpretability and computational efficiency, we will include a new subsection in §3 with sensitivity analyses. Specifically, we will vary the extraction layer (comparing middle vs. late layers), pooling strategies (mean pooling vs. last-token), and the number of retained orthogonal directions (e.g., 8, 16, 32), reporting the impact on downstream data selection performance to show that gains are not overly sensitive to these choices. revision: yes
-
Referee: [§4.2 and §5] §4.2 and §5 (proxy interventions and validation): the tests for direction utility do not include stability checks of the recovered basis across alternative datasets, layers, or extraction variants, so the results do not yet rule out that the directions are corpus-specific correlations rather than model-organized axes.
Authors: We agree that stability across variations is crucial to support the model-native claim. In the revision, we will add experiments in §4.2 and §5 that extract bases from alternative datasets (such as a held-out reasoning corpus and general web text), different layers, and minor variants of the orthogonalization procedure. We will quantify stability via metrics like cosine similarity between direction sets and re-evaluate the proxy interventions and downstream gains to confirm consistency, thereby addressing the concern that directions may be corpus-specific. revision: yes
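The stability metric the authors commit to can be sketched as best-match cosine similarity between two recovered direction sets, made invariant to sign flips and permutations (greedy matching here is an illustrative choice; Hungarian matching would be stricter):

```python
import numpy as np

def basis_stability(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean absolute cosine similarity between each direction in basis_a
    and its best match in basis_b (sign- and permutation-invariant)."""
    a = basis_a / np.linalg.norm(basis_a, axis=1, keepdims=True)
    b = basis_b / np.linalg.norm(basis_b, axis=1, keepdims=True)
    sims = np.abs(a @ b.T)               # (k_a, k_b) pairwise |cosines|
    return float(sims.max(axis=1).mean())

# Toy usage: identical bases up to sign flips and permutation score 1.0,
# as bases extracted from two corpora should if the directions are
# model-organized rather than corpus-specific.
rng = np.random.default_rng(2)
q, _ = np.linalg.qr(rng.normal(size=(64, 8)))
basis = q.T                              # (8, 64) orthonormal rows
shuffled = -basis[::-1]                  # permute rows and flip signs
print(np.isclose(basis_stability(basis, shuffled), 1.0))  # True
```

A score near 1 across extraction variants would directly answer the referee's corpus-specificity objection; a score near the random-subspace baseline would concede it.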
Circularity Check
No significant circularity; empirical gains measured on external benchmarks.
full rationale
The derivation extracts an orthogonal basis from sequence-level activations via standard dimensionality reduction, then applies the directions to data selection and steering. Reported improvements (up to 20% Pass@1 on MATH, 41% on AMC, 4.8% Pass@8 via steering) are quantified on independent test sets and compared against a separate human-characterized baseline. No equation or step reduces these gains to quantities defined by the basis construction itself; the proxy interventions for direction utility are lightweight and their success is evaluated externally. No self-citations, ansatzes, or uniqueness claims are load-bearing in a way that collapses the argument to its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Sequence-level activations contain linearly independent directions that correspond to axes of behavioral variation organized by the model.