Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

Byron C. Wallace; Debjyoti Saha Roy; Javed A. Aslam

arxiv: 2606.06840 · v1 · pith:VYPOSO4Anew · submitted 2026-06-05 · 💻 cs.CL · cs.AI · cs.LG

Pith reviewed 2026-06-27 22:21 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG

keywords mechanistic reasoning · distillation · large output spaces · shortlisting · multi-label classification · zero-shot performance

The pith

Reasoning in large output spaces proceeds via broad shortlisting followed by fine-grained evaluation over the narrowed set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern models achieve strong zero-shot results on multi-label tasks with hundreds of thousands or millions of candidate labels. The paper characterizes this as a two-phase process in which an initial broad shortlisting narrows the options and a subsequent fine-grained phase evaluates the shortlist. Evidence across datasets shows the phases can be isolated and are complementary. This separation is used to create a distillation procedure that outperforms standard distillation by addressing each phase distinctly.

Core claim

Reasoning is a two-phase process of broad shortlisting of candidates followed by fine-grained reasoning over the resulting set. These steps can be isolated and are complementary. This characterization supports a mechanistic distillation strategy that consistently outperforms standard distillation.
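
To make the two-phase claim concrete, here is a minimal sketch of a shortlist-then-refine pipeline over a toy label set. It is an illustration only, not the paper's procedure: the function names `shortlist` and `refine`, the dot-product coarse score, the threshold, and all numbers are assumptions chosen for the example.

```python
def shortlist(query_vec, label_vecs, k):
    """Phase 1 (coarse): keep the k labels with the highest dot-product score."""
    scores = [sum(q * l for q, l in zip(query_vec, vec)) for vec in label_vecs]
    ranked = sorted(range(len(label_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def refine(candidates, fine_score, threshold):
    """Phase 2 (fine): score only shortlisted labels, keep those at or above threshold."""
    return [i for i in candidates if fine_score(i) >= threshold]


# Hypothetical toy data: four labels embedded in 2-D and one query.
labels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.2]

# Hypothetical fine-grained scores; label 2 scores well here but is far from the query.
fine_scores = {0: 0.9, 1: 0.3, 2: 0.8, 3: 0.1}

candidates = shortlist(query, labels, k=2)
predicted = refine(candidates, lambda i: fine_scores[i], threshold=0.5)
print(candidates, predicted)
```

In this toy run, label 2 would pass the fine-grained threshold but never reaches it because the coarse phase drops it. That is one way to read the claim that the phases are complementary: the second phase can only choose among what the first phase lets through.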

What carries the argument

The two-phase reasoning process of broad candidate shortlisting followed by fine-grained reasoning over the shortlist.

If this is right

Shortlisting reduces the candidate pool from millions to a tractable size before detailed evaluation occurs.
The two phases are complementary, so both must function for strong overall performance.
Isolating the phases permits targeted knowledge transfer during distillation.
The resulting distillation method improves performance consistently across multiple datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The two-phase separation may apply to other prediction tasks with very large label spaces.
Models could potentially be trained or prompted to make the shortlisting step more explicit.
Disrupting one phase independently might produce predictable drops in accuracy.
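
The last item resembles the denoising and noising interventions named in the captions of Figures 5 and 8. The general technique, activation patching, can be sketched on a toy model as follows. Everything here is a hypothetical stand-in: the "heads" are scalar linear maps, the metric is the summed output, and none of it reproduces the paper's setup or scores.

```python
def head_outputs(x, head_weights):
    """Each toy 'head' is a scalar linear map of the input."""
    return [w * x for w in head_weights]


def run(x, head_weights, patch=None):
    """Sum the head outputs; `patch` maps head index -> replacement output."""
    outs = head_outputs(x, head_weights)
    for idx, value in (patch or {}).items():
        outs[idx] = value
    return sum(outs)


# Hypothetical model: head 0 carries most of the signal.
head_weights = [2.0, 0.5, 0.1]
clean_x, corrupt_x = 1.0, 0.0

# Cache per-head outputs from a clean run and a corrupted run.
clean_cache = head_outputs(clean_x, head_weights)
corrupt_cache = head_outputs(corrupt_x, head_weights)

clean_metric = run(clean_x, head_weights)
corrupt_metric = run(corrupt_x, head_weights)

# Denoising: corrupted run, with head 0 restored from the clean cache.
denoised = run(corrupt_x, head_weights, patch={0: clean_cache[0]})

# Noising: clean run, with head 0 overwritten from the corrupted cache.
noised = run(clean_x, head_weights, patch={0: corrupt_cache[0]})

# Fraction of the clean-vs-corrupted gap recovered by patching head 0 alone.
restored_fraction = (denoised - corrupt_metric) / (clean_metric - corrupt_metric)
print(denoised, noised, restored_fraction)
```

Denoising asks whether a component is sufficient to restore behavior, while noising asks whether it is necessary. In a real transformer the cached values would be attention-head activations rather than scalars, and the metric would be a task-specific quantity.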

Load-bearing premise

The shortlisting and fine-grained reasoning phases can be reliably isolated from each other in a manner that directly yields a superior distillation procedure.

What would settle it

A demonstration that the proposed mechanistic distillation does not outperform standard distillation on the tested datasets would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.06840 by Byron C. Wallace, Debjyoti Saha Roy, Javed A. Aslam.

**Figure 1.** LLaMA3-70B exhibits early Focus buildup and late Confusion reduction over CoT. Mechanistic distillation recovers the teacher trajectory, while representative CoT distillations [He et al., 2025] fail.

**Figure 2.** LLM Reasoning. (Left) Early CoT progressively builds focus on the broad categories. (Right) Late CoT progressively rules out near-miss categories until only the true signals remain.

**Figure 3.** Phase 1: While generating early CoT, we measure whether attention heads attend to salient …

**Figure 4.** (Left) Top heads by CoarseScore in the LLaMA3-70B model: early-layer heads (L1–6) exhibit anchor-focused attention and aligned residual updates during early CoT. (Right) Attention from a top early-layer head (L3H22), sharply focusing on key clinical anchors (red: cardiac phrases like “heart failure” and “EF 35%”; orange: respiratory/renal phrases like “pulmonary edema” and “pneumonia”).

**Figure 5.** Early-layer attention heads causally control coarse filtering. (Left) Denoising patching of progressively larger bins of top heads ranked by CoarseScore substantially restores reasoning focus toward semantic anchors. (Right) Noising these heads significantly degrades focus.

**Figure 6.** Phase 2: During later CoT, attention heads refine predictions by suppressing near-misses, …

**Figure 7.** Later-layer heads showing iterative refinement. (Left) Mid-to-late attention heads ranked by QK preference for own prior shortlist keys over near-miss keys + OV updates widening margins. (Right) Over CoT, top refinement heads sharpen QK preference toward shortlist representations, downweight near-misses, and strengthen OV contributions for iterative margin widening.

**Figure 8.** Later-layer heads causally drive iterative refinement. (Left/Right) Denoising successively larger bins of top RefineScore-ranked heads sharply reduces (↓↓) near-miss confusion, and noising them increases (↑↑) it. High positive RefineScore identifies heads that re-attend to prior shortlist representations while avoiding near-misses, and write updates that widen the target–near-miss margin.

**Figure 9.** Ablating phases & specific atten./write/interaction losses in Eqs. …

Original abstract

Modern reasoning models offer surprisingly strong zero-shot performance on challenging multi-label tasks that require selecting a small set of relevant options from hundreds of thousands to millions of candidate labels. We investigate how they achieve this mechanistically. We characterize reasoning as a two-phase process: A broad "shortlisting" of candidates followed by fine-grained reasoning over the resulting set. We provide evidence across a range of datasets that these steps can be isolated and are complementary. Using this characterization, we develop a mechanistic distillation strategy that consistently outperforms standard distillation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames large-output reasoning as shortlisting plus fine-grained steps, isolates those phases, and builds a distillation method that beats standard baselines across datasets.

The main takeaway is that this work treats reasoning over huge label sets as two separable phases—broad shortlisting followed by detailed comparison—and shows that distilling along those lines gives consistent gains over ordinary distillation.

What is new is the explicit split and its use as a guide for the distillation procedure. Prior distillation papers usually match logits or hidden states without breaking the process into these mechanistic stages. The authors report that the phases can be isolated and that they complement each other on multiple datasets.
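
For reference, the logit-matching baseline mentioned above is commonly written as a temperature-softened KL divergence between teacher and student distributions, following Hinton et al. ("Distilling the Knowledge in a Neural Network"). The sketch below shows only that standard baseline. It is not the paper's mechanistic distillation, whose losses are not described in this summary, and the function names and example logits are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl


# Hypothetical logits over three labels.
teacher = [2.0, 0.0, -1.0]
matched_student = [2.0, 0.0, -1.0]
uniform_student = [0.0, 0.0, 0.0]

loss_matched = kd_loss(teacher, matched_student)
loss_uniform = kd_loss(teacher, uniform_student)
print(loss_matched, loss_uniform)
```

The loss is zero when the student reproduces the teacher's softened distribution and positive otherwise. With hundreds of thousands of labels, a single loss over the full output distribution treats every label alike, which is the contrast the letter draws with a phase-aware procedure.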

The paper does a reasonable job demonstrating the practical payoff: the resulting strategy outperforms the usual approach across the tested cases. That empirical consistency is the strongest part of the contribution.

The soft spots sit in the isolation step itself. The abstract does not detail how the shortlisting phase is extracted or whether the separation is stable across model sizes and prompting styles. If the gains depend on particular implementation choices rather than the two-phase structure, the mechanistic story weakens. Reviewers will also want to see ablations that tie the performance lift directly to the characterization rather than ancillary factors.

This is for researchers working on extreme multi-label tasks or model compression in NLP. Anyone who needs to shrink models while keeping accuracy on very large output spaces could get something usable from it.

The claims are concrete enough and the empirical angle is clear enough that the paper should go to peer review rather than a desk reject.

Referee Report

2 major / 0 minor

Summary. The paper claims that modern reasoning models achieve strong zero-shot performance on multi-label tasks with very large candidate sets via a two-phase process of broad shortlisting followed by fine-grained reasoning over the shortlist. It asserts that these phases can be isolated and shown to be complementary across datasets, and that a distillation procedure derived from this characterization consistently outperforms standard distillation.

Significance. If the two-phase characterization is valid and the isolation procedure is shown to be causal rather than post-hoc, the work could provide a useful mechanistic lens on how transformers manage large output spaces and yield practically better distillation recipes for such tasks. The absence of any equations, algorithms, datasets, or quantitative results in the manuscript prevents assessment of whether these benefits are realized or attributable to the proposed framing.

major comments (2)

[Abstract] The manuscript asserts that 'evidence across a range of datasets' exists for isolability and complementarity and that the resulting distillation 'consistently outperforms standard distillation,' yet supplies no methods section, no experimental protocol, no tables of results, and no description of baselines or controls. This renders the central empirical claims unverifiable from the text.

[Abstract] The claim that the phases 'can be isolated' is load-bearing for the distillation contribution, but no procedure, loss function, or intervention for performing the isolation is described, making it impossible to determine whether the reported gains are due to the mechanistic insight or to ancillary implementation choices.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments. We agree that the current manuscript text consists only of a high-level abstract and lacks the detailed methods, protocols, results, and isolation procedure needed to substantiate the claims. We will revise the manuscript accordingly to address these issues.

Referee: [Abstract] The manuscript asserts that 'evidence across a range of datasets' exists for isolability and complementarity and that the resulting distillation 'consistently outperforms standard distillation,' yet supplies no methods section, no experimental protocol, no tables of results, and no description of baselines or controls. This renders the central empirical claims unverifiable from the text.

Authors: We accept this observation. The provided manuscript is limited to the abstract summarizing the claims without supporting details. In the revised version we will add a methods section, full experimental protocol, descriptions of datasets and baselines, and tables of quantitative results to make the claims verifiable. (revision: yes)

Referee: [Abstract] The claim that the phases 'can be isolated' is load-bearing for the distillation contribution, but no procedure, loss function, or intervention for performing the isolation is described, making it impossible to determine whether the reported gains are due to the mechanistic insight or to ancillary implementation choices.

Authors: We agree that the isolation procedure must be explicitly described for the contribution to be assessable. The revised manuscript will include a detailed account of the isolation method, including any loss functions or interventions used, to clarify its role in the distillation strategy. (revision: yes)

Circularity Check

0 steps flagged

No significant circularity detected

The provided abstract and description characterize reasoning as a two-phase shortlisting plus fine-grained process, claim empirical isolation across datasets, and derive a distillation strategy from that characterization. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are supplied that would allow any load-bearing step to reduce to its own inputs by construction. The derivation therefore remains self-contained and relies on external empirical evidence rather than internal redefinition or self-referential fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone.

pith-pipeline@v0.9.1-grok · 5616 in / 911 out tokens · 23439 ms · 2026-06-27T22:21:04.752400+00:00 · methodology

Reference graph

Works this paper leans on

167 extracted references · 128 canonical work pages · 33 internal anchors

[1]

RIP , author=

Kurtosis as peakedness, 1905--2014. RIP , author=. The American Statistician , volume=. 2014 , publisher=

1905
[2]

2025 , eprint =

Eliciting Latent Predictions from Transformers with the Tuned Lens , author =. 2025 , eprint =

2025
[3]

2024 , journal =

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet , author =. 2024 , journal =

2024
[4]

arXiv preprint arXiv:2509.25002 , year=

Circuit Distillation , author=. arXiv preprint arXiv:2509.25002 , year=

arXiv
[5]

arXiv preprint arXiv:2501.12948 , year=

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

Pith/arXiv arXiv
[6]

arXiv e-prints , pages=

Test-time computing: from system-1 thinking to system-2 thinking , author=. arXiv e-prints , pages=. 2025 , url=

2025
[7]

Introducing OpenAI o3 and o4-mini , year =
[8]

Introducing GPT-5.2 , year =
[9]

2023 , howpublished =

Kamradt, Greg , title =. 2023 , howpublished =

2023
[10]

and Dahiya, K

Bhatia, K. and Dahiya, K. and Jain, H. and Kar, P. and Mittal, A. and Prabhu, Y. and Varma, M. , title =
[11]

Scientific data , volume=

MIMIC-IV, a freely accessible electronic health record dataset , author=. Scientific data , volume=. 2023 , publisher=

2023
[12]

Proceedings of the 40th annual meeting of the Association for Computational Linguistics , pages=

Bleu: a method for automatic evaluation of machine translation , author=. Proceedings of the 40th annual meeting of the Association for Computational Linguistics , pages=
[13]

Journal of Machine learning research , volume=

Statistical comparisons of classifiers over multiple data sets , author=. Journal of Machine learning research , volume=
[14]

Investigating Mysteries of C o T -Augmented Distillation

Wadhwa, Somin and Amir, Silvio and Wallace, Byron C. Investigating Mysteries of C o T -Augmented Distillation. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.349

work page doi:10.18653/v1/2024.emnlp-main.349 2024
[15]

2025 , url =

WHO , title =. 2025 , url =

2025
[16]

Advances in Neural Information Processing Systems , volume=

Towards semi-structured automatic ICD coding via tree-based contrastive learning , author=. Advances in Neural Information Processing Systems , volume=
[17]

arXiv preprint arXiv:2509.20317 , year=

SIM-CoT: Supervised Implicit Chain-of-Thought , author=. arXiv preprint arXiv:2509.20317 , year=

arXiv
[18]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Softcot: Soft chain-of-thought for efficient reasoning with llms , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[19]

arXiv preprint arXiv:2311.01460 , year=

Implicit chain of thought reasoning via knowledge distillation , author=. arXiv preprint arXiv:2311.01460 , year=

arXiv
[20]

arXiv e-prints , pages=

Cothink: Token-efficient reasoning via instruct models guiding reasoning models , author=. arXiv e-prints , pages=
[21]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

Codi: Compressing chain-of-thought into continuous space via self-distillation , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025
[22]

arXiv preprint arXiv:2412.13171 , year=

Compressed chain of thought: Efficient reasoning through dense representations , author=. arXiv preprint arXiv:2412.13171 , year=

Pith/arXiv arXiv
[23]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

C3ot: Generating shorter chain-of-thought without compromising effectiveness , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[24]

arXiv preprint arXiv:2405.16064 , year=

Keypoint-based progressive chain-of-thought distillation for llms , author=. arXiv preprint arXiv:2405.16064 , year=

arXiv
[25]

Advances in Neural Information Processing Systems , volume=

Iteration head: A mechanistic study of chain-of-thought , author=. Advances in Neural Information Processing Systems , volume=
[27]

doi:10.48550/ARXIV.2510.24940 , abstract =

He, Yinhan and Zheng, Wendy and Zhu, Yaochen and Zheng, Zaiyi and Su, Lin and Vasudevan, Sriram and Guo, Qi and Hong, Liangjie and Li, Jundong , year =. doi:10.48550/ARXIV.2510.24940 , abstract =

work page doi:10.48550/arxiv.2510.24940
[28]

Chen, Xiaoshu and Zhou, Sihang and Liang, Ke and Sun, Xiaoyu and Liu, Xinwang , editor =. Skip-. Proceedings of the 2025. 2025 , pages =. doi:10.18653/v1/2025.emnlp-main.610 , abstract =

work page doi:10.18653/v1/2025.emnlp-main.610 2025
[29]

Yan, JianZhi and Liu, Le and Pan, Youcheng and Chen, Shiwei and Xiang, Yang and Tang, Buzhou , editor =. Towards. Findings of the. 2025 , pages =. doi:10.18653/v1/2025.findings-emnlp.413 , abstract =

work page doi:10.18653/v1/2025.findings-emnlp.413 2025
[30]

Zhuang, Xianwei and Zhu, Zhihong and Wang, Zhichang and Cheng, Xuxin and Zou, Yuexian , year =
[31]

arXiv.org , author =

Probing to. arXiv.org , author =
[32]

and Aslam, Javed A

Roy, Debjyoti Saha and Wallace, Byron C. and Aslam, Javed A. , month = dec, year =. Don't. doi:10.48550/arXiv.2410.23066 , abstract =

work page doi:10.48550/arxiv.2410.23066
[33]

Distilling the Knowledge in a Neural Network

Hinton, Geoffrey and Vinyals, Oriol and Dean, Jeff , month = mar, year =. Distilling the. doi:10.48550/arXiv.1503.02531 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1503.02531
[34]

Kim, Jaehoon and Seo, Kwangwook and Lee, Dongha , month = sep, year =. In. doi:10.48550/arXiv.2509.22230 , abstract =

work page doi:10.48550/arxiv.2509.22230
[35]

Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation

Bhambri, Siddhant and Biswas, Upasana and Kambhampati, Subbarao , month = may, year =. Interpretable. doi:10.48550/arXiv.2505.13792 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2505.13792
[36]

Ramesh, Suhas Kamasetty and Sengupta, Ayan and Chakraborty, Tanmoy , month = aug, year =. On the. doi:10.48550/arXiv.2505.15442 , abstract =

work page doi:10.48550/arxiv.2505.15442
[37]

Knowledge

Fang, Luyang and Yu, Xiaowei and Cai, Jiazhang and Chen, Yongkai and Wu, Shushan and Liu, Zhengliang and Yang, Zhenyuan and Lu, Haoran and Gong, Xilin and Liu, Yufang and Ma, Terry and Ruan, Wei and Abbasi, Ali and Zhang, Jing and Wang, Tao and Latif, Ehsan and You, Weihang and Jiang, Hanqi and Liu, Wei and Zhang, Wei and Kolouri, Soheil and Zhai, Xiaomin...

work page doi:10.48550/arxiv.2504.14772
[38]

Belinkov, Yonatan , month = mar, year =. Probing. Computational Linguistics , publisher =. doi:10.1162/coli_a_00422 , abstract =

work page internal anchor Pith review doi:10.1162/coli_a_00422
[39]

Mixture of

Fu, Tianyu and Huang, Haofeng and Ning, Xuefei and Zhang, Genghan and Chen, Boju and Wu, Tianqi and Wang, Hongyi and Huang, Zixiao and Li, Shiyao and Yan, Shengen and Dai, Guohao and Yang, Huazhong and Wang, Yu , month = nov, year =. Mixture of. doi:10.48550/arXiv.2406.14909 , abstract =

work page doi:10.48550/arxiv.2406.14909
[40]

doi:10.48550/arXiv.2407.15891 , abstract =

Tang, Hanlin and Lin, Yang and Lin, Jing and Han, Qingsen and Hong, Shikuan and Yao, Yiwu and Wang, Gongyi , month = jul, year =. doi:10.48550/arXiv.2407.15891 , abstract =

work page doi:10.48550/arxiv.2407.15891
[41]

Attention

Zheng, Zifan and Wang, Yezhaohui and Huang, Yuxin and Song, Shichao and Yang, Mingchuan and Tang, Bo and Xiong, Feiyu and Li, Zhiyu , month = dec, year =. Attention. doi:10.48550/arXiv.2409.03752 , abstract =

work page doi:10.48550/arxiv.2409.03752
[42]

Retrieval

Wu, Wenhao and Wang, Yizhong and Xiao, Guangxuan and Peng, Hao and Fu, Yao , month = apr, year =. Retrieval. doi:10.48550/arXiv.2404.15574 , abstract =

work page doi:10.48550/arxiv.2404.15574
[43]

arXiv.org , author =

A. arXiv.org , author =
[44]

arXiv.org , author =

Eliciting. arXiv.org , author =
[45]

2024 , booktitle =

Syed, Aaquib and Rager, Can and Conmy, Arthur , editor =. Attribution. Proceedings of the 7th. 2024 , pages =. doi:10.18653/v1/2024.blackboxnlp-1.25 , abstract =

work page doi:10.18653/v1/2024.blackboxnlp-1.25 2024
[46]

arXiv.org , author =
[47]

Iteration

Cabannes, Vivien and Arnal, Charles and Bouaziz, Wassim and Yang, Alice and Charton, Francois and Kempe, Julia , month = oct, year =. Iteration. doi:10.48550/arXiv.2406.02128 , abstract =

work page doi:10.48550/arxiv.2406.02128
[48]

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Guan, Xinyu and Zhang, Li Lyna and Liu, Yifei and Shang, Ning and Sun, Youran and Zhu, Yi and Yang, Fan and Yang, Mao , month = jan, year =. doi:10.48550/arXiv.2501.04519 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.04519
[49]

Lieberum, Tom and Rajamanoharan, Senthooran and Conmy, Arthur and Smith, Lewis and Sonnerat, Nicolas and Varma, Vikrant and Kramár, János and Dragan, Anca and Shah, Rohin and Nanda, Neel , month = aug, year =. Gemma. doi:10.48550/arXiv.2408.05147 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2408.05147
[50]

Shu, Dong and Wu, Xuansheng and Zhao, Haiyan and Rai, Daking and Yao, Ziyu and Liu, Ninghao and Du, Mengnan , month = sep, year =. A. doi:10.48550/arXiv.2503.05613 , abstract =

work page doi:10.48550/arxiv.2503.05613
[51]

and Tutubalina, Elena and Oseledets, Ivan , month = aug, year =

Galichin, Andrey and Dontsov, Alexey and Druzhinina, Polina and Razzhigaev, Anton and Rogov, Oleg Y. and Tutubalina, Elena and Oseledets, Ivan , month = aug, year =. I. doi:10.48550/arXiv.2503.18878 , abstract =

work page doi:10.48550/arxiv.2503.18878
[52]

Distill , author =

Multimodal. Distill , author =. 2021 , pages =. doi:10.23915/distill.00030 , number =

work page doi:10.23915/distill.00030 2021
[53]

Distill , author =

High/. Distill , author =. 2021 , pages =. doi:10.23915/distill.00024.005 , number =

work page doi:10.23915/distill.00024.005 2021
[54]

Distill , author =

Feature. Distill , author =. 2017 , pages =. doi:10.23915/distill.00007 , number =

work page doi:10.23915/distill.00007 2017
[55]

Zoom in: An introduction to circuits

Zoom. Distill , author =. 2020 , pages =. doi:10.23915/distill.00024.001 , number =

work page doi:10.23915/distill.00024.001 2020
[56]

Scaling and evaluating sparse autoencoders

Gao, Leo and Tour, Tom Dupré la and Tillman, Henk and Goh, Gabriel and Troll, Rajan and Radford, Alec and Sutskever, Ilya and Leike, Jan and Wu, Jeffrey , month = jun, year =. Scaling and evaluating sparse autoencoders , url =. doi:10.48550/arXiv.2406.04093 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2406.04093
[57]

Cunningham, Hoagy and Ewart, Aidan and Riggs, Logan and Huben, Robert and Sharkey, Lee , month = oct, year =. Sparse. doi:10.48550/arXiv.2309.08600 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2309.08600
[58]

Zhang, Fred and Nanda, Neel , month = jan, year =. Towards. doi:10.48550/arXiv.2309.16042 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2309.16042
[59]

How to use and interpret activation patching

Heimersheim, Stefan and Nanda, Neel , month = apr, year =. How to use and interpret activation patching , url =. doi:10.48550/arXiv.2404.15255 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.15255
[60]

Nanda, Neel , month = jul, year =. An
[61]

Stolfo, Alessandro and Belinkov, Yonatan and Sachan, Mrinmaya , month = oct, year =. A. doi:10.48550/arXiv.2305.15054 , abstract =

work page doi:10.48550/arxiv.2305.15054
[62]

Hierarchical Reasoning Model

Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi , month = aug, year =. Hierarchical. doi:10.48550/arXiv.2506.21734 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2506.21734
[63]

Ren, Zirui and Liu, Ziming , month = jan, year =. Are. doi:10.48550/arXiv.2601.10679 , abstract =

work page doi:10.48550/arxiv.2601.10679
[64]

Recursive Language Models

Zhang, Alex L. and Kraska, Tim and Khattab, Omar , month = jan, year =. Recursive. doi:10.48550/arXiv.2512.24601 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2512.24601
[65]

and Rossi, Ryan A

Basu, Samyadeep and Morariu, Vlad I. and Rossi, Ryan A. and Zhao, Nanxuan and Wang, Zichao and Feizi, Soheil and Manjunatha, Varun , month = aug, year =. On
[66]

Du, Hongzhe and Li, Weikai and Cai, Min and Saraipour, Karim and Zhang, Zimin and Lakkaraju, Himabindu and Sun, Yizhou and Zhang, Shichang , month = nov, year =. How. doi:10.48550/arXiv.2504.02904 , abstract =

work page doi:10.48550/arxiv.2504.02904
[67]

Hanna, Michael and Pezzelle, Sandro and Belinkov, Yonatan , month = jul, year =. Have. doi:10.48550/arXiv.2403.17806 , abstract =

work page doi:10.48550/arxiv.2403.17806
[68]

Yan, Jianzhi and Liu, Le and Pan, Youcheng and Chen, Shiwei and Xiang, Yang and Tang, Buzhou , month = sep, year =. Towards. doi:10.48550/arXiv.2509.23574 , abstract =

work page doi:10.48550/arxiv.2509.23574
[69]

Distilling the

Chen, Wei-Rui and Kothapalli, Vignesh and Fatahibaarzi, Ata and Sang, Hejian and Tang, Shao and Song, Qingquan and Wang, Zhipeng and Abdul-Mageed, Muhammad , month = jan, year =. Distilling the. doi:10.48550/arXiv.2512.21002 , abstract =

work page doi:10.48550/arxiv.2512.21002
[70]

, month = nov, year =

Tian, Yijun and Han, Yikun and Chen, Xiusi and Wang, Wei and Chawla, Nitesh V. , month = nov, year =. Beyond. doi:10.48550/arXiv.2402.04616 , abstract =

work page doi:10.48550/arxiv.2402.04616
[71]

Dai, Chengwei and Li, Kun and Zhou, Wei and Hu, Songlin , month = may, year =. Beyond. doi:10.48550/arXiv.2405.19737 , abstract =

work page doi:10.48550/arxiv.2405.19737
[72]

Hu, Yueqing and Peng, Xinyang and Peng, Shuting and Wang, Hanqi and Wang, Tianhong , month = jan, year =. Hán. doi:10.48550/arXiv.2601.05019 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.05019
[73]

Li, Chenglin and Chen, Qianglong and Li, Liangyue and Wang, Caiyu and Li, Yicheng and Chen, Zulong and Zhang, Yin , month = feb, year =. Mixed. doi:10.48550/arXiv.2312.10730 , abstract =

work page doi:10.48550/arxiv.2312.10730
[74]

doi:10.48550/arXiv.2310.14747 , abstract =

Chen, Hongzhan and Wu, Siyue and Quan, Xiaojun and Wang, Rui and Yan, Ming and Zhang, Ji , month = dec, year =. doi:10.48550/arXiv.2310.14747 , abstract =

work page doi:10.48550/arxiv.2310.14747
[75]

Chen, Qiguang and Du, Yantao and Li, Ziniu and Liu, Jinhao and Duan, Songyao and Guo, Jiarui and Liu, Minghao and Liu, Jiaheng and Yang, Tong and Zhang, Ge and Qin, Libo and Che, Wanxiang and Huang, Wenhao , month = jan, year =. The. doi:10.48550/arXiv.2601.06002 , abstract =

work page doi:10.48550/arxiv.2601.06002
[76]

and Zaharia, Matei and Gonzalez, Joseph E

Li, Dacheng and Cao, Shiyi and Griggs, Tyler and Liu, Shu and Mo, Xiangxi and Tang, Eric and Hegde, Sumanth and Hakhamaneshi, Kourosh and Patil, Shishir G. and Zaharia, Matei and Gonzalez, Joseph E. and Stoica, Ion , month = feb, year =. doi:10.48550/arXiv.2502.07374 , abstract =

work page doi:10.48550/arxiv.2502.07374
[77]

Demystifying Long Chain-of-Thought Reasoning in LLMs

Yeo, Edward and Tong, Yuxuan and Niu, Morry and Neubig, Graham and Yue, Xiang , month = feb, year =. Demystifying. doi:10.48550/arXiv.2502.03373 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.03373
[78]

Keypoint-based

Feng, Kaituo and Li, Changsheng and Zhang, Xiaolu and Zhou, Jun and Yuan, Ye and Wang, Guoren , month = may, year =. Keypoint-based. doi:10.48550/arXiv.2405.16064 , abstract =

work page doi:10.48550/arxiv.2405.16064
[79]

Chen, Xiao and Zhou, Sihang and Liang, Ke and Sun, Xiaoyu and Liu, Xinwang , month = may, year =. Skip-. doi:10.48550/arXiv.2505.18642 , abstract =

work page doi:10.48550/arxiv.2505.18642
[80]

Michaud, Eric J. and Liao, Isaac and Lad, Vedang and Liu, Ziming and Mudide, Anish and Loughridge, Chloe and Guo, Zifan Carl and Kheirkhah, Tara Rezaei and Vukelić, Mateja and Tegmark, Max , month = feb, year =. Opening the. doi:10.48550/arXiv.2402.05110 , abstract =

work page doi:10.48550/arxiv.2402.05110
[81]

doi:10.48550/arXiv.2402.04678 , abstract =

Chuang, Yu-Neng and Wang, Guanchu and Chang, Chia-Yuan and Tang, Ruixiang and Zhong, Shaochen and Yang, Fan and Du, Mengnan and Cai, Xuanting and Braverman, Vladimir and Hu, Xia , month = oct, year =. doi:10.48550/arXiv.2402.04678 , abstract =

work page doi:10.48550/arxiv.2402.04678

Showing first 80 references.


[21] [21]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

Codi: Compressing chain-of-thought into continuous space via self-distillation , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025

[22] [22]

arXiv preprint arXiv:2412.13171 , year=

Compressed chain of thought: Efficient reasoning through dense representations , author=. arXiv preprint arXiv:2412.13171 , year=

Pith/arXiv arXiv

[23] [23]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

C3ot: Generating shorter chain-of-thought without compromising effectiveness , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[24] [24]

arXiv preprint arXiv:2405.16064 , year=

Keypoint-based progressive chain-of-thought distillation for llms , author=. arXiv preprint arXiv:2405.16064 , year=

arXiv

[25] [25]

Advances in Neural Information Processing Systems , volume=

Iteration head: A mechanistic study of chain-of-thought , author=. Advances in Neural Information Processing Systems , volume=

[26] [27]

doi:10.48550/ARXIV.2510.24940 , abstract =

He, Yinhan and Zheng, Wendy and Zhu, Yaochen and Zheng, Zaiyi and Su, Lin and Vasudevan, Sriram and Guo, Qi and Hong, Liangjie and Li, Jundong , year =. doi:10.48550/ARXIV.2510.24940 , abstract =

work page doi:10.48550/arxiv.2510.24940

[27] [28]

Chen, Xiaoshu and Zhou, Sihang and Liang, Ke and Sun, Xiaoyu and Liu, Xinwang , editor =. Skip-. Proceedings of the 2025. 2025 , pages =. doi:10.18653/v1/2025.emnlp-main.610 , abstract =

work page doi:10.18653/v1/2025.emnlp-main.610 2025

[28] [29]

Yan, JianZhi and Liu, Le and Pan, Youcheng and Chen, Shiwei and Xiang, Yang and Tang, Buzhou , editor =. Towards. Findings of the. 2025 , pages =. doi:10.18653/v1/2025.findings-emnlp.413 , abstract =

work page doi:10.18653/v1/2025.findings-emnlp.413 2025

[29] [30]

Zhuang, Xianwei and Zhu, Zhihong and Wang, Zhichang and Cheng, Xuxin and Zou, Yuexian , year =

[30] [31]

arXiv.org , author =

Probing to. arXiv.org , author =

[31] [32]

and Aslam, Javed A

Roy, Debjyoti Saha and Wallace, Byron C. and Aslam, Javed A. , month = dec, year =. Don't. doi:10.48550/arXiv.2410.23066 , abstract =

work page doi:10.48550/arxiv.2410.23066

[32] [33]

Distilling the Knowledge in a Neural Network

Hinton, Geoffrey and Vinyals, Oriol and Dean, Jeff , month = mar, year =. Distilling the. doi:10.48550/arXiv.1503.02531 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1503.02531

[33] [34]

Kim, Jaehoon and Seo, Kwangwook and Lee, Dongha , month = sep, year =. In. doi:10.48550/arXiv.2509.22230 , abstract =

work page doi:10.48550/arxiv.2509.22230

[34] [35]

Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation

Bhambri, Siddhant and Biswas, Upasana and Kambhampati, Subbarao , month = may, year =. Interpretable. doi:10.48550/arXiv.2505.13792 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2505.13792

[35] [36]

Ramesh, Suhas Kamasetty and Sengupta, Ayan and Chakraborty, Tanmoy , month = aug, year =. On the. doi:10.48550/arXiv.2505.15442 , abstract =

work page doi:10.48550/arxiv.2505.15442

[36] [37]

Knowledge

Fang, Luyang and Yu, Xiaowei and Cai, Jiazhang and Chen, Yongkai and Wu, Shushan and Liu, Zhengliang and Yang, Zhenyuan and Lu, Haoran and Gong, Xilin and Liu, Yufang and Ma, Terry and Ruan, Wei and Abbasi, Ali and Zhang, Jing and Wang, Tao and Latif, Ehsan and You, Weihang and Jiang, Hanqi and Liu, Wei and Zhang, Wei and Kolouri, Soheil and Zhai, Xiaomin...

work page doi:10.48550/arxiv.2504.14772

[37] [38]

Belinkov, Yonatan , month = mar, year =. Probing. Computational Linguistics , publisher =. doi:10.1162/coli_a_00422 , abstract =

work page internal anchor Pith review doi:10.1162/coli_a_00422

[38] [39]

Mixture of

Fu, Tianyu and Huang, Haofeng and Ning, Xuefei and Zhang, Genghan and Chen, Boju and Wu, Tianqi and Wang, Hongyi and Huang, Zixiao and Li, Shiyao and Yan, Shengen and Dai, Guohao and Yang, Huazhong and Wang, Yu , month = nov, year =. Mixture of. doi:10.48550/arXiv.2406.14909 , abstract =

work page doi:10.48550/arxiv.2406.14909

[39] [40]

doi:10.48550/arXiv.2407.15891 , abstract =

Tang, Hanlin and Lin, Yang and Lin, Jing and Han, Qingsen and Hong, Shikuan and Yao, Yiwu and Wang, Gongyi , month = jul, year =. doi:10.48550/arXiv.2407.15891 , abstract =

work page doi:10.48550/arxiv.2407.15891

[40] [41]

Attention

Zheng, Zifan and Wang, Yezhaohui and Huang, Yuxin and Song, Shichao and Yang, Mingchuan and Tang, Bo and Xiong, Feiyu and Li, Zhiyu , month = dec, year =. Attention. doi:10.48550/arXiv.2409.03752 , abstract =

work page doi:10.48550/arxiv.2409.03752

[41] [42]

Retrieval

Wu, Wenhao and Wang, Yizhong and Xiao, Guangxuan and Peng, Hao and Fu, Yao , month = apr, year =. Retrieval. doi:10.48550/arXiv.2404.15574 , abstract =

work page doi:10.48550/arxiv.2404.15574

[42] [43]

arXiv.org , author =

A. arXiv.org , author =

[43] [44]

arXiv.org , author =

Eliciting. arXiv.org , author =

[44] [45]

2024 , booktitle =

Syed, Aaquib and Rager, Can and Conmy, Arthur , editor =. Attribution. Proceedings of the 7th. 2024 , pages =. doi:10.18653/v1/2024.blackboxnlp-1.25 , abstract =

work page doi:10.18653/v1/2024.blackboxnlp-1.25 2024

[45] [46]

arXiv.org , author =

[46] [47]

Iteration

Cabannes, Vivien and Arnal, Charles and Bouaziz, Wassim and Yang, Alice and Charton, Francois and Kempe, Julia , month = oct, year =. Iteration. doi:10.48550/arXiv.2406.02128 , abstract =

work page doi:10.48550/arxiv.2406.02128

[47] [48]

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Guan, Xinyu and Zhang, Li Lyna and Liu, Yifei and Shang, Ning and Sun, Youran and Zhu, Yi and Yang, Fan and Yang, Mao , month = jan, year =. doi:10.48550/arXiv.2501.04519 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.04519

[48] [49]

Lieberum, Tom and Rajamanoharan, Senthooran and Conmy, Arthur and Smith, Lewis and Sonnerat, Nicolas and Varma, Vikrant and Kramár, János and Dragan, Anca and Shah, Rohin and Nanda, Neel , month = aug, year =. Gemma. doi:10.48550/arXiv.2408.05147 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2408.05147

[49] [50]

Shu, Dong and Wu, Xuansheng and Zhao, Haiyan and Rai, Daking and Yao, Ziyu and Liu, Ninghao and Du, Mengnan , month = sep, year =. A. doi:10.48550/arXiv.2503.05613 , abstract =

work page doi:10.48550/arxiv.2503.05613

[50] [51]

and Tutubalina, Elena and Oseledets, Ivan , month = aug, year =

Galichin, Andrey and Dontsov, Alexey and Druzhinina, Polina and Razzhigaev, Anton and Rogov, Oleg Y. and Tutubalina, Elena and Oseledets, Ivan , month = aug, year =. I. doi:10.48550/arXiv.2503.18878 , abstract =

work page doi:10.48550/arxiv.2503.18878

[51] [52]

Distill , author =

Multimodal. Distill , author =. 2021 , pages =. doi:10.23915/distill.00030 , number =

work page doi:10.23915/distill.00030 2021

[52] [53]

Distill , author =

High/. Distill , author =. 2021 , pages =. doi:10.23915/distill.00024.005 , number =

work page doi:10.23915/distill.00024.005 2021

[53] [54]

Distill , author =

Feature. Distill , author =. 2017 , pages =. doi:10.23915/distill.00007 , number =

work page doi:10.23915/distill.00007 2017

[54] [55]

Zoom in: An introduction to circuits

Zoom. Distill , author =. 2020 , pages =. doi:10.23915/distill.00024.001 , number =

work page doi:10.23915/distill.00024.001 2020

[55] [56]

Scaling and evaluating sparse autoencoders

Gao, Leo and Tour, Tom Dupré la and Tillman, Henk and Goh, Gabriel and Troll, Rajan and Radford, Alec and Sutskever, Ilya and Leike, Jan and Wu, Jeffrey , month = jun, year =. Scaling and evaluating sparse autoencoders , url =. doi:10.48550/arXiv.2406.04093 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2406.04093

[56] [57]

Cunningham, Hoagy and Ewart, Aidan and Riggs, Logan and Huben, Robert and Sharkey, Lee , month = oct, year =. Sparse. doi:10.48550/arXiv.2309.08600 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2309.08600

[57] [58]

Zhang, Fred and Nanda, Neel , month = jan, year =. Towards. doi:10.48550/arXiv.2309.16042 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2309.16042

[58] [59]

How to use and interpret activation patching

Heimersheim, Stefan and Nanda, Neel , month = apr, year =. How to use and interpret activation patching , url =. doi:10.48550/arXiv.2404.15255 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.15255

[59] [60]

Nanda, Neel , month = jul, year =. An

[60] [61]

Stolfo, Alessandro and Belinkov, Yonatan and Sachan, Mrinmaya , month = oct, year =. A. doi:10.48550/arXiv.2305.15054 , abstract =

work page doi:10.48550/arxiv.2305.15054

[61] [62]

Hierarchical Reasoning Model

Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi , month = aug, year =. Hierarchical. doi:10.48550/arXiv.2506.21734 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2506.21734

[62] [63]

Ren, Zirui and Liu, Ziming , month = jan, year =. Are. doi:10.48550/arXiv.2601.10679 , abstract =

work page doi:10.48550/arxiv.2601.10679

[63] [64]

Recursive Language Models

Zhang, Alex L. and Kraska, Tim and Khattab, Omar , month = jan, year =. Recursive. doi:10.48550/arXiv.2512.24601 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2512.24601

[64] [65]

and Rossi, Ryan A

Basu, Samyadeep and Morariu, Vlad I. and Rossi, Ryan A. and Zhao, Nanxuan and Wang, Zichao and Feizi, Soheil and Manjunatha, Varun , month = aug, year =. On

[65] [66]

Du, Hongzhe and Li, Weikai and Cai, Min and Saraipour, Karim and Zhang, Zimin and Lakkaraju, Himabindu and Sun, Yizhou and Zhang, Shichang , month = nov, year =. How. doi:10.48550/arXiv.2504.02904 , abstract =

work page doi:10.48550/arxiv.2504.02904

[66] [67]

Hanna, Michael and Pezzelle, Sandro and Belinkov, Yonatan , month = jul, year =. Have. doi:10.48550/arXiv.2403.17806 , abstract =

work page doi:10.48550/arxiv.2403.17806

[67] [68]

Yan, Jianzhi and Liu, Le and Pan, Youcheng and Chen, Shiwei and Xiang, Yang and Tang, Buzhou , month = sep, year =. Towards. doi:10.48550/arXiv.2509.23574 , abstract =

work page doi:10.48550/arxiv.2509.23574

[68] [69]

Distilling the

Chen, Wei-Rui and Kothapalli, Vignesh and Fatahibaarzi, Ata and Sang, Hejian and Tang, Shao and Song, Qingquan and Wang, Zhipeng and Abdul-Mageed, Muhammad , month = jan, year =. Distilling the. doi:10.48550/arXiv.2512.21002 , abstract =

work page doi:10.48550/arxiv.2512.21002

[69] [70]

, month = nov, year =

Tian, Yijun and Han, Yikun and Chen, Xiusi and Wang, Wei and Chawla, Nitesh V. , month = nov, year =. Beyond. doi:10.48550/arXiv.2402.04616 , abstract =

work page doi:10.48550/arxiv.2402.04616

[70] [71]

Dai, Chengwei and Li, Kun and Zhou, Wei and Hu, Songlin , month = may, year =. Beyond. doi:10.48550/arXiv.2405.19737 , abstract =

work page doi:10.48550/arxiv.2405.19737

[71] [72]

Hu, Yueqing and Peng, Xinyang and Peng, Shuting and Wang, Hanqi and Wang, Tianhong , month = jan, year =. Hán. doi:10.48550/arXiv.2601.05019 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.05019

[72] [73]

Li, Chenglin and Chen, Qianglong and Li, Liangyue and Wang, Caiyu and Li, Yicheng and Chen, Zulong and Zhang, Yin , month = feb, year =. Mixed. doi:10.48550/arXiv.2312.10730 , abstract =

work page doi:10.48550/arxiv.2312.10730

[73] [74]

doi:10.48550/arXiv.2310.14747 , abstract =

Chen, Hongzhan and Wu, Siyue and Quan, Xiaojun and Wang, Rui and Yan, Ming and Zhang, Ji , month = dec, year =. doi:10.48550/arXiv.2310.14747 , abstract =

work page doi:10.48550/arxiv.2310.14747

[74] [75]

Chen, Qiguang and Du, Yantao and Li, Ziniu and Liu, Jinhao and Duan, Songyao and Guo, Jiarui and Liu, Minghao and Liu, Jiaheng and Yang, Tong and Zhang, Ge and Qin, Libo and Che, Wanxiang and Huang, Wenhao , month = jan, year =. The. doi:10.48550/arXiv.2601.06002 , abstract =

work page doi:10.48550/arxiv.2601.06002

[75] [76]

and Zaharia, Matei and Gonzalez, Joseph E

Li, Dacheng and Cao, Shiyi and Griggs, Tyler and Liu, Shu and Mo, Xiangxi and Tang, Eric and Hegde, Sumanth and Hakhamaneshi, Kourosh and Patil, Shishir G. and Zaharia, Matei and Gonzalez, Joseph E. and Stoica, Ion , month = feb, year =. doi:10.48550/arXiv.2502.07374 , abstract =

work page doi:10.48550/arxiv.2502.07374

[76] [77]

Demystifying Long Chain-of-Thought Reasoning in LLMs

Yeo, Edward and Tong, Yuxuan and Niu, Morry and Neubig, Graham and Yue, Xiang , month = feb, year =. Demystifying. doi:10.48550/arXiv.2502.03373 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.03373

[77] [78]

Keypoint-based

Feng, Kaituo and Li, Changsheng and Zhang, Xiaolu and Zhou, Jun and Yuan, Ye and Wang, Guoren , month = may, year =. Keypoint-based. doi:10.48550/arXiv.2405.16064 , abstract =

work page doi:10.48550/arxiv.2405.16064

[78] [79]

Chen, Xiao and Zhou, Sihang and Liang, Ke and Sun, Xiaoyu and Liu, Xinwang , month = may, year =. Skip-. doi:10.48550/arXiv.2505.18642 , abstract =

work page doi:10.48550/arxiv.2505.18642

[79] [80]

Michaud, Eric J. and Liao, Isaac and Lad, Vedang and Liu, Ziming and Mudide, Anish and Loughridge, Chloe and Guo, Zifan Carl and Kheirkhah, Tara Rezaei and Vukelić, Mateja and Tegmark, Max , month = feb, year =. Opening the. doi:10.48550/arXiv.2402.05110 , abstract =

work page doi:10.48550/arxiv.2402.05110

[80] [81]

doi:10.48550/arXiv.2402.04678 , abstract =

Chuang, Yu-Neng and Wang, Guanchu and Chang, Chia-Yuan and Tang, Ruixiang and Zhong, Shaochen and Yang, Fan and Du, Mengnan and Cai, Xuanting and Braverman, Vladimir and Hu, Xia , month = oct, year =. doi:10.48550/arXiv.2402.04678 , abstract =

work page doi:10.48550/arxiv.2402.04678