Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Chia-Mu Yu; Ci-Yang Tsai; Raluca Ada Popa; Yu-An Lu; Yu-Lin Tsai

arxiv: 2606.00642 · v1 · pith:AMRZCPCSnew · submitted 2026-05-30 · 💻 cs.AI · cs.CR

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Yu-An Lu , Ci-Yang Tsai , Yu-Lin Tsai , Raluca Ada Popa , Chia-Mu Yu This is my paper

Pith reviewed 2026-06-28 18:46 UTC · model grok-4.3

classification 💻 cs.AI cs.CR

keywords reasoning traceslarge language modelsknowledge distillationpromptingtrace exposuremodel securityin-context learningreasoning elicitation

0 comments

The pith

Prompting can force LLMs to expose detailed reasoning traces even when designed to hide them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Reasoning Exposure Prompting (REP), a method that elicits hidden reasoning traces from language models using in-context examples from shadow models wrapped in code-like formats. Experiments across reasoning datasets, victim models, and distillation setups show that REP increases similarity between the prompted visible traces and the models' internal reasoning processes. The extracted traces still carry useful signals that support capability transfer to weaker student models. A sympathetic reader would care because deployed systems often rely on interface-level hiding to protect reasoning supervision, yet this approach appears insufficient against simple prompting. The work centers on whether trace hiding at the user interface actually prevents access to valuable internal signals.

Core claim

Reasoning Exposure Prompting (REP) is a lightweight in-context elicitation method that uses shadow-model-generated demonstrations wrapped in auxiliary code-like formats to raise user-visible reasoning traces from a victim model. Across the common reasoning dataset, different victim models, and different student model distillation, REP substantially increases similarity between exposed and REP-conditioned internal traces while preserving useful reasoning signals.

What carries the argument

Reasoning Exposure Prompting (REP), an in-context elicitation technique that wraps shadow-model demonstrations in auxiliary code-like formats to surface internal reasoning traces.

If this is right

Interface-level hiding of reasoning traces fails to block extraction of usable supervision signals.
Reasoning distillation can occur using only user-visible prompted outputs without direct internal access.
Deployed reasoning systems may leak their thinking processes through ordinary prompting interactions.
Effective trace protection requires defenses beyond simple output summarization or hiding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Models may need output filtering or adversarial training against elicitation to keep reasoning private.
REP-style methods could be evaluated against models with explicit safety alignments to test bypass potential.
The similarity measure could be validated by checking whether REP traces yield measurable downstream task improvements.
This suggests that reasoning capability and extractability may be tightly coupled in current model architectures.

Load-bearing premise

That increased similarity between elicited traces and internal traces reflects extraction of useful transferable reasoning supervision rather than superficial format matching.

What would settle it

An experiment in which student models trained on REP-elicited traces show no accuracy gains over those trained on non-REP prompted outputs or answer-only summaries across standard reasoning benchmarks.

Figures

Figures reproduced from arXiv: 2606.00642 by Chia-Mu Yu, Ci-Yang Tsai, Raluca Ada Popa, Yu-An Lu, Yu-Lin Tsai.

**Figure 1.** Figure 1: Overview of REP. REP constructs k-shot reasoning demonstrations Sk from an auxiliary dataset Ddemo , transforms them with a wrapper T(·), and prepends the resulting demonstrations to each target question q ∈ Ds . The victim model Mv is then prompted to produce user-visible exposed reasoning r2 and final answer a. functional utility indicates that exposed traces carry transferable reasoning information beyo… view at source ↗

**Figure 2.** Figure 2: End-to-end example of an exposed reasoning trace under REP (victim [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

read the original abstract

Reasoning traces have become a valuable form of learning signals for improving and transferring the capabilities of large language models. In particular, detailed traces can help distill reasoning behavior from stronger teacher models into weaker student models. The value of capability transfer has motivated many deployed systems with reasoning models to hide raw internal traces and expose at most summaries and answers to users. As a result, we ask whether such interface-level trace hiding prevents users from obtaining useful reasoning supervision through prompting. We study this question with Reasoning Exposure Prompting (REP), a lightweight in-context elicitation method that uses shadow-model-generated demonstrations wrapped in auxiliary code-like formats to raise user-visible reasoning traces from a victim model. Across the common reasoning dataset, different victim models, and different student model distillation, REP substantially increases similarity between exposed and REP-conditioned internal traces while preserving useful reasoning signals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

REP shows prompting can surface hidden traces but the abstract leaves the format-priming alternative unaddressed.

read the letter

The main point is that hiding reasoning traces behind summaries or answers does not stop a user from eliciting them with the right in-context examples. The paper presents REP as a lightweight method that wraps shadow-model demonstrations in code-like formats to raise those traces from victim models.

What is new is the specific combination of shadow demonstrations plus auxiliary code formatting aimed at trace exposure, plus the tests across multiple victim and student models on standard reasoning datasets. The work correctly flags a real deployment issue for anyone trying to limit capability transfer through hidden traces.

The soft spots are the missing pieces. The abstract claims substantial similarity gains and preserved utility but supplies no metrics, baselines, or statistical tests. There is also no sign of controls that would separate actual reasoning extraction from the model simply adopting the demonstrated output structure. The stress-test concern holds on the information given; without ablations using randomized or content-free traces the central claim stays hard to assess.

This is for people working on model extraction, distillation, and interface-level security. A reader already following prompting attacks on capability protection would get the most out of it.

Send it to peer review so the authors can supply the numbers and the format controls.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Reasoning Exposure Prompting (REP), an in-context elicitation technique that wraps shadow-model demonstrations in auxiliary code-like formats to surface reasoning traces from LLMs that otherwise expose only summaries or answers. It claims that, across common reasoning datasets, multiple victim models, and student-model distillation settings, REP substantially raises similarity between the elicited traces and the models' internal traces while preserving the utility of those traces as supervision signals.

Significance. If the empirical results are supported by quantitative metrics and appropriate controls, the finding would show that interface-level trace hiding does not block extraction of usable reasoning supervision. This would bear on the design of deployed reasoning systems, the feasibility of trace-based distillation, and the limits of output-side obfuscation.

major comments (2)

[Abstract] Abstract: the central claim that REP 'substantially increases similarity between exposed and REP-conditioned internal traces while preserving useful reasoning signals' is stated without any reported metrics, baselines, statistical tests, or quantitative results, so the support for the claim cannot be evaluated from the provided text.
The experimental design does not describe ablations that use structurally identical but content-free or randomized traces; without such controls it remains unclear whether observed similarity gains reflect extraction of internal reasoning or simply format priming induced by the code-like demonstration style.

minor comments (1)

The phrase 'shadow-model-generated demonstrations' is used without an explicit definition or pointer to the precise construction of the shadow models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that REP 'substantially increases similarity between exposed and REP-conditioned internal traces while preserving useful reasoning signals' is stated without any reported metrics, baselines, statistical tests, or quantitative results, so the support for the claim cannot be evaluated from the provided text.

Authors: We agree that the abstract would benefit from quantitative support. The body of the manuscript reports specific metrics (trace similarity scores, distillation performance deltas, baselines, and statistical tests), but these are not summarized in the abstract. We will revise the abstract to include key quantitative results, such as the reported similarity gains and distillation utility metrics, while remaining within length constraints. revision: yes
Referee: The experimental design does not describe ablations that use structurally identical but content-free or randomized traces; without such controls it remains unclear whether observed similarity gains reflect extraction of internal reasoning or simply format priming induced by the code-like demonstration style.

Authors: This is a fair point about potential format confounds. Our experiments include comparisons to standard prompting and other elicitation methods, but do not explicitly include content-free or randomized-trace controls. We will add these ablations in the revised manuscript, using structurally matched but semantically null or randomized demonstrations, to isolate the contribution of content versus format. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparisons with no self-referential definitions or fitted predictions

full rationale

The paper describes an empirical method (REP) and reports similarity increases across datasets and models. No equations, parameters fitted to subsets then re-predicted, or self-citations that bear the central load appear in the provided text. The claim rests on observable trace similarity metrics and distillation utility, which are externally measurable and not defined in terms of the result itself. This matches the default expectation of a non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The work is empirical and introduces one new method. The central assumption that reasoning traces are valuable distillation signals is taken from prior literature rather than derived here.

axioms (1)

domain assumption Detailed reasoning traces serve as effective learning signals for distilling capabilities from teacher to student models
Invoked in the first sentence of the abstract as the motivation for hiding traces.

invented entities (1)

Reasoning Exposure Prompting (REP) no independent evidence
purpose: Elicit user-visible reasoning traces from models that hide them
New prompting method defined and tested in the paper

pith-pipeline@v0.9.1-grok · 5682 in / 1081 out tokens · 23026 ms · 2026-06-28T18:46:52.805013+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

51 extracted references · 14 canonical work pages · 9 internal anchors

[1]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972
[2]

Publications Manual , year = "1983", publisher =

1983
[3]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981
[4]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of. 2007 , url=

2007
[5]

Dan Gusfield , title =. 1997

1997
[6]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

2015
[7]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =. 2005 , url=

2005
[8]

and Tukey, John W

Cooley, James W. and Tukey, John W. , journal=. An algorithm for the machine calculation of complex. 1965 , url=

1965
[9]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=
[10]

Advances in neural information processing systems , volume=

Large language models are zero-shot reasoners , author=. Advances in neural information processing systems , volume=
[11]

and Chi, Ed H

Wang, Xuezhi and Wei, Jason and Schuurmans, Dale and Le, Quoc V. and Chi, Ed H. and Narang, Sharan and Chowdhery, Aakanksha and Zhou, Denny , title =. International Conference on Learning Representations , year =
[12]

arXiv preprint arXiv:2207.00747 , year =

Rationale-Augmented Ensembles in Language Models , author =. arXiv preprint arXiv:2207.00747 , year =

work page arXiv
[13]

Advances in Neural Information Processing Systems , volume=

Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting , author=. Advances in Neural Information Processing Systems , volume=
[14]

Measuring Faithfulness in Chain-of-Thought Reasoning

Measuring faithfulness in chain-of-thought reasoning , author=. arXiv preprint arXiv:2307.13702 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

Making reasoning matter: Measuring and improving faithfulness of chain-of-thought reasoning , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

2024
[16]

How to Steal Reasoning Without Reasoning Traces

How to Steal Reasoning Without Reasoning Traces , author=. arXiv preprint arXiv:2603.07267 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Qwen3 Technical Report

Yang, An and others , title =. arXiv preprint arXiv:2505.09388 , year =. 2505.09388 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[18]

Qwen2.5 Technical Report

Qwen2.5 Technical Report , journal =. 2025 , month = jan, day =. doi:10.48550/arXiv.2412.15115 , url =. 2412.15115 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2412.15115 2025
[19]

OpenThoughts: Data Recipes for Reasoning Models

OpenThoughts: Data Recipes for Reasoning Models , author =. arXiv preprint arXiv:2506.04178 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[20]

2024 , month = sep, day =

Learning to Reason with. 2024 , month = sep, day =

2024
[21]

2026 , month = apr, day =

Claude Mythos Preview System Card , institution =. 2026 , month = apr, day =

2026
[22]

2024 , howpublished =

American Invitational Mathematics Examination (. 2024 , howpublished =

2024
[23]

2025 , howpublished =

American Invitational Mathematics Examination (. 2025 , howpublished =

2025
[24]

International Conference on Learning Representations , volume=

Livecodebench: Holistic and contamination free evaluation of large language models for code , author=. International Conference on Learning Representations , volume=
[25]

Training Verifiers to Solve Math Word Problems

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[26]

NeurIPS , year =

Measuring Mathematical Problem Solving With the MATH Dataset , author =. NeurIPS , year =
[27]

2025 , howpublished =

OpenThoughts-114k , author =. 2025 , howpublished =

2025
[28]

Gemini Thinking , year =
[29]

Building with Extended Thinking , author =
[30]

Terms of Use , year =
[31]

Can I Use My Outputs to Train an AI Model? , year =
[32]

Gemini API Additional Terms of Service , year =
[33]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

s1: Simple test-time scaling , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025
[34]

International Conference on Learning Representations , volume=

Let's verify step by step , author=. International Conference on Learning Representations , volume=
[35]

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Monitoring reasoning models for misbehavior and the risks of promoting obfuscation , author=. arXiv preprint arXiv:2503.11926 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[36]

Advances in Neural Information Processing Systems , volume=

Star: Bootstrapping reasoning with reasoning , author=. Advances in Neural Information Processing Systems , volume=
[37]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=

Teaching small language models to reason , author=. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=
[38]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Symbolic chain-of-thought distillation: Small models can also “think” step-by-step , author=. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[39]

Findings of the Association for Computational Linguistics: ACL 2023 , pages=

Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes , author=. Findings of the Association for Computational Linguistics: ACL 2023 , pages=

2023
[40]

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Orca: Progressive learning from complex explanation traces of gpt-4 , author=. arXiv preprint arXiv:2306.02707 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[41]

Nature , volume=

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning , author=. Nature , volume=. 2025 , publisher=

2025
[42]

Detecting and Preventing Distillation Attacks , year =
[43]

2026 , month = feb, day =

2026
[44]

2026 , month = feb, day =

Updated Stakes for American-Led, Democratic. 2026 , month = feb, day =

2026
[45]

2026 , month = mar, day =

Bearman, Theo , title =. 2026 , month = mar, day =

2026
[46]

Reasoning Models Don't Always Say What They Think

Reasoning models don't always say what they think , author=. arXiv preprint arXiv:2505.05410 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[47]

arXiv preprint arXiv:2601.05076 , year=

Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models , author=. arXiv preprint arXiv:2601.05076 , year=

work page arXiv
[48]

arXiv preprint arXiv:2511.07772 , year=

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought , author=. arXiv preprint arXiv:2511.07772 , year=

work page arXiv
[49]

arXiv preprint arXiv:2603.05618 , year=

Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs , author=. arXiv preprint arXiv:2603.05618 , year=

work page arXiv
[50]

2025 , month = nov, day =

Gehlot, Abhishek , title =. 2025 , month = nov, day =

2025
[51]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

Have llms advanced enough? a challenging problem solving benchmark for large language models , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

2023

[1] [1]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972

[2] [2]

Publications Manual , year = "1983", publisher =

1983

[3] [3]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981

[4] [4]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of. 2007 , url=

2007

[5] [5]

Dan Gusfield , title =. 1997

1997

[6] [6]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

2015

[7] [7]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =. 2005 , url=

2005

[8] [8]

and Tukey, John W

Cooley, James W. and Tukey, John W. , journal=. An algorithm for the machine calculation of complex. 1965 , url=

1965

[9] [9]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=

[10] [10]

Advances in neural information processing systems , volume=

Large language models are zero-shot reasoners , author=. Advances in neural information processing systems , volume=

[11] [11]

and Chi, Ed H

Wang, Xuezhi and Wei, Jason and Schuurmans, Dale and Le, Quoc V. and Chi, Ed H. and Narang, Sharan and Chowdhery, Aakanksha and Zhou, Denny , title =. International Conference on Learning Representations , year =

[12] [12]

arXiv preprint arXiv:2207.00747 , year =

Rationale-Augmented Ensembles in Language Models , author =. arXiv preprint arXiv:2207.00747 , year =

work page arXiv

[13] [13]

Advances in Neural Information Processing Systems , volume=

Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting , author=. Advances in Neural Information Processing Systems , volume=

[14] [14]

Measuring Faithfulness in Chain-of-Thought Reasoning

Measuring faithfulness in chain-of-thought reasoning , author=. arXiv preprint arXiv:2307.13702 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[15] [15]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

Making reasoning matter: Measuring and improving faithfulness of chain-of-thought reasoning , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

2024

[16] [16]

How to Steal Reasoning Without Reasoning Traces

How to Steal Reasoning Without Reasoning Traces , author=. arXiv preprint arXiv:2603.07267 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

Qwen3 Technical Report

Yang, An and others , title =. arXiv preprint arXiv:2505.09388 , year =. 2505.09388 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

Qwen2.5 Technical Report

Qwen2.5 Technical Report , journal =. 2025 , month = jan, day =. doi:10.48550/arXiv.2412.15115 , url =. 2412.15115 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2412.15115 2025

[19] [19]

OpenThoughts: Data Recipes for Reasoning Models

OpenThoughts: Data Recipes for Reasoning Models , author =. arXiv preprint arXiv:2506.04178 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[20] [20]

2024 , month = sep, day =

Learning to Reason with. 2024 , month = sep, day =

2024

[21] [21]

2026 , month = apr, day =

Claude Mythos Preview System Card , institution =. 2026 , month = apr, day =

2026

[22] [22]

2024 , howpublished =

American Invitational Mathematics Examination (. 2024 , howpublished =

2024

[23] [23]

2025 , howpublished =

American Invitational Mathematics Examination (. 2025 , howpublished =

2025

[24] [24]

International Conference on Learning Representations , volume=

Livecodebench: Holistic and contamination free evaluation of large language models for code , author=. International Conference on Learning Representations , volume=

[25] [25]

Training Verifiers to Solve Math Word Problems

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[26] [26]

NeurIPS , year =

Measuring Mathematical Problem Solving With the MATH Dataset , author =. NeurIPS , year =

[27] [27]

2025 , howpublished =

OpenThoughts-114k , author =. 2025 , howpublished =

2025

[28] [28]

Gemini Thinking , year =

[29] [29]

Building with Extended Thinking , author =

[30] [30]

Terms of Use , year =

[31] [31]

Can I Use My Outputs to Train an AI Model? , year =

[32] [32]

Gemini API Additional Terms of Service , year =

[33] [33]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

s1: Simple test-time scaling , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025

[34] [34]

International Conference on Learning Representations , volume=

Let's verify step by step , author=. International Conference on Learning Representations , volume=

[35] [35]

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Monitoring reasoning models for misbehavior and the risks of promoting obfuscation , author=. arXiv preprint arXiv:2503.11926 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[36] [36]

Advances in Neural Information Processing Systems , volume=

Star: Bootstrapping reasoning with reasoning , author=. Advances in Neural Information Processing Systems , volume=

[37] [37]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=

Teaching small language models to reason , author=. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=

[38] [38]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Symbolic chain-of-thought distillation: Small models can also “think” step-by-step , author=. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[39] [39]

Findings of the Association for Computational Linguistics: ACL 2023 , pages=

Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes , author=. Findings of the Association for Computational Linguistics: ACL 2023 , pages=

2023

[40] [40]

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Orca: Progressive learning from complex explanation traces of gpt-4 , author=. arXiv preprint arXiv:2306.02707 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[41] [41]

Nature , volume=

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning , author=. Nature , volume=. 2025 , publisher=

2025

[42] [42]

Detecting and Preventing Distillation Attacks , year =

[43] [43]

2026 , month = feb, day =

2026

[44] [44]

2026 , month = feb, day =

Updated Stakes for American-Led, Democratic. 2026 , month = feb, day =

2026

[45] [45]

2026 , month = mar, day =

Bearman, Theo , title =. 2026 , month = mar, day =

2026

[46] [46]

Reasoning Models Don't Always Say What They Think

Reasoning models don't always say what they think , author=. arXiv preprint arXiv:2505.05410 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[47] [47]

arXiv preprint arXiv:2601.05076 , year=

Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models , author=. arXiv preprint arXiv:2601.05076 , year=

work page arXiv

[48] [48]

arXiv preprint arXiv:2511.07772 , year=

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought , author=. arXiv preprint arXiv:2511.07772 , year=

work page arXiv

[49] [49]

arXiv preprint arXiv:2603.05618 , year=

Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs , author=. arXiv preprint arXiv:2603.05618 , year=

work page arXiv

[50] [50]

2025 , month = nov, day =

Gehlot, Abhishek , title =. 2025 , month = nov, day =

2025

[51] [51]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

Have llms advanced enough? a challenging problem solving benchmark for large language models , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

2023