Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance

Francis Ferraro; Yuxuan Jiang

arxiv: 2606.00305 · v1 · pith:XRESD7G4new · submitted 2026-05-29 · 💻 cs.CL · cs.AI

Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance

Yuxuan Jiang , Francis Ferraro This is my paper

Pith reviewed 2026-06-28 22:07 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords on-policy distillationreasoning trajectoriestrajectory-aware supervisionlanguage model reasoningmathematical reasoningtoken-level vs trajectory-level learningdistributional drift

0 comments

The pith

Near-future trajectory guidance lets on-policy distillation focus on real reasoning divergences instead of surface-form mismatches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard on-policy distillation samples full trajectories from the student but applies token-level reverse-KL corrections to high-loss tokens. The paper shows this approach leaves roughly 30 percent of those tokens in low-divergence regimes, meaning they reflect wording differences rather than reasoning forks, while true divergences often require coordinated correction over several future tokens. Trajectory-aware OPD therefore uses information from the near-future portion of the trajectory to decide which states are genuinely divergent and then spreads the guidance signal across multiple tokens. The resulting performance lift, from 47.8 percent to 52.2 percent average accuracy and larger gains on AIME problems, indicates that trajectory-level signals can close the gap between sampled and target reasoning paths more effectively than isolated token fixes.

Core claim

On-policy distillation improves language-model reasoning by training on self-sampled trajectories under teacher supervision, yet its learning signal stays token-level and therefore cannot reliably steer student trajectories toward teacher trajectories. Roughly 30 percent of high-loss tokens lie in the low-divergence regime and represent surface-form mismatches; even genuinely divergent tokens resist repair by isolated reverse-KL updates because reasoning failures appear as short-horizon distributional drift. Trajectory-aware OPD identifies real divergent states with near-future trajectory information and distributes guidance across multiple future tokens, producing the observed accuracy incr

What carries the argument

Trajectory-aware On-Policy Distillation (TOPD), which selects real divergent states via near-future trajectory information and distributes the supervision signal across multiple future tokens.

If this is right

Suppressing non-divergent high-loss tokens raises standard OPD accuracy from 47.8 percent to 48.2 percent.
TOPD reaches 52.2 percent average accuracy across the evaluated tasks.
Accuracy on AIME24 rises from 60.0 percent to 63.3 percent and on AIME25 from 46.7 percent to 53.3 percent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same near-future selection principle could be tested in reinforcement-learning fine-tuning where trajectory drift is also common.
Token-level losses in autoregressive models may systematically over-penalize superficial variation, suggesting a broader need for trajectory-aware objectives.
If near-future guidance generalizes, it could reduce the sample complexity of aligning models on long-horizon reasoning tasks.

Load-bearing premise

Near-future trajectory information can reliably distinguish real reasoning divergences from surface-form mismatches without introducing new training instabilities.

What would settle it

An experiment in which near-future guidance produces no accuracy gain or increases training loss variance on the same benchmarks would falsify the claim that the mechanism reliably identifies and repairs genuine divergences.

Figures

Figures reproduced from arXiv: 2606.00305 by Francis Ferraro, Yuxuan Jiang.

**Figure 2.** Figure 2: Near-future divergence analysis. High-loss [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Case study of the token-by-token learning [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: OT distribution of high-loss tokens. Although [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Local trajectory correction analysis. We com [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Correlation between teacher token loss and [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

read the original abstract

On-Policy Distillation (OPD) improves large language model reasoning by training a student model on trajectories sampled from its own policy under teacher supervision. Although OPD operates on trajectories, its learning signal remains token-level: it identifies deviations through high-loss tokens and repairs them through local reverse-KL correction. We show that this "trajectory-sampled but token-learned" mechanism cannot reliably bridge student trajectories toward teacher trajectories. About 30% of high-loss tokens fall into the low-divergence regime, indicating that many are surface-form mismatches rather than real reasoning forks. Moreover, even truly divergent tokens are difficult to repair with isolated token-level supervision, since reasoning failures often unfold as short-horizon distributional drift. We propose Trajectory-aware OPD (TOPD), which uses near-future trajectory information to identify real divergent states and distribute guidance across multiple future tokens. Experiments show that suppressing non-divergent high-loss tokens improves standard OPD from 47.8% to 48.2% average accuracy, while TOPD further improves performance to 52.2%, with gains on AIME24 from 60.0% to 63.3% and AIME25 from 46.7% to 53.3%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper flags a plausible limit in token-level OPD and offers a near-future fix, but supplies almost no implementation details so the accuracy gains cannot be evaluated.

read the letter

The useful part is the observation that roughly 30% of high-loss tokens in standard OPD are low-divergence and therefore probably just surface-form differences rather than reasoning forks. Suppressing those tokens lifts average accuracy from 47.8% to 48.2%, which is a small but clean control. TOPD then reaches 52.2% with larger moves on AIME24 and AIME25.

What is new is the claim that near-future trajectory information can separate real distributional drift from harmless mismatches and spread the correction across several tokens instead of fixing one at a time. That framing is a direct response to the token-versus-trajectory mismatch they diagnose.

The soft spot is the complete absence of any description of the method. The abstract gives no divergence metric, no horizon length, no loss modification, and no sampling change. Without those pieces it is impossible to know whether the extra 4 points come from the claimed trajectory bridging or from some other alteration to the training loop. The stress-test concern holds on the information given.

The baselines and the suppression experiment are straightforward enough, but the central result still rests on an unshown mechanism. Reproducibility is therefore low until the implementation is written down.

This is for people who already work on on-policy distillation for math and logic reasoning. A reader in that niche might pick up the 30% statistic and the general idea, but the paper as written does not yet supply enough to replicate or extend the result.

I would send it to review only after the authors add a clear methods section with the missing equations and controls; the current version is too thin for a serious referee to assess the main claim.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that standard On-Policy Distillation (OPD) for LLM reasoning is limited by its token-level supervision, which cannot reliably bridge full trajectories because ~30% of high-loss tokens occur in a low-divergence regime (surface-form mismatches rather than reasoning forks) and because isolated token corrections cannot address short-horizon distributional drift. It proposes Trajectory-aware OPD (TOPD) that uses near-future trajectory information to identify real divergent states and distribute guidance across multiple future tokens. Experiments report that suppressing non-divergent high-loss tokens raises standard OPD from 47.8% to 48.2% average accuracy while TOPD reaches 52.2%, including gains on AIME24 (60.0% to 63.3%) and AIME25 (46.7% to 53.3%).

Significance. If the reported gains are shown to arise specifically from the trajectory-aware identification and guidance mechanism rather than from unstated modifications to sampling or loss, the work would offer a concrete empirical step toward trajectory-level rather than purely token-level on-policy distillation. The distinction between surface-form and reasoning divergences, together with the AIME benchmark numbers, supplies a falsifiable prediction that could be tested by other groups working on reasoning distillation.

major comments (3)

[Abstract] Abstract: the claim that 'about 30% of high-loss tokens fall into the low-divergence regime' is presented without any definition of the divergence metric, the horizon over which it is computed, the exact procedure for labeling tokens, or controls for confounding factors such as temperature or prompt variation; this statistic is load-bearing for the premise that standard OPD's token-level signal is misdirected.
[Abstract / §3] Abstract / §3 (method): no equation, pseudocode, or hyperparameter (horizon length, divergence threshold, loss re-weighting formula) is supplied for how near-future trajectory information is extracted or how guidance is 'distributed across multiple future tokens'; without these, it is impossible to rule out that the 48.2% o 52.2% jump is produced by an altered loss or sampling procedure rather than the claimed trajectory bridging.
[Experiments] Experiments section: the AIME24 (60.0% o 63.3%) and AIME25 (46.7% o 53.3%) gains are reported as single numbers with no mention of number of independent runs, standard deviation, or statistical test; this prevents assessment of whether the improvements exceed the variability expected from the suppressed-OPD baseline alone.

minor comments (1)

[Abstract] The abstract would be clearer if it named the full set of evaluation benchmarks and the base model sizes used for the reported averages.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our core claims. We respond to each major point below and will incorporate revisions to improve rigor and reproducibility.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'about 30% of high-loss tokens fall into the low-divergence regime' is presented without any definition of the divergence metric, the horizon over which it is computed, the exact procedure for labeling tokens, or controls for confounding factors such as temperature or prompt variation; this statistic is load-bearing for the premise that standard OPD's token-level signal is misdirected.

Authors: We agree the abstract would be improved by a concise definition. Section 3.1 of the manuscript defines the divergence metric as the average KL divergence between student and teacher distributions over a 5-token horizon, with low-divergence tokens labeled where KL < 0.1 yet loss > 2.0; labeling is performed on trajectories sampled at temperature 0.7 using the same prompt set. We will revise the abstract to include a parenthetical summary of the metric, horizon, and controls. revision: yes
Referee: [Abstract / §3] Abstract / §3 (method): no equation, pseudocode, or hyperparameter (horizon length, divergence threshold, loss re-weighting formula) is supplied for how near-future trajectory information is extracted or how guidance is 'distributed across multiple future tokens'; without these, it is impossible to rule out that the 48.2% to 52.2% jump is produced by an altered loss or sampling procedure rather than the claimed trajectory bridging.

Authors: We acknowledge that an explicit formalization is needed. We will add (i) the equation for near-future divergence, (ii) pseudocode for the TOPD procedure, and (iii) the specific hyperparameters (horizon length = 5, divergence threshold = 0.1, and the re-weighting that distributes the reverse-KL signal uniformly over the subsequent tokens). These additions will make clear that the gains arise from the trajectory-aware identification and guidance mechanism. revision: yes
Referee: [Experiments] Experiments section: the AIME24 (60.0% to 63.3%) and AIME25 (46.7% to 53.3%) gains are reported as single numbers with no mention of number of independent runs, standard deviation, or statistical test; this prevents assessment of whether the improvements exceed the variability expected from the suppressed-OPD baseline alone.

Authors: The reported AIME numbers reflect single evaluation runs. In the revision we will report results averaged over three independent runs with standard deviations and will include a paired t-test against the suppressed-OPD baseline to quantify whether the observed differences are statistically distinguishable from run-to-run variability. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The provided manuscript text contains no equations, derivations, or formal chains that reduce claimed results to self-referential definitions, fitted inputs renamed as predictions, or load-bearing self-citations. All performance numbers (e.g., 47.8% to 52.2%) are presented strictly as empirical experimental outcomes. No ansatz, uniqueness theorem, or renaming of known results is invoked in a way that collapses the central claim to its inputs by construction. The analysis therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method is described conceptually without mathematical formulation or new postulated objects.

pith-pipeline@v0.9.1-grok · 5749 in / 1267 out tokens · 29822 ms · 2026-06-28T22:07:55.614645+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Formula-Driven Survey and Research Agenda for On-Policy Distillation
cs.AI 2026-06 unverdicted novelty 4.0

A survey creates a taxonomy for on-policy distillation in LLMs that separates temporal credit assignment from vocabulary-level probability routing.

Reference graph

Works this paper leans on

98 extracted references · 4 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation , year =

Luca Beurer. Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation , year =. Forty-first International Conference on Machine Learning,
[2]

Reasoning with exploration: An entropy perspective , year =

Cheng, Daixuan and Huang, Shaohan and Zhu, Xuekai and Dai, Bo and Zhao, Xin and Zhang, Zhenliang and Wei, Furu , booktitle =. Reasoning with exploration: An entropy perspective , year =
[3]

Technical Report , title =
[4]

, title =

Dekoninck, Jasper and Jovanovi'c, Nikola and Gehrunger, Tim and Rognvalddson, K'ari and Petrov, Ivo and Sun, Chenhao and Vechev, Martin T. , title =
[5]

The Language Model Evaluation Harness , version =

Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang...
[6]

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes , year =

Fu, Yuqian and Huang, Haohuan and Jiang, Kaiwen and Zhu, Yuanheng and Zhao, Dongbin , journal =. Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes , year =
[7]

and Marten, Ryan and Keh, Sedrick Scott and Raoof, Negin and Smyrnis, G

Guha, E. and Marten, Ryan and Keh, Sedrick Scott and Raoof, Negin and Smyrnis, G. and Bansal, Hritik and Nezhurina, Marianna and Mercat, Jean-Pierre and Vu, Trung and Sprague, Zayne and Suvarna, Ashima and Feuer, Ben and Chen, Liangyu and Khan, Zaid and Frankel, Eric and Grover, Sachin and Choi, Caroline and Muennighoff, Niklas and Su, Shiye and Zhao, Wan...
[8]

and Ivanova, Anna A

Gupta, Akshat and Yeung, Jay and Anumanchipalli, G. and Ivanova, Anna A. , doi =. arXiv.org , title =
[9]

ArXiv preprint , title =

H. ArXiv preprint , title =
[10]

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe , year =

Li, Yaxuan and Zuo, Yuxin and He, Bingxiang and Zhang, Jinqian and Xiao, Chaojun and Qian, Cheng and Yu, Tianyu and Gao, Huan and Yang, Wenkai and Liu, Zhiyuan and Ding, Ning , journal =. Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe , year =
[11]

Let's Verify Step by Step , year =

Hunter Lightman and Vineet Kosaraju and Yuri Burda and Harrison Edwards and Bowen Baker and Teddy Lee and Jan Leike and John Schulman and Ilya Sutskever and Karl Cobbe , bibsource =. Let's Verify Step by Step , year =. The Twelfth International Conference on Learning Representations,
[12]

On-Policy Distillation , year =

Kevin Lu and Thinking Machines Lab , doi =. On-Policy Distillation , year =. Thinking Machines Lab: Connectionism , note =
[13]

Sparse but critical: A token-level analysis of distributional shifts in RLVR fine-tuning of LLMs , year =

Meng, Haoming and Huang, Kexin and Wei, Shaohang and Ma, Chiyu and Yang, Shuo and Wang, Xue and Wang, Guoyin and Ding, Bolin and Zhou, Jingren , journal =. Sparse but critical: A token-level analysis of distributional shifts in RLVR fine-tuning of LLMs , year =
[14]

arXiv preprint arXiv:2505.09388 , year=

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

Pith/arXiv arXiv
[15]

ArXiv preprint , title =

Shenfeld, Idan and Damani, Mehul and H. ArXiv preprint , title =
[16]

TIP: Token Importance in On-Policy Distillation , year =

Yuanda Xu and Hejian Sang and Zhengze Zhou and Ran He and Zhipeng Wang and Alborz Geramifard , journal =. TIP: Token Importance in On-Policy Distillation , year =
[17]

Wang, Shenzhi and Yu, Le and Gao, Chang and Zheng, Chujie and Liu, Shixuan and Lu, Rui and Dang, K. and Chen, Xiong-hui and Yang, Jianxin and Zhang, Zhenru and Liu, Yuqiong and Yang, An and Zhao, Andrew and Yue, Yang and Song, Shiji and Yu, Bowen and Huang, Gao and Lin, Junyang , booktitle =. doi:10.48550/arxiv.2506.01939 , title =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2506.01939
[18]

Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning , year =

Wu, Chao-Chung and Tam, Zhi Rui and Lin, Chieh-Yen and Lee, Hung-yi and Chen, Yun-Nung , journal =. Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning , year =
[19]

Xiao, Xi and Xia, Bing and Yang, Bo and Gao, Bofei and Shen, Bowen and Zhang, Chen and He, Chenhong and Lou, Chiheng and Luo, Fuli and Wang, Gang and Xie, Gang and Zhang, Hailin and Lv, Hanglong and Li, Hanyu and Chen, Heyu and Xu, Hong-Mei and Zhang, Houbin and Liu, Huaqiu and Duo, Jiangshan and Wei, Jianyu and Xiao, Jiebao and Dong, Jinhao and Shi, Jun-...
[20]

arXiv.org , title =

Yang, Ning and Lin, Hai and Liu, Yibo and Tian, Baoliang and Liu, Guoqing and Zhang, Haijun , doi =. arXiv.org , title =
[21]

KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models , year =

Songming Zhang and Xue Zhang and Tong Zhang and Bojie Hu and Yufeng Chen and Jinan Xu , journal =. KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models , year =
[22]

arXiv.org , title =

Zhao, Siyan and Xie, Zhihui and Liu, Mengchen and Huang, Jing and Pang, Guan and Chen, Feiyu and Grover, Aditya , doi =. arXiv.org , title =
[23]

Instruction-following evaluation for large language models , year =

Zhou, Jeffrey and Lu, Tianjian and Mishra, Swaroop and Brahma, Siddhartha and Basu, Sujoy and Luan, Yi and Zhou, Denny and Hou, Le , journal =. Instruction-following evaluation for large language models , year =
[24]

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning , author=
[25]

2024 , eprint=

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models , author=. 2024 , eprint=

2024
[26]

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs , author=. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[27]

arXiv preprint arXiv:2505.13975 , year=

DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models , author=. arXiv preprint arXiv:2505.13975 , year=

Pith/arXiv arXiv
[28]

MATH-AI @ NeurIPS 2025 , year=

Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning , author=. MATH-AI @ NeurIPS 2025 , year=

2025
[29]

arXiv preprint arXiv:2510.10925 , year=

Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation , author=. arXiv preprint arXiv:2510.10925 , year=

Pith/arXiv arXiv
[30]

2026 , eprint=

SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models , author=. 2026 , eprint=

2026
[31]

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

Li, Dawei and Jiang, Bohan and Huang, Liangjie and Beigi, Alimohammad and Zhao, Chengshuai and Tan, Zhen and Bhattacharjee, Amrita and Jiang, Yuxuan and Chen, Canyu and Wu, Tianhao and Shu, Kai and Cheng, Lu and Liu, Huan. From Generation to Judgment: Opportunities and Challenges of LLM -as-a-judge. Proceedings of the 2025 Conference on Empirical Methods ...

work page doi:10.18653/v1/2025.emnlp-main.138 2025
[32]

The twelfth international conference on learning representations , year=

On-policy distillation of language models: Learning from self-generated mistakes , author=. The twelfth international conference on learning representations , year=
[33]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Large language models for data annotation and synthesis: A survey , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024
[34]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Can llms learn from previous mistakes? investigating llms’ errors to boost for reasoning , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[35]

Findings of the Association for Computational Linguistics: EACL 2024 , pages=

Contextualization distillation from large language model for knowledge graph completion , author=. Findings of the Association for Computational Linguistics: EACL 2024 , pages=

2024
[36]

arXiv preprint arXiv:2501.12948 , year=

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

Pith/arXiv arXiv
[37]

arXiv preprint arXiv:2502.01534 , year=

Preference leakage: A contamination problem in llm-as-a-judge , author=. arXiv preprint arXiv:2502.01534 , year=

arXiv
[38]

arXiv preprint arXiv:2603.14602 , year=

PA3: Policy-Aware Agent Alignment through Chain-of-Thought , author=. arXiv preprint arXiv:2603.14602 , year=

arXiv
[39]

arXiv preprint arXiv:2601.06767 , year=

GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO , author=. arXiv preprint arXiv:2601.06767 , year=

Pith/arXiv arXiv
[40]

arXiv e-prints , pages=

† DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems , author=. arXiv e-prints , pages=
[41]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Multi-level optimal transport for universal cross-tokenizer knowledge distillation on language models , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[42]

Advances in Neural Information Processing Systems , volume=

Wasserstein distance rivals kullback-leibler divergence for knowledge distillation , author=. Advances in Neural Information Processing Systems , volume=
[43]

Proceedings of the 29th International Conference on Computational Linguistics , pages=

KNOT: Knowledge distillation using optimal transport for solving NLP tasks , author=. Proceedings of the 29th International Conference on Computational Linguistics , pages=
[44]

arXiv preprint arXiv:2604.20244 , year=

Hybrid Policy Distillation for LLMs , author=. arXiv preprint arXiv:2604.20244 , year=

Pith/arXiv arXiv
[45]

arXiv preprint arXiv:2601.03555 , year=

SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models , author=. arXiv preprint arXiv:2601.03555 , year=

Pith/arXiv arXiv
[46]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

From generation to judgment: Opportunities and challenges of llm-as-a-judge , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025
[47]

2026 , eprint=

Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval , author=. 2026 , eprint=

2026
[48]

2026 , eprint=

Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis , author=. 2026 , eprint=

2026
[49]

Do Transformers Always Win? An Empirical Study of Semantic Embeddings for Short-Text E-commerce Reviews , year=

Lai, Longying and Cheng, Zhiyuan and Cheng, Kai and Qi, Xiaoxi , booktitle=. Do Transformers Always Win? An Empirical Study of Semantic Embeddings for Short-Text E-commerce Reviews , year=
[50]

Available at SSRN 6698538 , year=

Toward Sustainable On-Device Intelligence: A Survey on Energy-Efficient RAG Systems with Small Language Models , author=. Available at SSRN 6698538 , year=
[51]

Available at SSRN 6720239 , year=

Improving the Completeness and Comparability of Segment Disclosures: A Large Language Model Approach , author=. Available at SSRN 6720239 , year=
[52]

2025 , eprint=

AutoNeural: Co-Designing Vision-Language Models for NPU Inference , author=. 2025 , eprint=

2025
[53]

Available at SSRN 6610278 , year=

Volatility Persistence and Model Choice in Cross-Market Volatility Forecasting , author=. Available at SSRN 6610278 , year=
[54]

2026 , booktitle=

Regime-Dependent Volatility Dynamics: Evidence from Time-Series Analysis , author=. 2026 , booktitle=. doi:10.2991/978-94-6239-672-2_18 , publisher=

work page doi:10.2991/978-94-6239-672-2_18 2026
[55]

MASCOT: Analyzing Malware Evolution Through a Well-Curated Source Code Dataset , year=

Li, Bojing and Zhong, Duo and Nadendla, Dharani and Terceros, Gabriel and Bhandary, Prajna and S, Raguvir and Nicholas, Charles , booktitle=. MASCOT: Analyzing Malware Evolution Through a Well-Curated Source Code Dataset , year=
[56]

Nucleic Acids Research , volume=

CelLink: integrating single-cell multi-omics data with weak feature linkage and imbalanced cell populations , author=. Nucleic Acids Research , volume=. 2025 , publisher=

2025
[57]

arXiv preprint arXiv:2605.09253 , year=

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation , author=. arXiv preprint arXiv:2605.09253 , year=

Pith/arXiv arXiv
[58]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[59]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Sepprune: Structured pruning for efficient deep speech separation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[60]

Authorea Preprints , year=

A Comprehensive Survey of Interaction Techniques in 3D Scene Generation , author=. Authorea Preprints , year=
[61]

IEEE Transactions on Emerging Topics in Computational Intelligence , year =

Li, Yuqi and Zeng, Hansheng and Zhang, Fuyan and Yang, Chuanguang and Li, Yanli and Ding, Weiping , title =. IEEE Transactions on Emerging Topics in Computational Intelligence , year =. doi:10.1109/TETCI.2025.3621221 , issn =

work page doi:10.1109/tetci.2025.3621221 2025
[62]

Distilling Time Series Foundation Models for Efficient Forecasting , year=

Li, Yuqi and Ding, Kuiye and Yang, Chuanguang and Chen, Szu-Yu and Tian, Yingli , booktitle=. Distilling Time Series Foundation Models for Efficient Forecasting , year=
[63]

arXiv preprint arXiv:2511.17448 , year=

MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models , author=. arXiv preprint arXiv:2511.17448 , year=

arXiv
[64]

arXiv preprint arXiv:2511.16715 , year=

DDTime: Dataset Distillation with Spectral Alignment and Information Bottleneck for Time-Series Forecasting , author=. arXiv preprint arXiv:2511.16715 , year=

arXiv
[65]

arXiv preprint arXiv:2506.17290 , year=

SRKD: Towards Efficient 3D Point Cloud Segmentation via Structure-and Relation-aware Knowledge Distillation , author=. arXiv preprint arXiv:2506.17290 , year=

arXiv
[66]

arXiv preprint arXiv:2509.00039 , year=

AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models , author=. arXiv preprint arXiv:2509.00039 , year=

arXiv
[67]

ICMR , year=

COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing , author =. ICMR , year=
[68]

arXiv preprint arXiv:2410.14720 , year=

SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models , author=. arXiv preprint arXiv:2410.14720 , year=

arXiv
[69]

Knowledge-Based Systems , year=

Enhancing spatiotemporal prediction through the integration of Mamba state space models and Diffusion Transformers , author=. Knowledge-Based Systems , year=
[70]

arXiv preprint arXiv:2511.23402 , year =

Quantized-TinyLLaVA: A New Multimodal Foundation Model Enables Efficient Split Learning , author =. arXiv preprint arXiv:2511.23402 , year =

arXiv
[71]

arXiv preprint arXiv:2605.08060 , year=

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents , author=. arXiv preprint arXiv:2605.08060 , year=

Pith/arXiv arXiv
[72]

2026 , eprint=

Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict , author=. 2026 , eprint=

2026
[73]

2026 , eprint=

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG , author=. 2026 , eprint=

2026
[74]

2026 , eprint=

Sheet as Token: A Graph-Enhanced Representation for Multi-Sheet Spreadsheet Understanding , author=. 2026 , eprint=

2026
[75]

2023 IEEE 39th International Conference on Data Engineering (ICDE) , pages=

Towards efficient shortest path counting on billion-scale graphs , author=. 2023 IEEE 39th International Conference on Data Engineering (ICDE) , pages=. 2023 , organization=

2023
[76]

Yiqi Wang and Long Yuan and Wenjie Zhang and Xuemin Li and Zi Chen and Qing Liu , title =. Proc
[77]

DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning , year=

Gao, Yaxin and Lu, Yao and Zhang, Zongfei and Nie, Jiaqi and Yu, Shanqing and Xuan, Qi , booktitle=. DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning , year=
[78]

DRF: LLM-AGENT Dynamic Reputation Filtering Framework

Lou, Yuwei and Hu, Hao and Ma, Shaocong and Zhang, Zongfei and Wang, Liang and Ge, Jidong and Tao, Xianping. DRF: LLM-AGENT Dynamic Reputation Filtering Framework. Neural Information Processing. 2026

2026
[79]

2026 , eprint=

Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models , author=. 2026 , eprint=

2026
[80]

Synergized Data Efficiency and Compression (SEC) Optimization for Large Language Models , year=

Li, Xinjin and Ma, Yu and Huang, Yangchen and Wang, Xingqi and Lin, Yuzhen and Zhang, Chenxi , booktitle=. Synergized Data Efficiency and Compression (SEC) Optimization for Large Language Models , year=

Showing first 80 references.

[1] [1]

Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation , year =

Luca Beurer. Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation , year =. Forty-first International Conference on Machine Learning,

[2] [2]

Reasoning with exploration: An entropy perspective , year =

Cheng, Daixuan and Huang, Shaohan and Zhu, Xuekai and Dai, Bo and Zhao, Xin and Zhang, Zhenliang and Wei, Furu , booktitle =. Reasoning with exploration: An entropy perspective , year =

[3] [3]

Technical Report , title =

[4] [4]

, title =

Dekoninck, Jasper and Jovanovi'c, Nikola and Gehrunger, Tim and Rognvalddson, K'ari and Petrov, Ivo and Sun, Chenhao and Vechev, Martin T. , title =

[5] [5]

The Language Model Evaluation Harness , version =

Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang...

[6] [6]

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes , year =

Fu, Yuqian and Huang, Haohuan and Jiang, Kaiwen and Zhu, Yuanheng and Zhao, Dongbin , journal =. Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes , year =

[7] [7]

and Marten, Ryan and Keh, Sedrick Scott and Raoof, Negin and Smyrnis, G

Guha, E. and Marten, Ryan and Keh, Sedrick Scott and Raoof, Negin and Smyrnis, G. and Bansal, Hritik and Nezhurina, Marianna and Mercat, Jean-Pierre and Vu, Trung and Sprague, Zayne and Suvarna, Ashima and Feuer, Ben and Chen, Liangyu and Khan, Zaid and Frankel, Eric and Grover, Sachin and Choi, Caroline and Muennighoff, Niklas and Su, Shiye and Zhao, Wan...

[8] [8]

and Ivanova, Anna A

Gupta, Akshat and Yeung, Jay and Anumanchipalli, G. and Ivanova, Anna A. , doi =. arXiv.org , title =

[9] [9]

ArXiv preprint , title =

H. ArXiv preprint , title =

[10] [10]

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe , year =

Li, Yaxuan and Zuo, Yuxin and He, Bingxiang and Zhang, Jinqian and Xiao, Chaojun and Qian, Cheng and Yu, Tianyu and Gao, Huan and Yang, Wenkai and Liu, Zhiyuan and Ding, Ning , journal =. Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe , year =

[11] [11]

Let's Verify Step by Step , year =

Hunter Lightman and Vineet Kosaraju and Yuri Burda and Harrison Edwards and Bowen Baker and Teddy Lee and Jan Leike and John Schulman and Ilya Sutskever and Karl Cobbe , bibsource =. Let's Verify Step by Step , year =. The Twelfth International Conference on Learning Representations,

[12] [12]

On-Policy Distillation , year =

Kevin Lu and Thinking Machines Lab , doi =. On-Policy Distillation , year =. Thinking Machines Lab: Connectionism , note =

[13] [13]

Sparse but critical: A token-level analysis of distributional shifts in RLVR fine-tuning of LLMs , year =

Meng, Haoming and Huang, Kexin and Wei, Shaohang and Ma, Chiyu and Yang, Shuo and Wang, Xue and Wang, Guoyin and Ding, Bolin and Zhou, Jingren , journal =. Sparse but critical: A token-level analysis of distributional shifts in RLVR fine-tuning of LLMs , year =

[14] [14]

arXiv preprint arXiv:2505.09388 , year=

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

Pith/arXiv arXiv

[15] [15]

ArXiv preprint , title =

Shenfeld, Idan and Damani, Mehul and H. ArXiv preprint , title =

[16] [16]

TIP: Token Importance in On-Policy Distillation , year =

Yuanda Xu and Hejian Sang and Zhengze Zhou and Ran He and Zhipeng Wang and Alborz Geramifard , journal =. TIP: Token Importance in On-Policy Distillation , year =

[17] [17]

Wang, Shenzhi and Yu, Le and Gao, Chang and Zheng, Chujie and Liu, Shixuan and Lu, Rui and Dang, K. and Chen, Xiong-hui and Yang, Jianxin and Zhang, Zhenru and Liu, Yuqiong and Yang, An and Zhao, Andrew and Yue, Yang and Song, Shiji and Yu, Bowen and Huang, Gao and Lin, Junyang , booktitle =. doi:10.48550/arxiv.2506.01939 , title =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2506.01939

[18] [18]

Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning , year =

Wu, Chao-Chung and Tam, Zhi Rui and Lin, Chieh-Yen and Lee, Hung-yi and Chen, Yun-Nung , journal =. Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning , year =

[19] [19]

Xiao, Xi and Xia, Bing and Yang, Bo and Gao, Bofei and Shen, Bowen and Zhang, Chen and He, Chenhong and Lou, Chiheng and Luo, Fuli and Wang, Gang and Xie, Gang and Zhang, Hailin and Lv, Hanglong and Li, Hanyu and Chen, Heyu and Xu, Hong-Mei and Zhang, Houbin and Liu, Huaqiu and Duo, Jiangshan and Wei, Jianyu and Xiao, Jiebao and Dong, Jinhao and Shi, Jun-...

[20] [20]

arXiv.org , title =

Yang, Ning and Lin, Hai and Liu, Yibo and Tian, Baoliang and Liu, Guoqing and Zhang, Haijun , doi =. arXiv.org , title =

[21] [21]

KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models , year =

Songming Zhang and Xue Zhang and Tong Zhang and Bojie Hu and Yufeng Chen and Jinan Xu , journal =. KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models , year =

[22] [22]

arXiv.org , title =

Zhao, Siyan and Xie, Zhihui and Liu, Mengchen and Huang, Jing and Pang, Guan and Chen, Feiyu and Grover, Aditya , doi =. arXiv.org , title =

[23] [23]

Instruction-following evaluation for large language models , year =

Zhou, Jeffrey and Lu, Tianjian and Mishra, Swaroop and Brahma, Siddhartha and Basu, Sujoy and Luan, Yi and Zhou, Denny and Hou, Le , journal =. Instruction-following evaluation for large language models , year =

[24] [24]

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning , author=

[25] [25]

2024 , eprint=

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models , author=. 2024 , eprint=

2024

[26] [26]

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs , author=. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[27] [27]

arXiv preprint arXiv:2505.13975 , year=

DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models , author=. arXiv preprint arXiv:2505.13975 , year=

Pith/arXiv arXiv

[28] [28]

MATH-AI @ NeurIPS 2025 , year=

Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning , author=. MATH-AI @ NeurIPS 2025 , year=

2025

[29] [29]

arXiv preprint arXiv:2510.10925 , year=

Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation , author=. arXiv preprint arXiv:2510.10925 , year=

Pith/arXiv arXiv

[30] [30]

2026 , eprint=

SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models , author=. 2026 , eprint=

2026

[31] [31]

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

Li, Dawei and Jiang, Bohan and Huang, Liangjie and Beigi, Alimohammad and Zhao, Chengshuai and Tan, Zhen and Bhattacharjee, Amrita and Jiang, Yuxuan and Chen, Canyu and Wu, Tianhao and Shu, Kai and Cheng, Lu and Liu, Huan. From Generation to Judgment: Opportunities and Challenges of LLM -as-a-judge. Proceedings of the 2025 Conference on Empirical Methods ...

work page doi:10.18653/v1/2025.emnlp-main.138 2025

[32] [32]

The twelfth international conference on learning representations , year=

On-policy distillation of language models: Learning from self-generated mistakes , author=. The twelfth international conference on learning representations , year=

[33] [33]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Large language models for data annotation and synthesis: A survey , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024

[34] [34]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Can llms learn from previous mistakes? investigating llms’ errors to boost for reasoning , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[35] [35]

Findings of the Association for Computational Linguistics: EACL 2024 , pages=

Contextualization distillation from large language model for knowledge graph completion , author=. Findings of the Association for Computational Linguistics: EACL 2024 , pages=

2024

[36] [36]

arXiv preprint arXiv:2501.12948 , year=

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

Pith/arXiv arXiv

[37] [37]

arXiv preprint arXiv:2502.01534 , year=

Preference leakage: A contamination problem in llm-as-a-judge , author=. arXiv preprint arXiv:2502.01534 , year=

arXiv

[38] [38]

arXiv preprint arXiv:2603.14602 , year=

PA3: Policy-Aware Agent Alignment through Chain-of-Thought , author=. arXiv preprint arXiv:2603.14602 , year=

arXiv

[39] [39]

arXiv preprint arXiv:2601.06767 , year=

GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO , author=. arXiv preprint arXiv:2601.06767 , year=

Pith/arXiv arXiv

[40] [40]

arXiv e-prints , pages=

† DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems , author=. arXiv e-prints , pages=

[41] [41]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Multi-level optimal transport for universal cross-tokenizer knowledge distillation on language models , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[42] [42]

Advances in Neural Information Processing Systems , volume=

Wasserstein distance rivals kullback-leibler divergence for knowledge distillation , author=. Advances in Neural Information Processing Systems , volume=

[43] [43]

Proceedings of the 29th International Conference on Computational Linguistics , pages=

KNOT: Knowledge distillation using optimal transport for solving NLP tasks , author=. Proceedings of the 29th International Conference on Computational Linguistics , pages=

[44] [44]

arXiv preprint arXiv:2604.20244 , year=

Hybrid Policy Distillation for LLMs , author=. arXiv preprint arXiv:2604.20244 , year=

Pith/arXiv arXiv

[45] [45]

arXiv preprint arXiv:2601.03555 , year=

SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models , author=. arXiv preprint arXiv:2601.03555 , year=

Pith/arXiv arXiv

[46] [46]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

From generation to judgment: Opportunities and challenges of llm-as-a-judge , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025

[47] [47]

2026 , eprint=

Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval , author=. 2026 , eprint=

2026

[48] [48]

2026 , eprint=

Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis , author=. 2026 , eprint=

2026

[49] [49]

Do Transformers Always Win? An Empirical Study of Semantic Embeddings for Short-Text E-commerce Reviews , year=

Lai, Longying and Cheng, Zhiyuan and Cheng, Kai and Qi, Xiaoxi , booktitle=. Do Transformers Always Win? An Empirical Study of Semantic Embeddings for Short-Text E-commerce Reviews , year=

[50] [50]

Available at SSRN 6698538 , year=

Toward Sustainable On-Device Intelligence: A Survey on Energy-Efficient RAG Systems with Small Language Models , author=. Available at SSRN 6698538 , year=

[51] [51]

Available at SSRN 6720239 , year=

Improving the Completeness and Comparability of Segment Disclosures: A Large Language Model Approach , author=. Available at SSRN 6720239 , year=

[52] [52]

2025 , eprint=

AutoNeural: Co-Designing Vision-Language Models for NPU Inference , author=. 2025 , eprint=

2025

[53] [53]

Available at SSRN 6610278 , year=

Volatility Persistence and Model Choice in Cross-Market Volatility Forecasting , author=. Available at SSRN 6610278 , year=

[54] [54]

2026 , booktitle=

Regime-Dependent Volatility Dynamics: Evidence from Time-Series Analysis , author=. 2026 , booktitle=. doi:10.2991/978-94-6239-672-2_18 , publisher=

work page doi:10.2991/978-94-6239-672-2_18 2026

[55] [55]

MASCOT: Analyzing Malware Evolution Through a Well-Curated Source Code Dataset , year=

Li, Bojing and Zhong, Duo and Nadendla, Dharani and Terceros, Gabriel and Bhandary, Prajna and S, Raguvir and Nicholas, Charles , booktitle=. MASCOT: Analyzing Malware Evolution Through a Well-Curated Source Code Dataset , year=

[56] [56]

Nucleic Acids Research , volume=

CelLink: integrating single-cell multi-omics data with weak feature linkage and imbalanced cell populations , author=. Nucleic Acids Research , volume=. 2025 , publisher=

2025

[57] [57]

arXiv preprint arXiv:2605.09253 , year=

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation , author=. arXiv preprint arXiv:2605.09253 , year=

Pith/arXiv arXiv

[58] [58]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

[59] [59]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Sepprune: Structured pruning for efficient deep speech separation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[60] [60]

Authorea Preprints , year=

A Comprehensive Survey of Interaction Techniques in 3D Scene Generation , author=. Authorea Preprints , year=

[61] [61]

IEEE Transactions on Emerging Topics in Computational Intelligence , year =

Li, Yuqi and Zeng, Hansheng and Zhang, Fuyan and Yang, Chuanguang and Li, Yanli and Ding, Weiping , title =. IEEE Transactions on Emerging Topics in Computational Intelligence , year =. doi:10.1109/TETCI.2025.3621221 , issn =

work page doi:10.1109/tetci.2025.3621221 2025

[62] [62]

Distilling Time Series Foundation Models for Efficient Forecasting , year=

Li, Yuqi and Ding, Kuiye and Yang, Chuanguang and Chen, Szu-Yu and Tian, Yingli , booktitle=. Distilling Time Series Foundation Models for Efficient Forecasting , year=

[63] [63]

arXiv preprint arXiv:2511.17448 , year=

MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models , author=. arXiv preprint arXiv:2511.17448 , year=

arXiv

[64] [64]

arXiv preprint arXiv:2511.16715 , year=

DDTime: Dataset Distillation with Spectral Alignment and Information Bottleneck for Time-Series Forecasting , author=. arXiv preprint arXiv:2511.16715 , year=

arXiv

[65] [65]

arXiv preprint arXiv:2506.17290 , year=

SRKD: Towards Efficient 3D Point Cloud Segmentation via Structure-and Relation-aware Knowledge Distillation , author=. arXiv preprint arXiv:2506.17290 , year=

arXiv

[66] [66]

arXiv preprint arXiv:2509.00039 , year=

AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models , author=. arXiv preprint arXiv:2509.00039 , year=

arXiv

[67] [67]

ICMR , year=

COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing , author =. ICMR , year=

[68] [68]

arXiv preprint arXiv:2410.14720 , year=

SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models , author=. arXiv preprint arXiv:2410.14720 , year=

arXiv

[69] [69]

Knowledge-Based Systems , year=

Enhancing spatiotemporal prediction through the integration of Mamba state space models and Diffusion Transformers , author=. Knowledge-Based Systems , year=

[70] [70]

arXiv preprint arXiv:2511.23402 , year =

Quantized-TinyLLaVA: A New Multimodal Foundation Model Enables Efficient Split Learning , author =. arXiv preprint arXiv:2511.23402 , year =

arXiv

[71] [71]

arXiv preprint arXiv:2605.08060 , year=

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents , author=. arXiv preprint arXiv:2605.08060 , year=

Pith/arXiv arXiv

[72] [72]

2026 , eprint=

Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict , author=. 2026 , eprint=

2026

[73] [73]

2026 , eprint=

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG , author=. 2026 , eprint=

2026

[74] [74]

2026 , eprint=

Sheet as Token: A Graph-Enhanced Representation for Multi-Sheet Spreadsheet Understanding , author=. 2026 , eprint=

2026

[75] [75]

2023 IEEE 39th International Conference on Data Engineering (ICDE) , pages=

Towards efficient shortest path counting on billion-scale graphs , author=. 2023 IEEE 39th International Conference on Data Engineering (ICDE) , pages=. 2023 , organization=

2023

[76] [76]

Yiqi Wang and Long Yuan and Wenjie Zhang and Xuemin Li and Zi Chen and Qing Liu , title =. Proc

[77] [77]

DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning , year=

Gao, Yaxin and Lu, Yao and Zhang, Zongfei and Nie, Jiaqi and Yu, Shanqing and Xuan, Qi , booktitle=. DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning , year=

[78] [78]

DRF: LLM-AGENT Dynamic Reputation Filtering Framework

Lou, Yuwei and Hu, Hao and Ma, Shaocong and Zhang, Zongfei and Wang, Liang and Ge, Jidong and Tao, Xianping. DRF: LLM-AGENT Dynamic Reputation Filtering Framework. Neural Information Processing. 2026

2026

[79] [79]

2026 , eprint=

Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models , author=. 2026 , eprint=

2026

[80] [80]

Synergized Data Efficiency and Compression (SEC) Optimization for Large Language Models , year=

Li, Xinjin and Ma, Yu and Huang, Yangchen and Wang, Xingqi and Lin, Yuzhen and Zhang, Chenxi , booktitle=. Synergized Data Efficiency and Compression (SEC) Optimization for Large Language Models , year=