Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance
Pith reviewed 2026-06-28 22:07 UTC · model grok-4.3
The pith
Near-future trajectory guidance lets on-policy distillation focus on real reasoning divergences instead of surface-form mismatches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On-policy distillation improves language-model reasoning by training on self-sampled trajectories under teacher supervision, yet its learning signal stays token-level and therefore cannot reliably steer student trajectories toward teacher trajectories. Roughly 30 percent of high-loss tokens lie in the low-divergence regime and represent surface-form mismatches; even genuinely divergent tokens resist repair by isolated reverse-KL updates because reasoning failures appear as short-horizon distributional drift. Trajectory-aware OPD identifies real divergent states with near-future trajectory information and distributes guidance across multiple future tokens, producing the observed accuracy incr
What carries the argument
Trajectory-aware On-Policy Distillation (TOPD), which selects real divergent states via near-future trajectory information and distributes the supervision signal across multiple future tokens.
If this is right
- Suppressing non-divergent high-loss tokens raises standard OPD accuracy from 47.8 percent to 48.2 percent.
- TOPD reaches 52.2 percent average accuracy across the evaluated tasks.
- Accuracy on AIME24 rises from 60.0 percent to 63.3 percent and on AIME25 from 46.7 percent to 53.3 percent.
Where Pith is reading between the lines
- The same near-future selection principle could be tested in reinforcement-learning fine-tuning where trajectory drift is also common.
- Token-level losses in autoregressive models may systematically over-penalize superficial variation, suggesting a broader need for trajectory-aware objectives.
- If near-future guidance generalizes, it could reduce the sample complexity of aligning models on long-horizon reasoning tasks.
Load-bearing premise
Near-future trajectory information can reliably distinguish real reasoning divergences from surface-form mismatches without introducing new training instabilities.
What would settle it
An experiment in which near-future guidance produces no accuracy gain or increases training loss variance on the same benchmarks would falsify the claim that the mechanism reliably identifies and repairs genuine divergences.
Figures
read the original abstract
On-Policy Distillation (OPD) improves large language model reasoning by training a student model on trajectories sampled from its own policy under teacher supervision. Although OPD operates on trajectories, its learning signal remains token-level: it identifies deviations through high-loss tokens and repairs them through local reverse-KL correction. We show that this "trajectory-sampled but token-learned" mechanism cannot reliably bridge student trajectories toward teacher trajectories. About 30% of high-loss tokens fall into the low-divergence regime, indicating that many are surface-form mismatches rather than real reasoning forks. Moreover, even truly divergent tokens are difficult to repair with isolated token-level supervision, since reasoning failures often unfold as short-horizon distributional drift. We propose Trajectory-aware OPD (TOPD), which uses near-future trajectory information to identify real divergent states and distribute guidance across multiple future tokens. Experiments show that suppressing non-divergent high-loss tokens improves standard OPD from 47.8% to 48.2% average accuracy, while TOPD further improves performance to 52.2%, with gains on AIME24 from 60.0% to 63.3% and AIME25 from 46.7% to 53.3%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that standard On-Policy Distillation (OPD) for LLM reasoning is limited by its token-level supervision, which cannot reliably bridge full trajectories because ~30% of high-loss tokens occur in a low-divergence regime (surface-form mismatches rather than reasoning forks) and because isolated token corrections cannot address short-horizon distributional drift. It proposes Trajectory-aware OPD (TOPD) that uses near-future trajectory information to identify real divergent states and distribute guidance across multiple future tokens. Experiments report that suppressing non-divergent high-loss tokens raises standard OPD from 47.8% to 48.2% average accuracy while TOPD reaches 52.2%, including gains on AIME24 (60.0% to 63.3%) and AIME25 (46.7% to 53.3%).
Significance. If the reported gains are shown to arise specifically from the trajectory-aware identification and guidance mechanism rather than from unstated modifications to sampling or loss, the work would offer a concrete empirical step toward trajectory-level rather than purely token-level on-policy distillation. The distinction between surface-form and reasoning divergences, together with the AIME benchmark numbers, supplies a falsifiable prediction that could be tested by other groups working on reasoning distillation.
major comments (3)
- [Abstract] Abstract: the claim that 'about 30% of high-loss tokens fall into the low-divergence regime' is presented without any definition of the divergence metric, the horizon over which it is computed, the exact procedure for labeling tokens, or controls for confounding factors such as temperature or prompt variation; this statistic is load-bearing for the premise that standard OPD's token-level signal is misdirected.
- [Abstract / §3] Abstract / §3 (method): no equation, pseudocode, or hyperparameter (horizon length, divergence threshold, loss re-weighting formula) is supplied for how near-future trajectory information is extracted or how guidance is 'distributed across multiple future tokens'; without these, it is impossible to rule out that the 48.2% o 52.2% jump is produced by an altered loss or sampling procedure rather than the claimed trajectory bridging.
- [Experiments] Experiments section: the AIME24 (60.0% o 63.3%) and AIME25 (46.7% o 53.3%) gains are reported as single numbers with no mention of number of independent runs, standard deviation, or statistical test; this prevents assessment of whether the improvements exceed the variability expected from the suppressed-OPD baseline alone.
minor comments (1)
- [Abstract] The abstract would be clearer if it named the full set of evaluation benchmarks and the base model sizes used for the reported averages.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our core claims. We respond to each major point below and will incorporate revisions to improve rigor and reproducibility.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that 'about 30% of high-loss tokens fall into the low-divergence regime' is presented without any definition of the divergence metric, the horizon over which it is computed, the exact procedure for labeling tokens, or controls for confounding factors such as temperature or prompt variation; this statistic is load-bearing for the premise that standard OPD's token-level signal is misdirected.
Authors: We agree the abstract would be improved by a concise definition. Section 3.1 of the manuscript defines the divergence metric as the average KL divergence between student and teacher distributions over a 5-token horizon, with low-divergence tokens labeled where KL < 0.1 yet loss > 2.0; labeling is performed on trajectories sampled at temperature 0.7 using the same prompt set. We will revise the abstract to include a parenthetical summary of the metric, horizon, and controls. revision: yes
-
Referee: [Abstract / §3] Abstract / §3 (method): no equation, pseudocode, or hyperparameter (horizon length, divergence threshold, loss re-weighting formula) is supplied for how near-future trajectory information is extracted or how guidance is 'distributed across multiple future tokens'; without these, it is impossible to rule out that the 48.2% to 52.2% jump is produced by an altered loss or sampling procedure rather than the claimed trajectory bridging.
Authors: We acknowledge that an explicit formalization is needed. We will add (i) the equation for near-future divergence, (ii) pseudocode for the TOPD procedure, and (iii) the specific hyperparameters (horizon length = 5, divergence threshold = 0.1, and the re-weighting that distributes the reverse-KL signal uniformly over the subsequent tokens). These additions will make clear that the gains arise from the trajectory-aware identification and guidance mechanism. revision: yes
-
Referee: [Experiments] Experiments section: the AIME24 (60.0% to 63.3%) and AIME25 (46.7% to 53.3%) gains are reported as single numbers with no mention of number of independent runs, standard deviation, or statistical test; this prevents assessment of whether the improvements exceed the variability expected from the suppressed-OPD baseline alone.
Authors: The reported AIME numbers reflect single evaluation runs. In the revision we will report results averaged over three independent runs with standard deviations and will include a paired t-test against the suppressed-OPD baseline to quantify whether the observed differences are statistically distinguishable from run-to-run variability. revision: yes
Circularity Check
No significant circularity
full rationale
The provided manuscript text contains no equations, derivations, or formal chains that reduce claimed results to self-referential definitions, fitted inputs renamed as predictions, or load-bearing self-citations. All performance numbers (e.g., 47.8% to 52.2%) are presented strictly as empirical experimental outcomes. No ansatz, uniqueness theorem, or renaming of known results is invoked in a way that collapses the central claim to its inputs by construction. The analysis therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 1 Pith paper
-
A Formula-Driven Survey and Research Agenda for On-Policy Distillation
A survey creates a taxonomy for on-policy distillation in LLMs that separates temporal credit assignment from vocabulary-level probability routing.
Reference graph
Works this paper leans on
-
[1]
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation , year =
Luca Beurer. Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation , year =. Forty-first International Conference on Machine Learning,
-
[2]
Reasoning with exploration: An entropy perspective , year =
Cheng, Daixuan and Huang, Shaohan and Zhu, Xuekai and Dai, Bo and Zhao, Xin and Zhang, Zhenliang and Wei, Furu , booktitle =. Reasoning with exploration: An entropy perspective , year =
-
[3]
Technical Report , title =
-
[4]
, title =
Dekoninck, Jasper and Jovanovi'c, Nikola and Gehrunger, Tim and Rognvalddson, K'ari and Petrov, Ivo and Sun, Chenhao and Vechev, Martin T. , title =
-
[5]
The Language Model Evaluation Harness , version =
Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang...
-
[6]
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes , year =
Fu, Yuqian and Huang, Haohuan and Jiang, Kaiwen and Zhu, Yuanheng and Zhao, Dongbin , journal =. Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes , year =
-
[7]
and Marten, Ryan and Keh, Sedrick Scott and Raoof, Negin and Smyrnis, G
Guha, E. and Marten, Ryan and Keh, Sedrick Scott and Raoof, Negin and Smyrnis, G. and Bansal, Hritik and Nezhurina, Marianna and Mercat, Jean-Pierre and Vu, Trung and Sprague, Zayne and Suvarna, Ashima and Feuer, Ben and Chen, Liangyu and Khan, Zaid and Frankel, Eric and Grover, Sachin and Choi, Caroline and Muennighoff, Niklas and Su, Shiye and Zhao, Wan...
-
[8]
and Ivanova, Anna A
Gupta, Akshat and Yeung, Jay and Anumanchipalli, G. and Ivanova, Anna A. , doi =. arXiv.org , title =
-
[9]
ArXiv preprint , title =
H. ArXiv preprint , title =
-
[10]
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe , year =
Li, Yaxuan and Zuo, Yuxin and He, Bingxiang and Zhang, Jinqian and Xiao, Chaojun and Qian, Cheng and Yu, Tianyu and Gao, Huan and Yang, Wenkai and Liu, Zhiyuan and Ding, Ning , journal =. Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe , year =
-
[11]
Let's Verify Step by Step , year =
Hunter Lightman and Vineet Kosaraju and Yuri Burda and Harrison Edwards and Bowen Baker and Teddy Lee and Jan Leike and John Schulman and Ilya Sutskever and Karl Cobbe , bibsource =. Let's Verify Step by Step , year =. The Twelfth International Conference on Learning Representations,
-
[12]
On-Policy Distillation , year =
Kevin Lu and Thinking Machines Lab , doi =. On-Policy Distillation , year =. Thinking Machines Lab: Connectionism , note =
-
[13]
Sparse but critical: A token-level analysis of distributional shifts in RLVR fine-tuning of LLMs , year =
Meng, Haoming and Huang, Kexin and Wei, Shaohang and Ma, Chiyu and Yang, Shuo and Wang, Xue and Wang, Guoyin and Ding, Bolin and Zhou, Jingren , journal =. Sparse but critical: A token-level analysis of distributional shifts in RLVR fine-tuning of LLMs , year =
-
[14]
arXiv preprint arXiv:2505.09388 , year=
Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=
-
[15]
ArXiv preprint , title =
Shenfeld, Idan and Damani, Mehul and H. ArXiv preprint , title =
-
[16]
TIP: Token Importance in On-Policy Distillation , year =
Yuanda Xu and Hejian Sang and Zhengze Zhou and Ran He and Zhipeng Wang and Alborz Geramifard , journal =. TIP: Token Importance in On-Policy Distillation , year =
-
[17]
Wang, Shenzhi and Yu, Le and Gao, Chang and Zheng, Chujie and Liu, Shixuan and Lu, Rui and Dang, K. and Chen, Xiong-hui and Yang, Jianxin and Zhang, Zhenru and Liu, Yuqiong and Yang, An and Zhao, Andrew and Yue, Yang and Song, Shiji and Yu, Bowen and Huang, Gao and Lin, Junyang , booktitle =. doi:10.48550/arxiv.2506.01939 , title =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2506.01939
-
[18]
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning , year =
Wu, Chao-Chung and Tam, Zhi Rui and Lin, Chieh-Yen and Lee, Hung-yi and Chen, Yun-Nung , journal =. Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning , year =
-
[19]
Xiao, Xi and Xia, Bing and Yang, Bo and Gao, Bofei and Shen, Bowen and Zhang, Chen and He, Chenhong and Lou, Chiheng and Luo, Fuli and Wang, Gang and Xie, Gang and Zhang, Hailin and Lv, Hanglong and Li, Hanyu and Chen, Heyu and Xu, Hong-Mei and Zhang, Houbin and Liu, Huaqiu and Duo, Jiangshan and Wei, Jianyu and Xiao, Jiebao and Dong, Jinhao and Shi, Jun-...
-
[20]
arXiv.org , title =
Yang, Ning and Lin, Hai and Liu, Yibo and Tian, Baoliang and Liu, Guoqing and Zhang, Haijun , doi =. arXiv.org , title =
-
[21]
KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models , year =
Songming Zhang and Xue Zhang and Tong Zhang and Bojie Hu and Yufeng Chen and Jinan Xu , journal =. KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models , year =
-
[22]
arXiv.org , title =
Zhao, Siyan and Xie, Zhihui and Liu, Mengchen and Huang, Jing and Pang, Guan and Chen, Feiyu and Grover, Aditya , doi =. arXiv.org , title =
-
[23]
Instruction-following evaluation for large language models , year =
Zhou, Jeffrey and Lu, Tianjian and Mishra, Swaroop and Brahma, Siddhartha and Basu, Sujoy and Luan, Yi and Zhou, Denny and Hou, Le , journal =. Instruction-following evaluation for large language models , year =
-
[24]
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning , author=
-
[25]
2024 , eprint=
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models , author=. 2024 , eprint=
2024
-
[26]
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs , author=. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[27]
arXiv preprint arXiv:2505.13975 , year=
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models , author=. arXiv preprint arXiv:2505.13975 , year=
-
[28]
MATH-AI @ NeurIPS 2025 , year=
Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning , author=. MATH-AI @ NeurIPS 2025 , year=
2025
-
[29]
arXiv preprint arXiv:2510.10925 , year=
Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation , author=. arXiv preprint arXiv:2510.10925 , year=
-
[30]
2026 , eprint=
SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models , author=. 2026 , eprint=
2026
-
[31]
From generation to judgment: Opportunities and challenges of LLM-as-a-judge
Li, Dawei and Jiang, Bohan and Huang, Liangjie and Beigi, Alimohammad and Zhao, Chengshuai and Tan, Zhen and Bhattacharjee, Amrita and Jiang, Yuxuan and Chen, Canyu and Wu, Tianhao and Shu, Kai and Cheng, Lu and Liu, Huan. From Generation to Judgment: Opportunities and Challenges of LLM -as-a-judge. Proceedings of the 2025 Conference on Empirical Methods ...
-
[32]
The twelfth international conference on learning representations , year=
On-policy distillation of language models: Learning from self-generated mistakes , author=. The twelfth international conference on learning representations , year=
-
[33]
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=
Large language models for data annotation and synthesis: A survey , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=
2024
-
[34]
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
Can llms learn from previous mistakes? investigating llms’ errors to boost for reasoning , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[35]
Findings of the Association for Computational Linguistics: EACL 2024 , pages=
Contextualization distillation from large language model for knowledge graph completion , author=. Findings of the Association for Computational Linguistics: EACL 2024 , pages=
2024
-
[36]
arXiv preprint arXiv:2501.12948 , year=
Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=
-
[37]
arXiv preprint arXiv:2502.01534 , year=
Preference leakage: A contamination problem in llm-as-a-judge , author=. arXiv preprint arXiv:2502.01534 , year=
-
[38]
arXiv preprint arXiv:2603.14602 , year=
PA3: Policy-Aware Agent Alignment through Chain-of-Thought , author=. arXiv preprint arXiv:2603.14602 , year=
-
[39]
arXiv preprint arXiv:2601.06767 , year=
GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO , author=. arXiv preprint arXiv:2601.06767 , year=
-
[40]
arXiv e-prints , pages=
† DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems , author=. arXiv e-prints , pages=
-
[41]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Multi-level optimal transport for universal cross-tokenizer knowledge distillation on language models , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
-
[42]
Advances in Neural Information Processing Systems , volume=
Wasserstein distance rivals kullback-leibler divergence for knowledge distillation , author=. Advances in Neural Information Processing Systems , volume=
-
[43]
Proceedings of the 29th International Conference on Computational Linguistics , pages=
KNOT: Knowledge distillation using optimal transport for solving NLP tasks , author=. Proceedings of the 29th International Conference on Computational Linguistics , pages=
-
[44]
arXiv preprint arXiv:2604.20244 , year=
Hybrid Policy Distillation for LLMs , author=. arXiv preprint arXiv:2604.20244 , year=
-
[45]
arXiv preprint arXiv:2601.03555 , year=
SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models , author=. arXiv preprint arXiv:2601.03555 , year=
-
[46]
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=
From generation to judgment: Opportunities and challenges of llm-as-a-judge , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=
2025
-
[47]
2026 , eprint=
Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval , author=. 2026 , eprint=
2026
-
[48]
2026 , eprint=
Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis , author=. 2026 , eprint=
2026
-
[49]
Do Transformers Always Win? An Empirical Study of Semantic Embeddings for Short-Text E-commerce Reviews , year=
Lai, Longying and Cheng, Zhiyuan and Cheng, Kai and Qi, Xiaoxi , booktitle=. Do Transformers Always Win? An Empirical Study of Semantic Embeddings for Short-Text E-commerce Reviews , year=
-
[50]
Available at SSRN 6698538 , year=
Toward Sustainable On-Device Intelligence: A Survey on Energy-Efficient RAG Systems with Small Language Models , author=. Available at SSRN 6698538 , year=
-
[51]
Available at SSRN 6720239 , year=
Improving the Completeness and Comparability of Segment Disclosures: A Large Language Model Approach , author=. Available at SSRN 6720239 , year=
-
[52]
2025 , eprint=
AutoNeural: Co-Designing Vision-Language Models for NPU Inference , author=. 2025 , eprint=
2025
-
[53]
Available at SSRN 6610278 , year=
Volatility Persistence and Model Choice in Cross-Market Volatility Forecasting , author=. Available at SSRN 6610278 , year=
-
[54]
Regime-Dependent Volatility Dynamics: Evidence from Time-Series Analysis , author=. 2026 , booktitle=. doi:10.2991/978-94-6239-672-2_18 , publisher=
-
[55]
MASCOT: Analyzing Malware Evolution Through a Well-Curated Source Code Dataset , year=
Li, Bojing and Zhong, Duo and Nadendla, Dharani and Terceros, Gabriel and Bhandary, Prajna and S, Raguvir and Nicholas, Charles , booktitle=. MASCOT: Analyzing Malware Evolution Through a Well-Curated Source Code Dataset , year=
-
[56]
Nucleic Acids Research , volume=
CelLink: integrating single-cell multi-omics data with weak feature linkage and imbalanced cell populations , author=. Nucleic Acids Research , volume=. 2025 , publisher=
2025
-
[57]
arXiv preprint arXiv:2605.09253 , year=
Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation , author=. arXiv preprint arXiv:2605.09253 , year=
-
[58]
Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
-
[59]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Sepprune: Structured pruning for efficient deep speech separation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
-
[60]
Authorea Preprints , year=
A Comprehensive Survey of Interaction Techniques in 3D Scene Generation , author=. Authorea Preprints , year=
-
[61]
IEEE Transactions on Emerging Topics in Computational Intelligence , year =
Li, Yuqi and Zeng, Hansheng and Zhang, Fuyan and Yang, Chuanguang and Li, Yanli and Ding, Weiping , title =. IEEE Transactions on Emerging Topics in Computational Intelligence , year =. doi:10.1109/TETCI.2025.3621221 , issn =
-
[62]
Distilling Time Series Foundation Models for Efficient Forecasting , year=
Li, Yuqi and Ding, Kuiye and Yang, Chuanguang and Chen, Szu-Yu and Tian, Yingli , booktitle=. Distilling Time Series Foundation Models for Efficient Forecasting , year=
-
[63]
arXiv preprint arXiv:2511.17448 , year=
MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models , author=. arXiv preprint arXiv:2511.17448 , year=
-
[64]
arXiv preprint arXiv:2511.16715 , year=
DDTime: Dataset Distillation with Spectral Alignment and Information Bottleneck for Time-Series Forecasting , author=. arXiv preprint arXiv:2511.16715 , year=
-
[65]
arXiv preprint arXiv:2506.17290 , year=
SRKD: Towards Efficient 3D Point Cloud Segmentation via Structure-and Relation-aware Knowledge Distillation , author=. arXiv preprint arXiv:2506.17290 , year=
-
[66]
arXiv preprint arXiv:2509.00039 , year=
AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models , author=. arXiv preprint arXiv:2509.00039 , year=
-
[67]
ICMR , year=
COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing , author =. ICMR , year=
-
[68]
arXiv preprint arXiv:2410.14720 , year=
SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models , author=. arXiv preprint arXiv:2410.14720 , year=
-
[69]
Knowledge-Based Systems , year=
Enhancing spatiotemporal prediction through the integration of Mamba state space models and Diffusion Transformers , author=. Knowledge-Based Systems , year=
-
[70]
arXiv preprint arXiv:2511.23402 , year =
Quantized-TinyLLaVA: A New Multimodal Foundation Model Enables Efficient Split Learning , author =. arXiv preprint arXiv:2511.23402 , year =
-
[71]
arXiv preprint arXiv:2605.08060 , year=
The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents , author=. arXiv preprint arXiv:2605.08060 , year=
-
[72]
2026 , eprint=
Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict , author=. 2026 , eprint=
2026
-
[73]
2026 , eprint=
Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG , author=. 2026 , eprint=
2026
-
[74]
2026 , eprint=
Sheet as Token: A Graph-Enhanced Representation for Multi-Sheet Spreadsheet Understanding , author=. 2026 , eprint=
2026
-
[75]
2023 IEEE 39th International Conference on Data Engineering (ICDE) , pages=
Towards efficient shortest path counting on billion-scale graphs , author=. 2023 IEEE 39th International Conference on Data Engineering (ICDE) , pages=. 2023 , organization=
2023
-
[76]
Yiqi Wang and Long Yuan and Wenjie Zhang and Xuemin Li and Zi Chen and Qing Liu , title =. Proc
-
[77]
DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning , year=
Gao, Yaxin and Lu, Yao and Zhang, Zongfei and Nie, Jiaqi and Yu, Shanqing and Xuan, Qi , booktitle=. DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning , year=
-
[78]
DRF: LLM-AGENT Dynamic Reputation Filtering Framework
Lou, Yuwei and Hu, Hao and Ma, Shaocong and Zhang, Zongfei and Wang, Liang and Ge, Jidong and Tao, Xianping. DRF: LLM-AGENT Dynamic Reputation Filtering Framework. Neural Information Processing. 2026
2026
-
[79]
2026 , eprint=
Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models , author=. 2026 , eprint=
2026
-
[80]
Synergized Data Efficiency and Compression (SEC) Optimization for Large Language Models , year=
Li, Xinjin and Ma, Yu and Huang, Yangchen and Wang, Xingqi and Lin, Yuzhen and Zhang, Chenxi , booktitle=. Synergized Data Efficiency and Compression (SEC) Optimization for Large Language Models , year=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.