{"total":37,"items":[{"citing_arxiv_id":"2606.31986","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CoLT: Teaching Multi-Modal Models to Think with Chain of Latent Thoughts","primary_cat":"cs.CV","submitted_at":"2026-06-30T17:24:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CoLT replaces text-based chain-of-thought in MLLMs with 3-step latent thought chains supervised by a removable external decoder in forward and backward modes, yielding 10.1x faster inference on eight benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.31779","ref_index":126,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers","primary_cat":"cs.LG","submitted_at":"2026-06-30T14:58:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.31048","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Knowledge Distillation from Large Reasoning Models to Compact Student Models: A Case Study on the John O Bryan Mathematics Competition","primary_cat":"cs.LG","submitted_at":"2026-06-30T02:34:45+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Distilling CoT from DeepSeek-R1 to Qwen2.5-7B on competition problems yields 4.76 pp accuracy gain to 69.43% and 73.1% on MATH-500, with accuracy falling as response length 
decreases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.29712","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Why Struggle with Continuous Latents? Interpretable Discrete Latent Reasoning via Rendered Compression","primary_cat":"cs.CL","submitted_at":"2026-06-29T02:34:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DLR creates discrete latent tokens from rendered CoT images via clustering, enabling up to 20x compression and interpretable trajectories that outperform continuous latent baselines on reasoning tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28070","ref_index":42,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"JD Oxygen AI Item Center (Oxygen AIIC) V1: An Industrial-Scale LLM/VLM-Centric Solution for Item Understanding, Management, and Applications","primary_cat":"cs.AI","submitted_at":"2026-06-26T13:33:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Oxygen AIIC is an industrial platform using LLMs and VLMs for scalable item knowledge production and service at JD.com, reporting 94.2% precision and 82.8% recall along with business metric improvements.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.27617","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Masked Language Flow Models","primary_cat":"cs.CL","submitted_at":"2026-06-26T00:16:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and 
instruction-following tasks on GSM8K and MT-Bench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10184","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning","primary_cat":"cs.LG","submitted_at":"2026-06-08T21:21:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07720","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning","primary_cat":"cs.AI","submitted_at":"2026-06-05T15:45:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AGCLR extends CoCoNuT with a gated concept stream for persistent memory to fix fact loss in latent reasoning, yielding improvements on reasoning benchmarks as depth increases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07157","ref_index":25,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models","primary_cat":"cs.AI","submitted_at":"2026-06-05T11:17:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Frontier AI models' no-CoT 50% task-completion time horizons have doubled yearly over six years, reaching over 3 minutes for GPT-5.5 with projections to 25 
minutes by 2030.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06840","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces","primary_cat":"cs.CL","submitted_at":"2026-06-05T02:32:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.05315","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LoRi: Low-Rank Distillation for Implicit Reasoning","primary_cat":"cs.CL","submitted_at":"2026-06-03T18:05:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LoRi distills implicit chain-of-thought by matching low-rank structures in hidden states, raising math-reasoning accuracy toward explicit CoT levels on LLaMA and Qwen models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04627","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models","primary_cat":"cs.AI","submitted_at":"2026-06-03T09:01:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MIRAGE compresses explicit chain-of-thought into latent vectors and adds a generative world model to predict future interface states, matching explicit reasoning performance with 3-5x fewer tokens on Android 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09881","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Calibrated, Fair, and accurate Deepfake Detection","primary_cat":"cs.LG","submitted_at":"2026-06-03T05:44:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02248","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Geometric Latent Reasoning Induces Shorter Generations in LLMs","primary_cat":"cs.CL","submitted_at":"2026-06-01T13:40:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GLR formulates latent reasoning as geometric path approximation in pretrained embedding space and reports shorter LLM generations on math tasks without an explicit length penalty.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30343","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Unlocking the Working Memory of Large Language Models for Latent Reasoning","primary_cat":"cs.CL","submitted_at":"2026-05-28T17:59:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RiM trains LLMs to perform latent reasoning via fixed memory blocks processed in one forward pass using a two-stage curriculum, matching or exceeding prior latent methods on 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29068","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Robust and Efficient Guardrails with Latent Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-27T20:15:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"COLAGUARD matches explicit-reasoning guardrail performance on safety benchmarks while delivering 12.9X speedup and 22.4X token reduction by propagating hidden states instead of generating text.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28600","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Transformers Provably Learn to Internalize Chain-of-Thought","primary_cat":"cs.LG","submitted_at":"2026-05-27T15:17:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"L-layer transformers under Log-ICoT curriculum provably learn k-parity with poly(n) samples and log k stages, matching explicit CoT efficiency without inference overhead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28888","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generative Spatiotemporal Intent Sequence Recommendation via Implicit Reasoning in Amap","primary_cat":"cs.IR","submitted_at":"2026-05-27T07:27:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"GPlan compresses LLM reasoning into small models via Progressive Implicit CoT Distillation and Spatiotemporal Counterfactual DPO to generate logically coherent and physically executable intent sequences for 
recommendation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28008","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training","primary_cat":"cs.AI","submitted_at":"2026-05-27T06:02:41+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Coarser compressed CoT needs more SFT data, scales differently with repetition, and RL later breaks apart the compressed steps learned in SFT.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26106","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Looped Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-25T17:58:24+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LoopMDM loops early-middle layers in masked diffusion models to match same-size MDM performance with up to 3.3x fewer training FLOPs and outperform on reasoning tasks by up to 8.5 points on GSM8K.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16638","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens","primary_cat":"cs.AI","submitted_at":"2026-05-15T21:10:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TTE-Flash trains latent think tokens with CoT generation loss and embedding tokens with contrastive loss to deliver high-performance multimodal representations without generating explicit reasoning at inference 
time.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08221","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-06T13:58:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26760","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Factorized Latent Reasoning for LLM-based Recommendation","primary_cat":"cs.IR","submitted_at":"2026-04-29T14:55:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FLR factorizes latent reasoning into multiple preference factors using multi-factor attention and regularizations, outperforming baselines on recommendation benchmarks while adding robustness and interpretability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24881","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate","primary_cat":"cs.AI","submitted_at":"2026-04-27T18:06:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Two-stage fine-tuning distills multi-agent debate into single LLMs, matching performance at 93% lower token cost while revealing agent-specific activation subspaces for 
steering.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21027","ref_index":103,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering","primary_cat":"cs.AI","submitted_at":"2026-04-22T19:18:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08299","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SeLaR: Selective Latent Reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-09T14:32:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"therefore report three complementary metrics that together characterize cost-effectiveness in Figure 3, all as percentage changes relative to the CoT (Sampling) baseline on Qwen3-8B. We denote the average tokens on correctly- and wrongly-answered samples as T_c and T_w, respectively, and write accuracy as α. Our headline metric is Tokens per Correct Answer: TPCA = (α·T_c + (1−α)·T_w) / α. (10) Finding 3: On average, SeLaR is more cost-effective than SwiR, with the advantage widening on the hardest reasoning tasks. 
On TPCA (Figure 3a), SeLaR outperforms SwiR by 6.5, 4.8, 52.4, and 27.2 percentage points on GSM8K, MATH500, AIME 2024, and AIME 2025, respectively. This advantage is most pronounced on AIME 2024, where SeLaR reduces TPCA by 19."},{"citing_arxiv_id":"2604.06427","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning","primary_cat":"cs.LG","submitted_at":"2026-04-07T20:04:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs discover latent planning strategies up to five steps during training and execute them up to eight steps at test time, with larger models reaching seven under few-shot prompting, revealing a dissociation between discovery and execution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03679","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LightThinker++: From Reasoning Compression to Memory Management","primary_cat":"cs.CL","submitted_at":"2026-04-04T10:46:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"06769, 2024. doi: 10.48550/ARXIV.2412.06769. URL https://doi.org/10.48550/arXiv.2412.06769. [19] Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, and Stuart M. Shieber. Implicit chain of thought reasoning via knowledge distillation. CoRR, abs/2311.01460, 2023. doi: 10.48550/ARXIV.2311.01460. URL https://doi.org/10.48550/arXiv.2311.01460. 
[20] Yuntian Deng, Yejin Choi, and Stuart M. Shieber. From explicit cot to implicit cot: Learning to internalize cot step by step. CoRR, abs/2405.14838, 2024. doi: 10.48550/ARXIV.2405.14838. URL https://doi.org/10.48550/arXiv.2405.14838. [21] Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark W."},{"citing_arxiv_id":"2604.02371","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Internalized Reasoning for Long-Context Visual Document Understanding","primary_cat":"cs.CV","submitted_at":"2026-03-31T04:41:01+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[9] DeepSeek. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning, 2025. URL https://arxiv.org/abs/2501.12948. [10] Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, and Stuart Shieber. Implicit chain of thought reasoning via knowledge distillation, 2023. URL https://arxiv.org/abs/2311.01460. [11] Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, et al. Vlmevalkit: An open-source toolkit for evaluating large multi-modality models. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 11198-11201, 2024. 
[12] Yuchen Duan, Zhe Chen, Yusong Hu, Weiyun Wang, Shenglong Ye, Botian Shi, Lewei Lu,"},{"citing_arxiv_id":"2511.08983","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent Interleaving","primary_cat":"cs.CL","submitted_at":"2025-11-12T05:05:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SpiralThinker stabilizes iterative latent reasoning in LLMs via text-latent interleaving and progressive alignment, achieving SOTA results among latent baselines on math, logic, and commonsense tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.24941","ref_index":5,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought","primary_cat":"cs.LG","submitted_at":"2025-10-28T20:14:02+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.25020","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Deep Thinking by Markov Chain of Continuous Thoughts","primary_cat":"cs.LG","submitted_at":"2025-09-29T16:44:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MarCos modifies transformers to perform continuous multi-step reasoning by mapping thought-level continuous states directly to next-thought distributions, achieving substantial wall-clock speedups on math 
problems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.02181","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey of Scaling in Large Language Model Reasoning","primary_cat":"cs.AI","submitted_at":"2025-04-02T23:51:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"bilities enable dialogue agents to maintain coherent and informative long-term interactions, effectively integrating historical context and external knowledge [7, 43]. Multi-agent systems leverage iterative refinement and structured verification among specialized reasoning agents, further enhancing accuracy and reducing errors such as hallucinations [42]. Interactive AI environments such as LLM-based Cursor [34] leverage LLMs' contextual reasoning to facilitate precise user interactions, enabling targeted queries and refined outputs. 6.3 Science The scaling of LLMs has significantly benefited scientific domains, with medicine, finance, and disaster management emerging as prominent application areas. Medical Domain. 
The medical domain has experienced remark-"},{"citing_arxiv_id":"2502.21074","ref_index":90,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation","primary_cat":"cs.CL","submitted_at":"2025-02-28T14:07:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.03387","ref_index":90,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LIMO: Less is More for Reasoning","primary_cat":"cs.CL","submitted_at":"2025-02-05T17:23:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.13171","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Compressed Chain of Thought: Efficient Reasoning Through Dense Representations","primary_cat":"cs.CL","submitted_at":"2024-12-17T18:50:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CCoT generates variable-length continuous contemplation tokens that compress explicit reasoning chains, enabling additional dense reasoning and accuracy gains in off-the-shelf 
language models while allowing adaptive control of token count.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.06769","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Training Large Language Models to Reason in a Continuous Latent Space","primary_cat":"cs.CL","submitted_at":"2024-12-09T18:55:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"Appendix A Datasets A.1 Examples We provide some examples of the questions and CoT solutions for the datasets used in our experiments. GSM8k Question = \"John cuts his grass to 2 inches. It grows .5 inches per month. When it gets to 4 inches he cuts it back down to 2 inches. It cost $100 to get his grass cut. How much does he pay per year?\" Steps = [\"«4-2=2»\", \"«2/.5=4»\", \"«12/4=3»\", \"«100*3=300»\"] Answer = \"300\" ProntoQA Question = \"Brimpuses are not luminous. Shumpuses are amenable. Each yumpus is a lorpus. Gorpuses are shumpuses. Each zumpus is a grimpus. Gorpuses are rompuses. Dumpuses are not floral. Lempuses are cold. Brimpuses are impuses. Every lorpus is floral. Every rompus is transparent. Grimpuses are muffled. Rompuses are yumpuses."}],"limit":50,"offset":0}