Chain-of-thought prompting elicits reasoning in large language models

Canonical reference. 77% of citing Pith papers cite this work as background.

74 Pith papers citing it

citation-role summary

background 20 · method 4 · other 2

claims ledger

  • background model's own correct rollouts, applying a symmetric efficiency reward that penalizes both overthinking and over-compression. Evaluated on five mathematical reasoning benchmarks, LEAD achieves the highest accuracy and Accuracy-Efficiency Score among RL-trained efficient-reasoning methods while producing substantially shorter outputs than the base model. 1 Introduction Chain-of-thought (CoT) prompting [1] shows that large language models (LLMs) can improve complex problem solving through explic
  • method Finding 3: Existing VLMs can fail to extract visual information and improve strategic reasoning and decision-making performance with multimodal observations. 4.2 Test-time scaling We observe in the evaluation results that reasoning models generally achieve better performance than chat models. We further investigate the test-time scaling of VLMs in multi-agent environments by using Chain-of-Thought (CoT) [76] prompting for chat models and comparing their performance with reasoning models and chat
  • background progress in AI, giving rise to Multimodal Chain-of-Thought (MCoT) reasoning [27, 28]. The MCoT topic has generated a spectrum of innovative outcomes due to both the CoT attributes and the heterogeneous nature of cross-modal data interactions. On one hand, the original CoT framework has evolved into advanced reasoning architectures incorporating hierarchical thought structures, from linear sequences [19] to graph-based representations [23]. On the other hand, unlike the unimodal text setting, d
  • background Figure 4: Typical training-free test-time enhancing methods: verbal reinforcement search, memory-based reinforcement, and agentic system search. Table 3: A list of representative works of training-free test-time reinforcing. Method Category Representative literature Verbal Reinforcement Search Individual Agent Romera et al. [115], Shojaee et al. [130], Mysocki et al. [162], Ma et al. [88] Multi-Agent Chen et al. [20], Zhou et al. [199], Le et al. [69], Yu et al. [176] Embodied Agent Boiko et al. [13] Memo
  • background Further analyses demonstrate that RIS learns diverse, interpretable, and progressively integrated latent trajectories, offering a practical path toward faithful internal visual reasoning in MLLMs. 1 Introduction Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse vision-language tasks, largely due to Chain-of-Thought (CoT) reasoning [1, 2]. However, these models still treat visual information as static preconditions, converting continuous visual features into d
  • background reward structure, and optimization dynamics shape the attainable trade-off. Such analysis may help distinguish removable redundancy from reasoning steps that are genuinely necessary for correctness. Ultimately, this line of research points toward lossless reasoning compression, where models can reduce unnecessary computation while preserving the full reasoning accuracy of long responses. References [1] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zh

representative citing papers

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MoleCode unlocks structural intelligence in large language models

q-bio.BM · 2026-05-15 · unverdicted · novelty 7.0

MoleCode is a training-free, LLM-native representation that makes molecular graphs with explicit atoms, bonds, and topology directly readable and editable in language models, improving structural tasks over implicit string encodings.

Latent Abstraction for Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.

Evaluating the Search Agent in a Parallel World

cs.AI · 2026-03-05 · unverdicted · novelty 7.0

Mind-ParaWorld creates parallel worlds with atomic facts to evaluate search agents on future scenarios, showing they synthesize evidence well but struggle with collection, coverage, sufficiency judgment, and stopping decisions.

GRIT: Teaching MLLMs to Think with Images

cs.CV · 2025-05-21 · unverdicted · novelty 7.0

GRIT introduces a grounded reasoning paradigm for MLLMs where reasoning chains interleave text and bounding boxes, trained via GRPO-GR reinforcement learning on as few as 20 examples without annotations.

Video-R1: Reinforcing Video Reasoning in MLLMs

cs.CV · 2025-03-27 · conditional · novelty 7.0

Video-R1 uses temporal-aware RL and mixed datasets to boost video reasoning in MLLMs, with a 7B model reaching 37.1% on VSI-Bench and surpassing GPT-4o.

CLORE: Content-Level Optimization for Reasoning Efficiency

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.

Generative Recursive Reasoning

cs.AI · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.
