Let’s think dot by dot: Hidden computation in transformer language models. arXiv preprint arXiv:2404.15758

Jacob Pfau, William Merrill, Samuel R Bowman · 2024 · arXiv 2404.15758

29 Pith papers cite this work.

citation-role summary: background 3

citation-polarity summary: background 2, support 1

representative citing papers

Transformers Provably Learn to Internalize Chain-of-Thought

cs.LG · 2026-05-27 · unverdicted · novelty 8.0

L-layer transformers under Log-ICoT curriculum provably learn k-parity with poly(n) samples and log k stages, matching explicit CoT efficiency without inference overhead.

The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies

cs.LG · 2026-05-11 · accept · novelty 8.0 · 2 refs

Corruption studies of CoT faithfulness largely measure explicit answer placement in prompt format rather than computational importance of reasoning steps.

Does Verbose Chain-of-Thought Really Help? In-Distribution Evidence that Content, Not Length, Matters

cs.AI · 2026-06-29 · accept · novelty 7.0

In-distribution sampling across 25 models and controlled interventions with DAG-verified content show that semantic reasoning and validation content, not token count, drive CoT gains.

PearlVLA: Progressive Embodied Action-Plan Refinement in Latent Space

cs.RO · 2026-06-16 · unverdicted · novelty 7.0

PearlVLA achieves SOTA on LIBERO by separating VLM representations into visual grounding and an iterative latent plan branch refined via world model queries and RefineNet with process-reward RL.

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

cs.LG · 2026-06-11 · unverdicted · novelty 7.0

SWITCH uses explicit <swi> and </swi> boundary tokens to make latent chain-of-thought compatible with on-policy RL (GRPO) and open to causal mechanistic probing, outperforming prior hidden-state recurrence methods.

Unlocking the Working Memory of Large Language Models for Latent Reasoning

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

RiM trains LLMs to perform latent reasoning via fixed memory blocks processed in one forward pass using a two-stage curriculum, matching or exceeding prior latent methods on benchmarks.

What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

CoT probe-time gains arise primarily from lexical activation and short-range token co-occurrence rather than sentence-level logical derivation.

Training-Free Looped Transformers

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.

CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

CopT reverses CoT by eliciting a draft answer first, then using continuous-embedding contrastive verification and on-policy thinking to reflect on and correct it, yielding up to 23% higher accuracy and 57% fewer tokens without training.

Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

cs.AI · 2026-05-07 · conditional · novelty 7.0

Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

PLUME: Latent Reasoning Based Universal Multimodal Embedding

cs.CV · 2026-04-02 · unverdicted · novelty 7.0

PLUME uses latent-state autoregressive rollouts and a progressive training curriculum to deliver efficient reasoning for universal multimodal embeddings without generating explicit rationales.

Enabling Agents to Communicate Entirely in Latent Space

cs.LG · 2025-11-12 · unverdicted · novelty 7.0

Interlat lets LLM agents exchange last hidden states in latent space for communication, outperforming CoT baselines across models while enabling up to 24x faster inference via compression.

Learning a Continue-Thinking Token for Enhanced Test-Time Scaling

cs.CL · 2025-06-12 · unverdicted · novelty 7.0

A learned continue-thinking token, trained via RL on its embedding alone, improves math benchmark accuracy more than fixed-token budget forcing in a frozen language model.

Training Large Language Models to Reason in a Continuous Latent Space

cs.CL · 2024-12-09 · unverdicted · novelty 7.0

Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

LARM enables test-time compute scaling in non-autoregressive ASR via depth-conditioned looping with CTC checkpoints, supervision embeddings, FiLM conditioning, and posterior feedback, yielding lower WER on LibriSpeech with more loops.

CIRF: Tokenizing Chain-of-Thoughts into Reusable Functional Units for Efficient Latent Reasoning in Large Language Models

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

CIRF tokenizes CoT traces into functional units, fine-tunes models to autoregressively emit these tokens plus optional results, and reports improved accuracy-latency trade-offs on math, symbolic, and commonsense benchmarks.

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

cs.AI · 2026-05-23 · unverdicted · novelty 6.0

Premature confidence in LLM chains of thought predicts flawed reasoning and is mitigated by progressive confidence shaping, a label-free RL objective that yields accuracy gains on arithmetic, math, and science tasks.

Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

cs.CL · 2026-05-08 · conditional · novelty 6.0

Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.

SeLaR: Selective Latent Reasoning in Large Language Models

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.

Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

cs.CL · 2025-10-10 · unverdicted · novelty 6.0

MPS proposes a dual-brain architecture separating formulation reasoning from articulation to achieve real-time CoT in SLMs with accuracy comparable to full pre-computation but much lower latency.

Conversable Complexity: Agentic LLM Collectives as Interpretable Substrates

cs.CL · 2026-07-01 · unverdicted · novelty 5.0

Agentic LLM collectives are proposed as natural-language-interpretable computational substrates for ALife research.

DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning

cs.CL · 2026-07-01 · unverdicted · novelty 5.0

DiscoLoop adds a discrete embedding channel to looped transformers to fix representational misalignment in two-hop reasoning, yielding near-perfect accuracy on synthetic tasks and better pretraining loss on real data.

Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior

cs.LG · 2026-05-26 · unverdicted · novelty 5.0

Latent Recurrent Transformer augments autoregressive transformers with a cross-layer recurrent latent pathway from prior hidden states and uses interleaved parallel training to improve loss and in-context learning at ~0.3% extra parameters.

NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.

citing papers explorer

Transformers Provably Learn to Internalize Chain-of-Thought cs.LG · 2026-05-27 · unverdicted · none · ref 36
L-layer transformers under Log-ICoT curriculum provably learn k-parity with poly(n) samples and log k stages, matching explicit CoT efficiency without inference overhead.
PearlVLA: Progressive Embodied Action-Plan Refinement in Latent Space cs.RO · 2026-06-16 · unverdicted · none · ref 16
PearlVLA achieves SOTA on LIBERO by separating VLM representations into visual grounding and an iterative latent plan branch refined via world model queries and RefineNet with process-reward RL.
Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning cs.LG · 2026-06-11 · unverdicted · none · ref 55
SWITCH uses explicit <swi> and </swi> boundary tokens to make latent chain-of-thought compatible with on-policy RL (GRPO) and open to causal mechanistic probing, outperforming prior hidden-state recurrence methods.
Unlocking the Working Memory of Large Language Models for Latent Reasoning cs.CL · 2026-05-28 · unverdicted · none · ref 14
RiM trains LLMs to perform latent reasoning via fixed memory blocks processed in one forward pass using a two-stage curriculum, matching or exceeding prior latent methods on benchmarks.
What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation cs.AI · 2026-05-26 · unverdicted · none · ref 18
CoT probe-time gains arise primarily from lexical activation and short-range token co-occurrence rather than sentence-level logical derivation.
Training-Free Looped Transformers cs.LG · 2026-05-22 · unverdicted · none · ref 72
Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.
CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning cs.CL · 2026-05-19 · unverdicted · none · ref 21
CopT reverses CoT by eliciting a draft answer first, then using continuous-embedding contrastive verification and on-policy thinking to reflect on and correct it, yielding up to 23% higher accuracy and 57% fewer tokens without training.
PLUME: Latent Reasoning Based Universal Multimodal Embedding cs.CV · 2026-04-02 · unverdicted · none · ref 33
PLUME uses latent-state autoregressive rollouts and a progressive training curriculum to deliver efficient reasoning for universal multimodal embeddings without generating explicit rationales.
Enabling Agents to Communicate Entirely in Latent Space cs.LG · 2025-11-12 · unverdicted · none · ref 2
Interlat lets LLM agents exchange last hidden states in latent space for communication, outperforming CoT baselines across models while enabling up to 24x faster inference via compression.
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling cs.CL · 2025-06-12 · unverdicted · none · ref 3
A learned continue-thinking token, trained via RL on its embedding alone, improves math benchmark accuracy more than fixed-token budget forcing in a frozen language model.
Training Large Language Models to Reason in a Continuous Latent Space cs.CL · 2024-12-09 · unverdicted · none · ref 23
Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.
Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers cs.LG · 2026-06-03 · unverdicted · none · ref 16
LARM enables test-time compute scaling in non-autoregressive ASR via depth-conditioned looping with CTC checkpoints, supervision embeddings, FiLM conditioning, and posterior feedback, yielding lower WER on LibriSpeech with more loops.
CIRF: Tokenizing Chain-of-Thoughts into Reusable Functional Units for Efficient Latent Reasoning in Large Language Models cs.CL · 2026-05-27 · unverdicted · none · ref 2
CIRF tokenizes CoT traces into functional units, fine-tunes models to autoregressively emit these tokens plus optional results, and reports improved accuracy-latency trade-offs on math, symbolic, and commonsense benchmarks.
Understanding and Mitigating Premature Confidence for Better LLM Reasoning cs.AI · 2026-05-23 · unverdicted · none · ref 18
Premature confidence in LLM chains of thought predicts flawed reasoning and is mitigated by progressive confidence shaping, a label-free RL objective that yields accuracy gains on arithmetic, math, and science tasks.
SeLaR: Selective Latent Reasoning in Large Language Models cs.CL · 2026-04-09 · unverdicted · none · ref 28
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models cs.CL · 2025-10-10 · unverdicted · none · ref 25
MPS proposes a dual-brain architecture separating formulation reasoning from articulation to achieve real-time CoT in SLMs with accuracy comparable to full pre-computation but much lower latency.
Conversable Complexity: Agentic LLM Collectives as Interpretable Substrates cs.CL · 2026-07-01 · unverdicted · none · ref 86
Agentic LLM collectives are proposed as natural-language-interpretable computational substrates for ALife research.
DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning cs.CL · 2026-07-01 · unverdicted · none · ref 16
DiscoLoop adds a discrete embedding channel to looped transformers to fix representational misalignment in two-hop reasoning, yielding near-perfect accuracy on synthetic tasks and better pretraining loss on real data.
Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior cs.LG · 2026-05-26 · unverdicted · none · ref 16
Latent Recurrent Transformer augments autoregressive transformers with a cross-layer recurrent latent pathway from prior hidden states and uses interleaved parallel training to improve loss and in-context learning at ~0.3% extra parameters.
NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning cs.LG · 2026-05-06 · unverdicted · none · ref 49
Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.
LLM Reasoning Is Latent, Not the Chain of Thought cs.AI · 2026-04-17 · unverdicted · none · ref 26
LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.
Integrated and Cross-Architecture Interpretation of LLM Reasoning cs.CL · 2026-05-27 · unverdicted · none · ref 23
Proposes IAR framework using MIP token isolation, DTR overlap analysis, and Jaccard stability to interpret reasoning patterns in Qwen and Llama models across math, code, logic, and commonsense domains.
Measuring AI Reasoning: A Guide for Researchers cs.AI · 2026-05-04 · unverdicted · none · ref 134
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.