DeepLatent introduces a parallel latent visual reasoning framework with learnable 2D tokens and continuous RL, trained via distillation then RL, plus a new 180K dataset, claiming SOTA benchmark results.
hub Canonical reference
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Canonical reference. 100% of citing Pith papers cite this work as background.
abstract
Chain-of-thought (CoT) decoding enables language models to improve reasoning performance at the cost of high generation latency in decoding. Recent proposals have explored variants of contemplation tokens, a term we introduce that refers to special tokens used during inference to allow for extra computation. Prior work has considered fixed-length sequences drawn from a discrete set of embeddings as contemplation tokens. Here we propose Compressed Chain-of-Thought (CCoT), a framework to generate contentful and continuous contemplation tokens of variable sequence length. The generated contemplation tokens are compressed representations of explicit reasoning chains, and our method can be applied to off-the-shelf decoder language models. Through experiments, we illustrate how CCoT enables additional reasoning over dense contentful representations to achieve corresponding improvements in accuracy. Moreover, the reasoning improvements can be adaptively modified on demand by controlling the number of contemplation tokens generated.
hub tools
citation-role summary
citation-polarity summary
roles
background 13polarities
background 13representative citing papers
RiM trains LLMs to perform latent reasoning via fixed memory blocks processed in one forward pass using a two-stage curriculum, matching or exceeding prior latent methods on benchmarks.
Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.
Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.
LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
Abstract-CoT lets models reason with short discrete latent token sequences from a reserved vocabulary, using warm-up training and RL to match verbal CoT performance with up to 11.6x fewer tokens.
V-Reflection introduces a think-then-look mechanism where MLLM latent states actively interrogate visual features via two-stage distillation from a box-guided teacher to a dynamic autoregressive student, narrowing the fine-grained perception gap on benchmarks.
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
Latent Visual Reasoning enables autoregressive generation of latent visual states that reconstruct critical image tokens, yielding gains on perception-heavy VQA benchmarks such as 71.67% on MMVP.
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
CoLT replaces text-based chain-of-thought in MLLMs with 3-step latent thought chains supervised by a removable external decoder in forward and backward modes, yielding 10.1x faster inference on eight benchmarks.
LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.
VisReflect generates continuous latent visual reflections to emphasize relevant visual features and guide attention in LVLMs, yielding 4.1% gains on image benchmarks and 1.8% on video benchmarks with 44% less inference time than zooming methods.
CLSR lets LLM agents evolve and route symbolic languages that reduce generated tokens by 3-6x versus chain-of-thought while keeping accuracy on benchmarks.
Dynamic Rollout Editing reduces overthinking in RL-trained LLMs by editing post-answer continuations in successful rollouts and preferring the edited versions within GRPO groups.
Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.
MPCoT improves long-horizon VLA performance on LIBERO and CALVIN by initializing M latent hypotheses, refining them over K steps, and aggregating via a reward-trained path scorer while preserving the original 8-step action interface and generating zero reasoning tokens.
ALAR trains LLM agents to perform most reasoning in a latent space supervised by actions and escalates to explicit CoT only when needed, cutting tokens by up to 84.6% while preserving accuracy on search and tool-use benchmarks.
SpecFlow represents intermediate visual thoughts in fixed-size DCT space and uses classifier-free guidance to steer updates from textual thoughts, achieving up to 2.1x lower computation and KV cache costs.
ThinkSwitch uses iterative self-distillation with QLoRA and spherical weight interpolation to raise both instruct and thinking checkpoint accuracy on small AIME and PubMedQA sets using only 15 human prompts per domain.
Latent interventions can reactivate attack effects in clean executions of latent-based multi-agent systems, degrading performance especially via inter-agent KV-cache handoffs.
CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.
citing papers explorer
-
Generative Spatiotemporal Intent Sequence Recommendation via Implicit Reasoning in Amap
GPlan compresses LLM reasoning into small models via Progressive Implicit CoT Distillation and Spatiotemporal Counterfactual DPO to generate logically coherent and physically executable intent sequences for recommendation.