L-layer transformers under Log-ICoT curriculum provably learn k-parity with poly(n) samples and log k stages, matching explicit CoT efficiency without inference overhead.
hub Mixed citations
arXiv preprint arXiv:2311.01460 , year=
Mixed citation behavior. Most common role is background (60%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
DLR creates discrete latent tokens from rendered CoT images via clustering, enabling up to 20x compression and interpretable trajectories that outperform continuous latent baselines on reasoning tasks.
MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and instruction-following tasks on GSM8K and MT-Bench.
Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
RiM trains LLMs to perform latent reasoning via fixed memory blocks processed in one forward pass using a two-stage curriculum, matching or exceeding prior latent methods on benchmarks.
COLAGUARD matches explicit-reasoning guardrail performance on safety benchmarks while delivering 12.9X speedup and 22.4X token reduction by propagating hidden states instead of generating text.
SpiralThinker stabilizes iterative latent reasoning in LLMs via text-latent interleaving and progressive alignment, achieving SOTA results among latent baselines on math, logic, and commonsense tasks.
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.
CoLT replaces text-based chain-of-thought in MLLMs with 3-step latent thought chains supervised by a removable external decoder in forward and backward modes, yielding 10.1x faster inference on eight benchmarks.
LOTUS uses a looped padded Transformer with parallel cross-entropy supervision on gold CoT tokens to match explicit CoT performance at 3B parameters while reducing thought-phase latency 2.5x-6.9x.
ParaBridge applies on-policy self-distillation with a scaffold as privileged view to convert brittle inference-time paralinguistic guidance into stable model behavior, raising VoxSafeBench SAR from 14.6% to 40.3% on Qwen3-Omni-thinking while preserving general capabilities.
Dropout-GRPO uses structured dropout to generate trajectory variance for GRPO in latent-reasoning models like Coconut, raising GSM8K pass@1 from 27.29% to 29.01%.
AGCLR extends CoCoNuT with a gated concept stream for persistent memory to fix fact loss in latent reasoning, yielding improvements on reasoning benchmarks as depth increases.
LoRi distills implicit chain-of-thought by matching low-rank structures in hidden states, raising math-reasoning accuracy toward explicit CoT levels on LLaMA and Qwen models.
GLR formulates latent reasoning as geometric path approximation in pretrained embedding space and reports shorter LLM generations on math tasks without an explicit length penalty.
Coarser compressed CoT needs more SFT data, scales differently with repetition, and RL later breaks apart the compressed steps learned in SFT.
LoopMDM loops early-middle layers in masked diffusion models to match same-size MDM performance with up to 3.3x fewer training FLOPs and outperform on reasoning tasks by up to 8.5 points on GSM8K.
TTE-Flash trains latent think tokens with CoT generation loss and embedding tokens with contrastive loss to deliver high-performance multimodal representations without generating explicit reasoning at inference time.
FLR factorizes latent reasoning into multiple preference factors using multi-factor attention and regularizations, outperforming baselines on recommendation benchmarks while adding robustness and interpretability.
Two-stage fine-tuning distills multi-agent debate into single LLMs, matching performance at 93% lower token cost while revealing agent-specific activation subspaces for steering.
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
LLMs discover latent planning strategies up to five steps during training and execute them up to eight steps at test time, with larger models reaching seven under few-shot prompting, revealing a dissociation between discovery and execution.
citing papers explorer
-
Robust and Efficient Guardrails with Latent Reasoning
COLAGUARD matches explicit-reasoning guardrail performance on safety benchmarks while delivering 12.9X speedup and 22.4X token reduction by propagating hidden states instead of generating text.
-
Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning
AGCLR extends CoCoNuT with a gated concept stream for persistent memory to fix fact loss in latent reasoning, yielding improvements on reasoning benchmarks as depth increases.
-
Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training
Coarser compressed CoT needs more SFT data, scales differently with repetition, and RL later breaks apart the compressed steps learned in SFT.
-
TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens
TTE-Flash trains latent think tokens with CoT generation loss and embedding tokens with contrastive loss to deliver high-performance multimodal representations without generating explicit reasoning at inference time.
-
Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate
Two-stage fine-tuning distills multi-agent debate into single LLMs, matching performance at 93% lower token cost while revealing agent-specific activation subspaces for steering.
-
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
-
Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
Frontier AI models' no-CoT 50% task-completion time horizons have doubled yearly over six years, reaching over 3 minutes for GPT-5.5 with projections to 25 minutes by 2030.
-
MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models
MIRAGE compresses explicit chain-of-thought into latent vectors and adds a generative world model to predict future interface states, matching explicit reasoning performance with 3-5x fewer tokens on Android benchmarks.
-
JD Oxygen AI Item Center (Oxygen AIIC) V1: An Industrial-Scale LLM/VLM-Centric Solution for Item Understanding, Management, and Applications
Oxygen AIIC is an industrial platform using LLMs and VLMs for scalable item knowledge production and service at JD.com, reporting 94.2% precision and 82.8% recall along with business metric improvements.
-
A Survey of Scaling in Large Language Model Reasoning
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.