Scaling latent reasoning via looped language models

19 Pith papers cite this work. Polarity classification is still indexing.

2026 · 19 representative citing papers

LoopVLA adds recurrent refinement and learned sufficiency estimation to VLA models, cutting parameters 45% and raising throughput 1.7x while matching baseline task success on LIBERO and VLA-Arena.
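A minimal sketch of the recurrent-refinement-with-sufficiency idea that summary describes; the module layout, pooling, and threshold are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    """Hypothetical sketch: reuse one shared block to refine a latent,
    stopping once a learned sufficiency head says the state is good enough."""
    def __init__(self, d_model=512, max_loops=8, threshold=0.9):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.sufficiency = nn.Linear(d_model, 1)   # learned stopping signal
        self.max_loops, self.threshold = max_loops, threshold

    def forward(self, h):                          # h: (batch, seq, d_model)
        for _ in range(self.max_loops):
            h = self.block(h)                      # same weights reused each pass
            p_done = torch.sigmoid(self.sufficiency(h.mean(dim=1)))
            if bool((p_done > self.threshold).all()):
                break                              # estimated sufficient; stop refining
        return h
```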
Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
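The trigger condition is the concrete part of that summary; a minimal sketch, assuming entropy in nats and an arbitrary threshold (the patching and scratchpad machinery itself is omitted):

```python
import torch
import torch.nn.functional as F

def needs_scratchpad(logits, threshold=2.5):
    """Sketch of an entropy trigger: where next-byte entropy is high, the
    patch context gets refreshed via a scratchpad. Threshold is assumed."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-9))).sum(dim=-1)
    return entropy > threshold     # True at positions that should trigger an update
```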
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
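Of the four ingredients, random deep supervision is the easiest to make concrete; a sketch under assumed names, not LoopUS's actual training loop:

```python
import random
import torch
import torch.nn as nn

def random_deep_supervision_loss(block, head, h, target, loss_fn, max_loops=6):
    """Sketch: run the shared block a randomly sampled number of times and
    attach the loss at that depth, so every loop count yields a usable state
    (which is what makes confidence-based early exit safe at inference)."""
    depth = random.randint(1, max_loops)   # sample a supervision depth per batch
    for _ in range(depth):
        h = block(h)                       # weight-tied refinement pass
    return loss_fn(head(h), target)
```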
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
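The mechanism is just weight tying across depth; a minimal sketch with placeholder sizes:

```python
import torch
import torch.nn as nn

class LoopedSingleLayer(nn.Module):
    """Sketch: collapse a deep stack into one layer applied n_loops times,
    so depth costs compute rather than parameters."""
    def __init__(self, d=256, n_loops=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):                  # x: (batch, rows, d) of tabular embeddings
        for _ in range(self.n_loops):
            x = self.layer(x)              # the same parameters at every depth
        return x
```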
A 53K-parameter model generates 95% valid SMILES on ZINC-250K, outperforming larger models, by resolving chemical constraints in fixed order: brackets first, rings second, valence last.
LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.
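One way to test that claim empirically is to look for periodicity in the hidden-state trajectory; a hypothetical analysis sketch, not the paper's method:

```python
import torch
import torch.nn.functional as F

def find_cycle_period(hidden_states, tol=0.99):
    """Sketch: given hidden states collected across loop iterations, report
    the smallest period at which states recur (cosine similarity above tol)."""
    T = len(hidden_states)
    for period in range(1, T):
        sims = [F.cosine_similarity(hidden_states[t].flatten(),
                                    hidden_states[t - period].flatten(), dim=0)
                for t in range(period, T)]
        if all(s > tol for s in sims):
            return period                  # trajectory repeats every `period` loops
    return None                            # no cycle detected within the trace
```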
N-vium achieves 57.9% wall-clock speedup over matched standard transformers at no perplexity cost by mixing exact predictions from multiple model depths.
Looped MoE models scale better than standard transformers because different experts activate on each loop pass, recovering expressivity without extra parameters, and support superior early exits.
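The expressivity argument rests on the router seeing a different input on each pass; a top-1 routing sketch with assumed sizes:

```python
import torch
import torch.nn as nn

class LoopedMoE(nn.Module):
    """Sketch: one shared router and expert pool looped over the input.
    Because the state changes each pass, different experts win on different
    passes, so the loop composes experts without adding parameters."""
    def __init__(self, d=256, n_experts=8, n_loops=4):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts))
        self.n_loops = n_loops

    def forward(self, x):                              # x: (batch, d)
        for _ in range(self.n_loops):
            idx = self.router(x).argmax(dim=-1)        # top-1 expert per example
            out = torch.stack([self.experts[i](xi)
                               for xi, i in zip(x, idx.tolist())])
            x = x + out                                # residual update per pass
        return x
```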
MELT decouples reasoning depth from memory in looped LLMs by sharing a single gated KV cache per layer and using two-phase chunk-wise distillation from Ouro, delivering constant memory use while matching or beating standard LLM performance.
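The memory claim follows from keeping one cache per layer instead of one per loop pass; a sketch of a gated merge (the gating form is an assumption, not MELT's exact mechanism):

```python
import torch
import torch.nn as nn

class GatedKVCache(nn.Module):
    """Sketch: merge each loop pass's K/V into a single shared cache with a
    learned gate, so memory stays constant as loop depth grows."""
    def __init__(self, d_head=64):
        super().__init__()
        self.gate = nn.Linear(2 * d_head, 1)

    def update(self, cache_kv, new_kv):    # both: (batch, heads, seq, d_head)
        g = torch.sigmoid(self.gate(torch.cat([cache_kv, new_kv], dim=-1)))
        return g * new_kv + (1 - g) * cache_kv   # overwrite only where useful
```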
Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.
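A sketch of the sampling scheme, with an assumed Zipf exponent and a toy two-skill composition:

```python
import numpy as np

def sample_skill_compositions(n_skills=100, n_samples=10_000, alpha=1.5, seed=0):
    """Sketch: draw examples whose component skills follow a power law, so
    frequent skills dominate early training while rare ones still appear."""
    rng = np.random.default_rng(seed)
    freq = 1.0 / np.arange(1, n_skills + 1) ** alpha   # Zipf-like frequencies
    freq /= freq.sum()
    # each example composes two skills drawn from the same skewed marginal
    return rng.choice(n_skills, size=(n_samples, 2), p=freq)
```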
Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.
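A sketch of the memory-token mechanism with 16 tokens (inside the reported 8-32 range); the rest of the configuration is assumed:

```python
import torch
import torch.nn as nn

class MemoryAugmentedUT(nn.Module):
    """Sketch: prepend learned memory tokens to the sequence of a weight-tied
    (Universal Transformer style) model; the tokens act as read/write slots
    that persist across loop iterations."""
    def __init__(self, d=256, n_memory=16, n_loops=8):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(1, n_memory, d) * 0.02)
        self.layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.n_memory, self.n_loops = n_memory, n_loops

    def forward(self, x):                  # x: (batch, seq, d)
        h = torch.cat([self.memory.expand(x.size(0), -1, -1), x], dim=1)
        for _ in range(self.n_loops):      # same layer reused at every step
            h = self.layer(h)
        return h[:, self.n_memory:]        # drop the memory slots from the output
```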
LASER tracks low-rank activation subspaces in recursive models via matrix-free SVD updates and fidelity resets to save 60% memory without accuracy loss.
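A plain-torch sketch of the subspace-tracking loop with a fidelity reset; LASER's actual updates are matrix-free, which this does not reproduce:

```python
import torch

def track_activation_subspace(activations, rank=8, fidelity_tol=0.9):
    """Sketch: keep a rank-r basis for activations across loop steps, store
    only the (n, rank) coefficients, and rebuild the basis when the captured
    energy fraction drops below fidelity_tol."""
    d = activations[0].shape[-1]
    Q = torch.linalg.qr(torch.randn(d, rank)).Q        # initial orthonormal basis
    for A in activations:                              # A: (n, d) at one loop step
        coeffs = A @ Q
        fidelity = coeffs.norm() ** 2 / A.norm() ** 2  # energy the basis retains
        if fidelity < fidelity_tol:                    # fidelity reset
            _, _, Vh = torch.linalg.svd(A, full_matrices=False)
            Q = Vh[:rank].T
            coeffs = A @ Q
        yield coeffs                                   # (n, rank) instead of (n, d)
```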
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling in training FLOPs and saturating-exponential scaling at test time, which improves quality over fixed-depth baselines under fixed parameter budgets.
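PyTorch's built-in spectral-norm parametrization gives a minimal version of such a constraint; whether Parcae uses this exact form is an assumption:

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

# Sketch: constrain the injection map (the layer that feeds the input
# embedding back in on every loop pass) to unit spectral norm, so repeated
# application cannot blow up the recurrent state.
injection = spectral_norm(nn.Linear(768, 768))

def looped_step(block, h, x_embed):
    return block(h + injection(x_embed))   # stable input injection per loop
```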
Looped transformer hidden states encode preferences relationally via pairwise differences rather than independent pointwise classification, with the evaluator acting as an internal consistency probe on the model's own value system.
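A probe that operationalizes the relational reading; the linear form is a hypothetical choice, not the paper's probe:

```python
import torch
import torch.nn as nn

class PairwisePreferenceProbe(nn.Module):
    """Sketch: score the *difference* of the hidden states for two options,
    rather than classifying each option independently."""
    def __init__(self, d=768):
        super().__init__()
        self.w = nn.Linear(d, 1)

    def forward(self, h_a, h_b):           # hidden states for options A and B
        return self.w(h_a - h_b)           # positive score means A is preferred
```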
Recurrent-depth transformers achieve systematic generalization and depth extrapolation on implicit reasoning tasks through iterative layer reuse, a three-stage grokking process, and inference-time scaling, while vanilla transformers fail.
Hyperloop Transformers outperform standard and mHC Transformers with roughly 50% fewer parameters by looping a middle block of layers and applying hyper-connections only after each loop.
LEPO applies RL to continuous latent representations in LLMs by injecting Gumbel-Softmax stochasticity for diverse trajectory sampling and unified gradient estimation, outperforming existing discrete and latent RL methods.
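The sampling trick is standard Gumbel-Softmax; a sketch of both the relaxed and straight-through variants (how LEPO maps latents to logits is not shown here):

```python
import torch
import torch.nn.functional as F

def sample_latent(logits, tau=1.0, hard=False):
    """Sketch: Gumbel-Softmax noise makes latent choices stochastic (diverse
    trajectories for RL) while staying differentiable for gradient estimation.
    hard=True gives a discrete forward pass with soft gradients backward."""
    return F.gumbel_softmax(logits, tau=tau, hard=hard)
```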
Workshop report identifies AI gaps in physical interaction, brittle learning, and energy inefficiency, then proposes neuroscience principles and a research roadmap for NeuroAI.
citing papers explorer
- Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
- Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers