LoopVLA adds recurrent refinement and learned sufficiency estimation to VLA models, cutting parameters 45% and raising throughput 1.7x while matching baseline task success on LIBERO and VLA-Arena.
citation dossier
Scaling latent reasoning via looped language models
why this work matters in Pith
Pith has found this work in 18 reviewed papers. Its strongest current cluster is cs.LG (11 papers). The largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
years
2026 18representative citing papers
Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
A 53K-parameter model generates 95% valid SMILES on ZINC-250K, outperforming larger models, by resolving chemical constraints in fixed order: brackets first, rings second, valence last.
LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.
Looped MoE models scale better than standard transformers because different experts activate on each loop pass, recovering expressivity without extra parameters, and support superior early exits.
MELT decouples reasoning depth from memory in looped LLMs by sharing a single gated KV cache per layer and using two-phase chunk-wise distillation from Ouro, delivering constant memory use while matching or beating standard LLM performance.
Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.
Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.
LASER tracks low-rank activation subspaces in recursive models via matrix-free SVD updates and fidelity resets to save 60% memory without accuracy loss.
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.
Looped transformer hidden states encode preferences relationally via pairwise differences rather than independent pointwise classification, with the evaluator acting as an internal consistency probe on the model's own value system.
Recurrent-depth transformers achieve systematic generalization and depth extrapolation on implicit reasoning tasks through iterative layer reuse, a three-stage grokking process, and inference-time scaling, while vanilla transformers fail.
Hyperloop Transformers outperform standard and mHC Transformers with roughly 50% fewer parameters by looping a middle block of layers and applying hyper-connections only after each loop.
LEPO applies RL to continuous latent representations in LLMs by injecting Gumbel-Softmax stochasticity for diverse trajectory sampling and unified gradient estimation, outperforming existing discrete and latent RL methods.
Workshop report identifies AI gaps in physical interaction, brittle learning, and energy inefficiency, then proposes neuroscience principles and a research roadmap for NeuroAI.
citing papers explorer
-
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
LoopVLA adds recurrent refinement and learned sufficiency estimation to VLA models, cutting parameters 45% and raising throughput 1.7x while matching baseline task success on LIBERO and VLA-Arena.
-
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
-
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
-
Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
-
SMolLM: Small Language Models Learn Small Molecular Grammar
A 53K-parameter model generates 95% valid SMILES on ZINC-250K, outperforming larger models, by resolving chemical constraints in fixed order: brackets first, rings second, valence last.
-
LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
-
A Mechanistic Analysis of Looped Reasoning Language Models
Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.
-
Sparse Layers are Critical to Scaling Looped Language Models
Looped MoE models scale better than standard transformers because different experts activate on each loop pass, recovering expressivity without extra parameters, and support superior early exits.
-
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
MELT decouples reasoning depth from memory in looped LLMs by sharing a single gated KV cache per layer and using two-phase chunk-wise distillation from Ouro, delivering constant memory use while matching or beating standard LLM performance.
-
The Power of Power Law: Asymmetry Enables Compositional Reasoning
Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.
-
Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning
Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.
-
LASER: Low-Rank Activation SVD for Efficient Recursion
LASER tracks low-rank activation subspaces in recursive models via matrix-free SVD updates and fidelity resets to save 60% memory without accuracy loss.
-
Parcae: Scaling Laws For Stable Looped Language Models
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.
-
Relational Preference Encoding in Looped Transformer Internal States
Looped transformer hidden states encode preferences relationally via pairwise differences rather than independent pointwise classification, with the evaluator acting as an internal consistency probe on the model's own value system.
-
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
Recurrent-depth transformers achieve systematic generalization and depth extrapolation on implicit reasoning tasks through iterative layer reuse, a three-stage grokking process, and inference-time scaling, while vanilla transformers fail.
-
Hyperloop Transformers
Hyperloop Transformers outperform standard and mHC Transformers with roughly 50% fewer parameters by looping a middle block of layers and applying hyper-connections only after each loop.
-
LEPO: Latent Reasoning Policy Optimization for Large Language Models
LEPO applies RL to continuous latent representations in LLMs by injecting Gumbel-Softmax stochasticity for diverse trajectory sampling and unified gradient estimation, outperforming existing discrete and latent RL methods.
-
NeuroAI and Beyond: Bridging Between Advances in Neuroscience and ArtificialIntelligence
Workshop report identifies AI gaps in physical interaction, brittle learning, and energy inefficiency, then proposes neuroscience principles and a research roadmap for NeuroAI.