PonderNet: Learning to Ponder. CoRR, abs/2107.05407

Andrea Banino, Jan Balaguer, Charles Blundell · 2021 · arXiv 2107.05407

18 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

Structure over Pixels: Learning Variable-Length Visual Programs

cs.CV · 2026-05-26 · unverdicted · novelty 7.0

STROP learns variable-length discrete visual programs for images by training a length head against frozen DINOv3 features in a four-phase curriculum while bypassing pixel reconstruction.

Training-Free Looped Transformers

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.

A Mechanistic Analysis of Looped Reasoning Language Models

cs.LG · 2026-04-13 · unverdicted · novelty 7.0

Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.

Stabilizing Extrapolation in Looped Transformers via Learned Stochastic Stopping

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Stochastic loop counts during training of looped transformers reduce OOD variance on binary addition, Dyck-1, Unique Set and Copy tasks, with learned RL-Halting further improving the accuracy-stability trade-off.

Finding the Time to Think: Learning Planning Budgets in Real-Time RL

cs.LG · 2026-06-24 · unverdicted · novelty 6.0

Trains a gating policy to select state-dependent planning budgets in variable-delay real-time RL, outperforming fixed-budget and heuristic baselines across Pac-Man, Tetris, Snake, Speed Hex, and Speed Go.

Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

cs.AI · 2026-06-16 · unverdicted · novelty 6.0

FPRM is a Transformer-based model using fixed-point convergence for adaptive halting in looped architectures, claimed effective on Sudoku, Maze, state-tracking, and ARC-AGI benchmarks.

A Dual-Path Architecture for Scaling Compute and Capacity in LLMs

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

Dual-path blocks with deep shared and wide non-shared sublayers plus per-token gates outperform iso-FLOP baselines on language modeling while using fewer parameters.

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.

Do Not Imitate, Reinforce: Iterative Classification via Belief Refinement

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

RIC replaces single-pass label imitation with RL-driven iterative belief refinement, recovering cross-entropy optima while enabling adaptive halting via a value function.

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

cs.LG · 2026-04-23 · conditional · novelty 6.0

Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.

When to Think Fast and Slow? AMOR: Adaptive Entropy Gate for Hybrid Models

cs.AI · 2026-01-22 · unverdicted · novelty 6.0

AMOR uses output entropy to gate attention in recurrent hybrids, matching full attention performance at roughly 22% attention invocations across 180M-1.5B models.

Review Residuals: Update-Conditioned Residual Gating for Transformers

cs.LG · 2026-06-30 · unverdicted · novelty 5.0

Review Residuals add an update-conditioned gate to transformer residual connections, yielding depth-stable training and performance gains that emerge and grow with model size from 590M parameters upward.

Dense Supervision Is Not Enough: The Readout Blind Spot in Looped Language Models

cs.LG · 2026-06-12 · unverdicted · novelty 5.0

Dense per-loop cross-entropy in looped transformers fails to control hidden-state scale with scale-invariant readouts like RMSNorm, driving norms to thousands, while scale-visible readouts or norm penalties keep norms small and improve perplexity.

Hierarchical Reasoning Model

cs.AI · 2025-06-26 · unverdicted · novelty 5.0

HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples without pre-training or CoT supervision.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

cs.LG · 2026-05-27 · unverdicted · novelty 3.0

Presents CosmicFish-HRM, a compact LM using hierarchical recurrent reasoning to adapt computation depth per input.

Scaling Latent Reasoning via Looped Language Models

cs.CL · 2025-10-29

citing papers explorer

Showing 2 of 2 citing papers after filters.

Hierarchical Reasoning Model

cs.AI · 2025-06-26 · unverdicted · none · ref 92

Scaling Latent Reasoning via Looped Language Models

cs.CL · 2025-10-29 · unreviewed · ref 32
