Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
hub Canonical reference
Zico Kolter, and Vladlen Koltun
Canonical reference. 80% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
A differentiable neural framework for learning state- and time-dependent parameters of finite-state mean field games from population trajectories via implicit differentiation.
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.
Thermodynamic networks using non-equilibrium steady states achieve universal function approximation when engineered with negative differential conductance, as shown in quantum dot and enzymatic examples for sine fitting and MNIST classification.
Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.
SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.
Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.
A contraction-theory separation principle yields global exponential stability for controller-observer pairs and sharp LMI certificates for contractive RNNs, enabling stable output tracking and implicit neural network design.
citing papers explorer
-
Quantum Injection Pathways for Implicit Graph Neural Networks
Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.