{"total":11,"items":[{"citing_arxiv_id":"2606.23155","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Neural Parameter Calibration for Finite-State Mean Field Games","primary_cat":"cs.GT","submitted_at":"2026-06-22T11:00:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A differentiable neural framework for learning state- and time-dependent parameters of finite-state mean field games from population trajectories via implicit differentiation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31215","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Fixed-Point Masked Generative Modeling","primary_cat":"cs.LG","submitted_at":"2026-05-29T12:19:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15985","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Thermodynamic Networks: Harnessing Non-Equilibrium Steady States for Computation","primary_cat":"quant-ph","submitted_at":"2026-05-15T14:19:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Thermodynamic networks using non-equilibrium steady states achieve universal function approximation when engineered with negative differential conductance, as shown in quantum dot and enzymatic examples for sine fitting and MNIST 
classification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09226","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Quantum Injection Pathways for Implicit Graph Neural Networks","primary_cat":"quant-ph","submitted_at":"2026-05-09T23:51:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"can improve accuracy but often results in growing optimization difficulty [6]. This motivates alternatives that reduce reliance on increasingly deep explicit propagation. Deep equilibrium models (DEQs) offer one such alternative: they define outputs as fixed points of a single nonlinear operator, trained by implicit differentiation through that fixed point [7]. Implicit graph models adapt this idea to graph learning [8], where the solution of a graph-dependent fixed-point equation (instead of the output of an explicit propagation message-passing GNN) plays the role of the final layer node representation. 
This yields the representational reach of an arbitrarily deep network at the memory cost of a single layer."},{"citing_arxiv_id":"2605.00206","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning","primary_cat":"cs.LG","submitted_at":"2026-04-30T20:30:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In International Conference on Learning Representations (ICLR 2024), 2023. URL https://arxiv.org/abs/2309.12252. [49] Federico Danieli, Pau Rodriguez, Miguel Sarabia, Xavier Suau, and Luca Zappella. ParaRNN: Unlocking parallel training of nonlinear RNNs for large language models. In International Conference on Learning Representations (ICLR 2026), 2025. URL https://arxiv.org/abs/2510.21450. Oral. [50] Patrick L. Combettes and Jean-Christophe Pesquet. Lipschitz certificates for layered network structures driven by averaged activation operators, 2019. URL https://arxiv.org/abs/1903.01014. [51] Biao Zhang and Rico Sennrich. Root mean square layer normalization. In Advances in Neural Information Processing Systems (NeurIPS 2019), 2019. 
URL https://arxiv."},{"citing_arxiv_id":"2604.15259","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Stability and Generalization in Looped Transformers","primary_cat":"cs.LG","submitted_at":"2026-04-16T17:35:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15238","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Nonlinear Separation Principle via Contraction Theory: Applications to Neural Networks, Control, and Learning","primary_cat":"eess.SY","submitted_at":"2026-04-16T17:12:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A contraction-theory separation principle yields global exponential stability for controller-observer pairs and sharp LMI certificates for contractive RNNs, enabling stable output tracking and implicit neural network design.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"n = 8 neurons and tanh(·) activations. We apply Propositions 27 and 30 to design a controller and an observer respectively, and design a gain for the integral controller via Lemma 33. As shown in Figure 2, our control architecture successfully tracks piecewise constant references. VI. APPLICATIONS IN MACHINE LEARNING We now utilize S-contraction for Deep Equilibrium Models (DEQs) [3]. 
DEQs replace finite-depth architectures by defining their output as solution of an implicit equation, which in turn is solved by iterating a dynamical system. For these models to be well-posed, the equilibrium must be unique and globally asymptotically stable, properties natively guaranteed by a contracting continuous-time FRNN. We first derive an unconstrained parameterization of weight"},{"citing_arxiv_id":"2604.12946","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Parcae: Scaling Laws For Stable Looped Language Models","primary_cat":"cs.LG","submitted_at":"2026-04-14T16:43:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"modeling scaling laws (Section 2.3). Prior work has studied looped architectures along several design axes: loop placement (pre-, mid-, or post-looping) [68], halting mechanism (explicit routers [6, 94] vs. implicit stochastic depth [27, 49]), topology (single block [27] or hierarchical [38, 79]) and differentiation (explicit or implicit backpropagation [7]). Our work focuses on implicit-halting middle-looped architectures using explicit differentiation; an extended review is in Section B. 2.1 Existing Middle-Looped Architectures In this paper, we focus on middle-looped architectures [27, 68]. 
Middle-looped recurrent depth architecture contains three units: an initial prelude unit P, a middle recurrent unit R, and a"},{"citing_arxiv_id":"2604.09168","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ELT: Elastic Looped Transformers for Visual Generation","primary_cat":"cs.CV","submitted_at":"2026-04-10T09:53:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"input-dependent dynamic and adaptive depth in looped transformers. Mixture-of-Recursions-VIT [45] extends it for image understanding. Fan et al. [16] utilized looping for length generalization. Geiping et al. [21] demonstrated that scaling test-time compute via recurrent depth allows language models to perform complex latent reasoning. Deep Equilibrium Models (DEQs) [4, 54, 22, 47, 1, 17], instead of unrolling a weight-tied layer for a fixed number of iterations, define the output as the fixed point of a non-linear transformation. 
Unlike DEQs that rely on black-box solvers for an analytical fixed point, our ELT framework explicitly optimizes unrolled intermediate states via Intra-Loop Self Distillation (ILSD), retaining the flexibility"},{"citing_arxiv_id":"2509.04154","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation","primary_cat":"cs.LG","submitted_at":"2025-09-04T12:29:14+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.05171","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach","primary_cat":"cs.LG","submitted_at":"2025-02-07T18:55:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}