super hub Mixed citations

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, Jimmy Ba · 2014 · cs.LG · arXiv 1412.6980

Mixed citation behavior. Most common role is background (50%).

697 Pith papers citing it

Background 50% of classified citations

open full Pith review browse 697 citing papers more from Diederik P. Kingma arXiv PDF

abstract

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 method 3

citation-polarity summary

background 4 use method 3 unclear 1

claims ledger

abstract We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little

authors

Diederik P. Kingma Jimmy Ba

co-cited works

representative citing papers

ENSEMBITS: an alphabet of protein conformational ensembles

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Ensembits creates a discrete vocabulary for protein conformational ensembles that outperforms static tokenizers on dynamics prediction tasks and enables ensemble token prediction from single structures via distillation.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · unverdicted · novelty 8.0

Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.

Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models

cs.LG · 2026-05-09 · unverdicted · novelty 8.0

In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to generative phenomena including double descent and out-of-equilibrium biases.

Convergent Stochastic Training of Attention and Understanding LoRA

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.

SLayerGen: a Crystal Generative Model for all Space and Layer Groups

cond-mat.mtrl-sci · 2026-05-07 · unverdicted · novelty 8.0

SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting a prior loss inconsistency for hexagonal groups.

3DSS: 3D Surface Splatting for Inverse Rendering

cs.GR · 2026-05-07 · unverdicted · novelty 8.0 · 3 refs

3DSS is the first differentiable surface splatting renderer that recovers shape, spatially-varying BRDF materials, and HDR illumination from multi-view images via a coverage-based compositing model derived from reconstruction kernels.

Random test functions, $H^{-1}$ norm equivalence, and stochastic variational physics-informed neural networks

math.NA · 2026-05-05 · unverdicted · novelty 8.0

H^{-1} norm equivalence to expected squared evaluations on domain-dependent random test functions enables SV-PINNs that recover accurate solutions to challenging second-order elliptic PDEs faster than standard PINNs.

A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(\epsilon^{-5/3})$ Global Rate

math.OC · 2026-05-04 · conditional · novelty 8.0

PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.

Characterizing the Expressivity of Local Attention in Transformers

cs.CL · 2026-05-01 · unverdicted · novelty 8.0

Local attention strictly enlarges the class of regular languages recognizable by fixed-precision transformers by adding a second past operator in linear temporal logic, with global and local attention being expressively complementary.

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack

cs.CR · 2026-05-01 · unverdicted · novelty 8.0

STARE uses step-wise RL to attack multimodal models, achieving 68% higher attack success rate while revealing that adversarial optimization concentrates conceptual toxicity early and detail toxicity late in the generation trajectory.

Qvine: Vine Structured Quantum Circuits for Loading High Dimensional Distributions

quant-ph · 2026-04-29 · unverdicted · novelty 8.0

Qvine uses vine copula-inspired quantum circuit structures to achieve linear or quadratic depth scaling for loading high-dimensional distributions with high approximation quality.

Neural Spectral Bias and Conformal Correlators I: Introduction and Applications

hep-th · 2026-04-20 · unverdicted · novelty 8.0

Neural networks optimized solely on crossing symmetry reconstruct CFT correlators from minimal input data to few-percent accuracy across generalized free fields, minimal models, Ising, N=4 SYM, and AdS diagrams.

MMGait: Towards Multi-Modal Gait Recognition

cs.CV · 2026-04-17 · conditional · novelty 8.0

MMGait provides a new multi-sensor gait dataset and OmniGait baseline to support single-modal, cross-modal, and unified multi-modal person identification from walking patterns.

Proton Structure from Neural Simulation-Based Inference at the LHC

hep-ph · 2026-04-14 · unverdicted · novelty 8.0

Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.

Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate

math.OC · 2026-04-09 · unverdicted · novelty 8.0

Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.

CMCC-ReID: Cross-Modality Clothing-Change Person Re-Identification

cs.CV · 2026-04-03 · unverdicted · novelty 8.0

The paper introduces the CMCC-ReID task, constructs the SYSU-CMCC benchmark dataset, and proposes the PIA network with disentangling and prototype modules that outperforms prior methods on combined modality and clothing variations.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

cs.LG · 2022-09-07 · unverdicted · novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

Offline Reinforcement Learning with Implicit Q-Learning

cs.LG · 2021-10-12 · unverdicted · novelty 8.0

IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.

Passage Re-ranking with BERT

cs.IR · 2019-01-13 · unverdicted · novelty 8.0

Fine-tuning BERT for query-passage relevance classification achieves state-of-the-art results on TREC-CAR and MS MARCO, with a 27% relative gain in MRR@10 over prior methods.

Density estimation using Real NVP

cs.LG · 2016-05-27 · accept · novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

Adaptive Computation Time for Recurrent Neural Networks

cs.NE · 2016-03-29 · accept · novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

citing papers explorer

Showing 50 of 198 citing papers after filters.

ENSEMBITS: an alphabet of protein conformational ensembles cs.LG · 2026-05-13 · unverdicted · none · ref 10 · internal anchor
Ensembits creates a discrete vocabulary for protein conformational ensembles that outperforms static tokenizers on dynamics prediction tasks and enables ensemble token prediction from single structures via distillation.
Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models cs.LG · 2026-05-09 · unverdicted · none · ref 95 · internal anchor
In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to generative phenomena including double descent and out-of-equilibrium biases.
Convergent Stochastic Training of Attention and Understanding LoRA cs.LG · 2026-05-08 · unverdicted · none · ref 2 · internal anchor
Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow cs.LG · 2022-09-07 · unverdicted · none · ref 31 · internal anchor
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
Offline Reinforcement Learning with Implicit Q-Learning cs.LG · 2021-10-12 · unverdicted · none · ref 7 · internal anchor
IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.
Density estimation using Real NVP cs.LG · 2016-05-27 · accept · none · ref 33 · internal anchor
Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks cs.LG · 2015-11-19 · accept · none · ref 9 · internal anchor
DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
NICE: Non-linear Independent Components Estimation cs.LG · 2014-10-30 · accept · none · ref 17 · internal anchor
NICE learns a composition of invertible neural-network layers that transform data into independent latent variables, enabling exact log-likelihood training and sampling for density estimation.
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling cs.LG · 2026-05-13 · unverdicted · none · ref 62 · internal anchor
QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment cs.LG · 2026-05-13 · unverdicted · none · ref 9 · internal anchor
Temperature adjustment on the reference model generalizes inference-time alignment to SLOP ensembles of reward models, with a calibration algorithm that improves robustness to reward hacking while preserving alignment performance.
Beyond Oversquashing: Understanding Signal Propagation in GNNs Via Observables cs.LG · 2026-05-13 · unverdicted · none · ref 4 · internal anchor
Quantum-inspired observables reveal poor signal routing in standard spectral GNNs and motivate Schrödinger GNNs with superior propagation capacity.
Identifying the nonlinear string dynamics with port-Hamiltonian neural networks cs.LG · 2026-05-12 · unverdicted · none · ref 30 · internal anchor
Port-Hamiltonian neural networks extended to PDEs recover the Hamiltonian and dissipation of nonlinear string dynamics from data and outperform non-physics-informed baselines.
Scaling Laws for Mixture Pretraining Under Data Constraints cs.LG · 2026-05-12 · conditional · none · ref 47 · internal anchor
Repetition-aware scaling laws show scarce target data in pretraining mixtures can be repeated 15-20 times optimally, with the best count depending on data size, compute, and model scale.
Spectral Energy Centroid: a Metric for Improving Performance and Analyzing Spectral Bias in Implicit Neural Representations cs.LG · 2026-05-12 · unverdicted · none · ref 4 · internal anchor
Spectral Energy Centroid is a new metric that quantifies signal frequency and INR spectral bias, supporting better hyperparameter selection and cross-architecture analysis.
Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale cs.LG · 2026-05-12 · unverdicted · none · ref 34 · internal anchor
Local neural operators on 3x3x3 patches, composed via Schwarz iteration, solve large-scale nonlinear elasticity on arbitrary geometries without domain-specific retraining.
Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning cs.LG · 2026-05-12 · unverdicted · none · ref 22 · internal anchor
SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.
Delightful Gradients Accelerate Corner Escape cs.LG · 2026-05-12 · unverdicted · none · ref 30 · internal anchor
Delightful Policy Gradient removes exponential corner trapping in softmax policy optimization for bandits and tabular MDPs, achieving logarithmic escape times and global O(1/t) convergence.
Gradient Clipping Beyond Vector Norms: A Spectral Approach for Matrix-Valued Parameters cs.LG · 2026-05-12 · unverdicted · none · ref 33 · internal anchor
Spectral clipping of leading singular values in gradient matrices stabilizes SGD for non-convex problems with heavy-tailed noise and achieves the optimal convergence rate O(K^{(2-2α)/(3α-2)}).
Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data cs.LG · 2026-05-11 · unverdicted · none · ref 12 · internal anchor
Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.
Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models cs.LG · 2026-05-11 · unverdicted · none · ref 28 · internal anchor
Spectra defines and controls effective capacity in graph embeddings via the Shannon effective rank of a trace-normalized kernel spectrum, making capacity a post-fit property rather than a pre-training hyperparameter.
LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling cs.LG · 2026-05-11 · unverdicted · none · ref 104 · internal anchor
LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapting to non-stationary dynamics.
The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently cs.LG · 2026-05-11 · unverdicted · none · ref 119 · internal anchor
Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via temporal-difference loss on ReLU networks, achieving linear sample complexity in d.
Chebyshev Center-Based Direction Selection for Multi-Objective Optimization and Training PINNs cs.LG · 2026-05-11 · unverdicted · none · ref 31 · internal anchor
Update direction selection for PINN training is cast as a Chebyshev-center problem in the dual cone, yielding an efficient dual formulation with nonconvex convergence guarantees and automatic recovery of scale robustness and simultaneous descent.
End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor cs.LG · 2026-05-10 · conditional · none · ref 17 · internal anchor
An FPGA implementation of a neuromorphic auditory sensor plus graph neural network achieves 87.43% accuracy on Google Speech Commands v2 with sub-35 µs latency and 1.12 W power.
Revisiting Mixture Policies in Entropy-Regularized Actor-Critic cs.LG · 2026-05-09 · unverdicted · none · ref 26 · internal anchor
A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.
Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration cs.LG · 2026-05-09 · unverdicted · none · ref 7 · internal anchor
Partial orthogonalization from power iteration accelerates zeroth-order Muon by 1.5x-4x on LLM fine-tuning tasks while maintaining competitive accuracy.
Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow cs.LG · 2026-05-09 · unverdicted · none · ref 57 · internal anchor
Spatio-Temporal MeanFlow adapts MeanFlow to PDEs by replacing the generative velocity field with the physical operator and extending the integral constraint to the spatio-temporal domain, yielding a unified solver for time-dependent and stationary equations with improved accuracy and generalization.
The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning cs.LG · 2026-05-09 · unverdicted · none · ref 34 · internal anchor
The global empirical NTK for finite-width networks has a universal Kronecker-core form that makes it structurally low-rank and biases gradient descent toward dominant modes of joint input-hidden activity.
NeuralBench: A Unifying Framework to Benchmark NeuroAI Models cs.LG · 2026-05-08 · conditional · none · ref 261 · internal anchor
NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.
Adaptive Domain Decomposition Physics-Informed Neural Networks for Traffic State Estimation with Sparse Sensor Data cs.LG · 2026-05-08 · unverdicted · none · ref 7 · internal anchor
ADD-PINN adaptively decomposes the spatial domain based on PINN residuals and a shock indicator to improve offline traffic state estimation under the LWR model, outperforming baselines in most sparse-sensor cases while training faster.
What Cohort INRs Encode and Where to Freeze Them cs.LG · 2026-05-08 · unverdicted · none · ref 30 · internal anchor
Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.
PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction cs.LG · 2026-05-07 · unverdicted · none · ref 10 · internal anchor
PLOT localizes causal variables in neural networks by fitting optimal transport couplings between abstract and neural intervention effect geometries, enabling fast handles or guided search.
Path-Coupled Bellman Flows for Distributional Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 4 · internal anchor
Path-Coupled Bellman Flows use source-consistent Bellman-coupled paths and a lambda-parameterized control-variate to learn return distributions via flow matching, improving fidelity and stability over prior DRL approaches.
Layer Collapse in Diffusion Language Models cs.LG · 2026-05-07 · unverdicted · none · ref 9 · 2 links · internal anchor
Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.
Fast Gauss-Newton for Multiclass Cross-Entropy cs.LG · 2026-05-07 · unverdicted · none · ref 16 · internal anchor
FGN is a positive semidefinite under-approximation of the multiclass GGN obtained by exact decomposition into true-vs-rest and within-competitor terms, exact for binary classification and implemented via matrix-free conjugate gradient on a whitened row-space system.
Accelerating LMO-Based Optimization via Implicit Gradient Transport cs.LG · 2026-05-07 · unverdicted · none · ref 9 · internal anchor
LMO-IGT achieves O(ε^{-3.5}) iteration complexity for stochastic LMO optimization via implicit gradient transport with a single gradient per step and introduces the regularized support function as a unified stationarity measure.
Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors cs.LG · 2026-05-06 · unverdicted · none · ref 32 · internal anchor
Diffusion model priors enable training-free Bayesian sampling for more accurate rain field reconstruction from path-integrated commercial microwave link measurements than Gaussian process baselines.
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning cs.LG · 2026-05-06 · unverdicted · none · ref 45 · internal anchor
DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline RL benchmarks.
Complex Equation Learner: Rational Symbolic Regression with Gradient Descent in Complex Domain cs.LG · 2026-05-05 · unverdicted · none · ref 20 · internal anchor
A complex-weight extension to the Equation Learner enables stable recovery of symbolic expressions containing real-domain poles and unconstrained use of singular operators such as division and logarithm.
Learning reveals invisible structure in low-rank RNNs cs.LG · 2026-05-05 · unverdicted · none · ref 34 · internal anchor
Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.
KANs need curvature: penalties for compositional smoothness cs.LG · 2026-05-04 · unverdicted · none · ref 23 · internal anchor
A curvature penalty for KANs, derived to respect compositional effects and equipped with a proven upper bound on full-model curvature, produces smoother activations while preserving accuracy.
Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions cs.LG · 2026-05-03 · unverdicted · none · ref 4 · internal anchor
The algorithm achieves a delay-regime-agnostic regret bound of O~(d(sqrt(T) + C + D)) with additive costs for corruption and delay under parametric post-serving context assumptions.
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning cs.LG · 2026-05-03 · unverdicted · none · ref 12 · internal anchor
FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events cs.LG · 2026-05-02 · unverdicted · none · ref 50 · internal anchor
ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
NLPOpt-Net: A Learning Method for Nonlinear Optimization with Feasibility Guarantees cs.LG · 2026-04-30 · unverdicted · none · ref 31 · internal anchor
NLPOpt-Net is an unsupervised neural architecture that learns parametric solutions to constrained NLPs by pairing a backbone network with quadratic projection layers that guarantee feasibility and near-zero constraint violations.
Low Rank Adaptation for Adversarial Perturbation cs.LG · 2026-04-30 · unverdicted · none · ref 24 · internal anchor
Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.
ABC: Any-Subset Autoregression via Non-Markovian Diffusion Bridges in Continuous Time and Space cs.LG · 2026-04-30 · unverdicted · none · ref 32 · internal anchor
ABC enables any-subset autoregressive generation of continuous stochastic processes via non-Markovian diffusion bridges that track physical time and allow path-dependent conditioning.
Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport cs.LG · 2026-04-29 · unverdicted · none · ref 26 · internal anchor
HyCNNs are a new architecture that learns convex functions with exponentially fewer parameters than ICNNs and outperforms them in convex regression and high-dimensional optimal transport on synthetic and single-cell RNA data.
On-Device Vision Training, Deployment, and Inference on a Thumb-Sized Microcontroller cs.LG · 2026-04-24 · accept · none · ref 9 · internal anchor
A full on-device vision system trains a two-layer CNN with Adam optimization and runs inference at 6.3 FPS on a $15-40 ESP32 microcontroller using 1750 lines of self-contained C++.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 26 · internal anchor
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

Adam: A Method for Stochastic Optimization

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer