Adam: A Method for Stochastic Optimization
Mixed citation behavior. Most common role is method (50%).
abstract
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
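The abstract describes the update only in words. A minimal sketch of one Adam step follows, written in the commonly stated form of the algorithm with bias-corrected first- and second-moment estimates. The function names `adam_step` and `minimize_quadratic`, the toy objective, and the step size used in the demo are illustrative assumptions, not taken from this page; the default hyper-parameters are the ones usually quoted for Adam.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new (theta, m, v).

    t is the 1-based step counter; m and v are running moment estimates.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction compensates for initializing m and v at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Per-coordinate step, scaled by the root of the second-moment estimate.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=2000, lr=0.1):
    """Toy example (an assumption for illustration): minimize (x - 3)^2."""
    theta = np.array([0.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - 3.0)  # analytic gradient of (x - 3)^2
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta


if __name__ == "__main__":
    print(minimize_quadratic())
```

Because the step divides the first-moment estimate by the root of the second-moment estimate, multiplying the gradient by a positive constant leaves the step almost unchanged, which is the rescaling invariance the abstract mentions. On the first step the update has magnitude close to `lr` regardless of the gradient's scale.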
citing papers explorer
-
A multimodal dataset of photoplethysmography and continuous behavioral responses to ASMR and nature videos
Introduces REST-ASMR multimodal dataset of PPG, stimuli, and continuous annotations for ASMR research, validated with 97% responder rate, significant agreement, PPG deceleration, and BiLSTM achieving 75.51% frame-level accuracy under strict subject-video independent 4-fold CV.
-
Canonical Regularisation of Wide Feature-Learning Neural Networks
Derives geodesic ridge regularization and Riemannian Gibbs Process prior for feature-learning wide neural networks, generalizing kernel-regime results via function-space axiomatization.
-
ENSEMBITS: an alphabet of protein conformational ensembles
Ensembits is the first tokenizer of protein conformational ensembles that outperforms static tokenizers on RMSF prediction and matches them on function and mutation tasks while using less pretraining data.
-
Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models
In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to generative phenomena including double descent and out-of-equilibrium biases.
-
Convergent Stochastic Training of Attention and Understanding LoRA
Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
-
SLayerGen: a Crystal Generative Model for all Space and Layer Groups
SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting a prior loss inconsistency for hexagonal groups.
-
3DSS: 3D Surface Splatting for Inverse Rendering
3DSS is the first differentiable surface splatting renderer that recovers shape, spatially-varying BRDF materials, and HDR illumination from multi-view images via a coverage-based compositing model derived from reconstruction kernels.
-
A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{O}(\epsilon^{-5/3})$ Global Rate
PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.
-
STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack
STARE uses step-wise RL to attack multimodal models, achieving 68% higher attack success rate while revealing that adversarial optimization concentrates conceptual toxicity early and detail toxicity late in the generation trajectory.
-
Qvine: Vine Structured Quantum Circuits for Loading High Dimensional Distributions
Qvine uses vine copula-inspired quantum circuit structures to achieve linear or quadratic depth scaling for loading high-dimensional distributions with high approximation quality.
-
Neural Spectral Bias and Conformal Correlators I: Introduction and Applications
Neural networks optimized solely on crossing symmetry reconstruct CFT correlators from minimal input data to few-percent accuracy across generalized free fields, minimal models, Ising, N=4 SYM, and AdS diagrams.
-
MMGait: Towards Multi-Modal Gait Recognition
MMGait provides a new multi-sensor gait dataset and OmniGait baseline to support single-modal, cross-modal, and unified multi-modal person identification from walking patterns.
-
Proton Structure from Neural Simulation-Based Inference at the LHC
Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.
-
Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate
Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
-
CMCC-ReID: Cross-Modality Clothing-Change Person Re-Identification
The paper introduces the CMCC-ReID task, constructs the SYSU-CMCC benchmark dataset, and proposes the PIA network with disentangling and prototype modules that outperforms prior methods on combined modality and clothing variations.
-
Traces of Helium Detected in Type Ic Supernova 2014L
Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.
-
Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
Adam achieves a δ^{-1/2} high-probability convergence rate while SGD requires at least δ^{-1} due to second-moment normalization, established via stopping-time/martingale analysis under bounded variance.
-
Automated discovery of heralded ballistic graph state generators for fusion-based photonic quantum computation
A two-pass optimization framework with polynomial-based simulation discovers heralded ballistic circuits for 3-5 qubit graph states achieving up to 7.5x higher success probabilities than fusion baselines, including first known circuits for some 5-qubit states.
-
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.
-
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
-
Discovering Latent Knowledge in Language Models Without Supervision
An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.
-
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
-
Locating and Editing Factual Associations in GPT
Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.
-
Offline Reinforcement Learning with Implicit Q-Learning
IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.
-
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer reaches 87.3% ImageNet accuracy and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.
-
PathVQA: 30000+ Questions for Medical Visual Question Answering
PathVQA is the first public dataset of over 32,000 questions on nearly 5,000 pathology images for medical visual question answering.
-
Scaling Laws for Neural Language Models
Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.
-
MIPaaL: Mixed Integer Program as a Layer
MIPaaL differentiates through mixed integer programs via cutting planes to enable decision-focused learning for general MIPs, outperforming two-stage prediction-plus-optimization and LP-relaxation baselines on real-world domains.
-
AGAN: Towards Automated Design of Generative Adversarial Networks
AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.
-
Passage Re-ranking with BERT
Fine-tuning BERT for query-passage relevance classification achieves state-of-the-art results on TREC-CAR and MS MARCO, with a 27% relative gain in MRR@10 over prior methods.
-
Neural Ordinary Differential Equations
Neural networks are redefined as continuous dynamical systems by learning the derivative of the hidden state with a neural network and integrating it with an ODE solver.
-
Density estimation using Real NVP
Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
-
Adaptive Computation Time for Recurrent Neural Networks
ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
-
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
-
NICE: Non-linear Independent Components Estimation
NICE learns a composition of invertible neural-network layers that transform data into independent latent variables, enabling exact log-likelihood training and sampling for density estimation.
-
ffortissimo: A Freeform Forward-Modeling Pipeline for High-Contrast Images of Circumstellar Disks Based on Automatic Differentiation
ffortissimo is a JAX-based freeform forward-modeling pipeline that fits complex dust distributions and infers scattering properties in KLIP-reduced images of circumstellar disks such as HR 4796A.
-
Neural Parameter Calibration for Finite-State Mean Field Games
A differentiable neural framework for learning state- and time-dependent parameters of finite-state mean field games from population trajectories via implicit differentiation.
-
Atomistic Mechanisms of Hard Carbon Formation from Polyvinylidene Chloride
Machine-learned potential simulations show hard carbon from PVDC forms via radical-mediated dehydrochlorination and sp2 cross-linking, with non-hexagonal rings inducing curvature that prevents graphitic order.
-
Optimal scenario design for climate emulation
Optimizing training data via a differentiable SCM yields climate emulators that outperform those trained on six standard ScenarioMIP pathways while using less data and isolating distinct forcing responses.
-
Spatially Masked Regression Reveals Local and Distributed Predictability in Electrophysiological Recordings
SMR applied to EEG and iEEG data shows strong reconstruction persists after excluding local neighbors, indicating that electrode signals contain both local redundancy and broader distributed structure.
-
Learning the Universe: The Structure of Dust Attenuation Curves in Galaxy Simulations
Four parameters suffice to describe dust attenuation curve diversity in TNG simulations, yielding a new symbolic-regression model that recovers curves and fluxes better than existing parameterizations while linking parameters to SFR surface density, metallicity, and geometry.
-
Discovering Misconceptions and Misunderstandings From Administrations of Research-Designed Multiple Choice Instruments
Multidimensional IRT analysis of 34k FCI administrations identifies 22 robust misconception dimensions and computes student/class scores revealing varied post-instruction remediation patterns.
-
DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions
DragOn provides a new drag-grounding benchmark and training dataset for GUI agents, with evaluations suggesting potential improvements on computer-use tasks.
-
Inverse Critical Experiment Design via Gradient Optimization and a Multigroup Attention-Based Neural Network Architecture
A U-Net surrogate with multigroup attention pooling is trained on OpenMC sensitivity data and combined with gradient optimization to generate grid-based critical experiment geometries that achieve c_k values up to 0.97757 for HALEU fuel validation.
-
Exploiting weight-space symmetries for approximating curvature
A framework that builds tractable structured Hessian approximations by averaging over user-chosen weight-space symmetry groups, recovering Shampoo-like estimates for one choice of group.
-
The Dynamic-Probabilistic Consistency Gap in Chaotic Surrogate Modeling
Exposes a dynamic-probabilistic consistency gap in chaotic dynamical systems reconstruction and introduces the KAFFEE differentiable extended Kalman filter training framework to address it.
-
Learning Through Noise: Why Subliminal Learning Works and When It Fails
Subliminal learning occurs via compatible auxiliary and class output heads on task-unrelated inputs, even with random hidden layers or architecture changes, with theory and upper bounds on failure.
-
Valid and Expressive Copulas for Irregular Multivariate Time Series
CopFITi is the first marginalization-consistent copula for irregular multivariate time series, using normalizing flows for marginals and a Gaussian mixture copula for dependencies to reach new state-of-the-art joint density modeling.
-
Non-normal spectral signatures of instability in neural network training dynamics
Non-normality in linearized optimizer update operators yields a pseudospectral bound where κ(V) warns of transient amplification before spectral radius indicates instability.
-
Classical State Preparation for Variational Quantum Algorithms via Reinforcement Learning
CRiSP uses neural-guided MCTS and curriculum learning to insert Clifford prefixes before parameterized rotations in VQAs, yielding mean 3.17x and max 45x gains in energy accuracy on 22-qubit QAOA benchmarks versus prior Clifford initializers.