Adam: A Method for Stochastic Optimization

Diederik P. Kingma, Jimmy Ba · 2014 · cs.LG · arXiv 1412.6980

Mixed citation behavior. Most common role is background (50%).

691 Pith papers citing it

Background 50% of classified citations

abstract

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
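
The abstract describes the method only in words. As a hedged illustration, here is a minimal pure-Python sketch of one Adam step and one AdaMax step. It follows the update rules as they are commonly stated for this paper: bias-corrected exponential moving averages of the gradient g and of g², and, for AdaMax, an exponentially weighted running maximum of |g| that replaces the second moment. It uses the usual defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8, with α = 0.001 for Adam and α = 0.002 for AdaMax. The function names, the toy objective f(x) = (x - 3)², the step size 0.1 used on it, and the `tiny` guard in AdaMax are illustrative assumptions and do not come from this page.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]


def adam_step(theta: Vec, grad: Vec, m: Vec, v: Vec, t: int,
              alpha: float = 0.001, beta1: float = 0.9,
              beta2: float = 0.999, eps: float = 1e-8) -> Tuple[Vec, Vec, Vec]:
    """One Adam update at timestep t (1-indexed). Returns (theta, m, v)."""
    out_theta, out_m, out_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g          # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * g * g      # raw second-moment estimate
        m_hat = mi / (1.0 - beta1 ** t)              # bias corrections for the
        v_hat = vi / (1.0 - beta2 ** t)              # zero-initialised moments
        out_theta.append(th - alpha * m_hat / (math.sqrt(v_hat) + eps))
        out_m.append(mi)
        out_v.append(vi)
    return out_theta, out_m, out_v


def adamax_step(theta: Vec, grad: Vec, m: Vec, u: Vec, t: int,
                alpha: float = 0.002, beta1: float = 0.9,
                beta2: float = 0.999, tiny: float = 1e-12) -> Tuple[Vec, Vec, Vec]:
    """One AdaMax update: the second moment is replaced by an infinity-norm
    running maximum u. `tiny` is an assumed guard against 0/0, not from the paper."""
    out_theta, out_m, out_u = [], [], []
    for th, g, mi, ui in zip(theta, grad, m, u):
        mi = beta1 * mi + (1.0 - beta1) * g
        ui = max(beta2 * ui, abs(g))                 # exponentially weighted max of |g|
        step = (alpha / (1.0 - beta1 ** t)) * mi / max(ui, tiny)
        out_theta.append(th - step)
        out_m.append(mi)
        out_u.append(ui)
    return out_theta, out_m, out_u


def minimize_toy(step_fn: Callable, steps: int = 500, alpha: float = 0.1) -> float:
    """Hypothetical toy problem: minimise f(x) = (x - 3)^2 starting from x = 0."""
    theta, s1, s2 = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (theta[0] - 3.0)]              # exact gradient of the toy f
        theta, s1, s2 = step_fn(theta, grad, s1, s2, t, alpha=alpha)
    return theta[0]


if __name__ == "__main__":
    print(minimize_toy(adam_step), minimize_toy(adamax_step))
    # Rescale each gradient coordinate by a different constant (diagonal rescaling).
    a, _, _ = adam_step([1.0, 1.0], [0.5, -2.0], [0.0, 0.0], [0.0, 0.0], t=1)
    b, _, _ = adam_step([1.0, 1.0], [50.0, -0.2], [0.0, 0.0], [0.0, 0.0], t=1)
    print(a, b)
```

Running the file should print two values close to 3.0, followed by two nearly identical parameter vectors. Each coordinate's step is normalised by that coordinate's own moment estimate, so multiplying a gradient coordinate by a constant barely changes the update, at least for this first step and up to ε. This is the "invariant to diagonal rescaling of the gradients" property the abstract mentions.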

citation-role summary

background: 4 · method: 2

citation-polarity summary

background: 3 · use method: 2 · unclear: 1

authors

Diederik P. Kingma Jimmy Ba

representative citing papers

ENSEMBITS: an alphabet of protein conformational ensembles

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Ensembits creates a discrete vocabulary for protein conformational ensembles that outperforms static tokenizers on dynamics prediction tasks and enables ensemble token prediction from single structures via distillation.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · unverdicted · novelty 8.0

Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.

Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models

cs.LG · 2026-05-09 · unverdicted · novelty 8.0

In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to generative phenomena including double descent and out-of-equilibrium biases.

Convergent Stochastic Training of Attention and Understanding LoRA

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.

SLayerGen: a Crystal Generative Model for all Space and Layer Groups

cond-mat.mtrl-sci · 2026-05-07 · unverdicted · novelty 8.0

SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting a prior loss inconsistency for hexagonal groups.

Random test functions, $H^{-1}$ norm equivalence, and stochastic variational physics-informed neural networks

math.NA · 2026-05-05 · unverdicted · novelty 8.0

H^{-1} norm equivalence to expected squared evaluations on domain-dependent random test functions enables SV-PINNs that recover accurate solutions to challenging second-order elliptic PDEs faster than standard PINNs.

A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(\epsilon^{-5/3})$ Global Rate

math.OC · 2026-05-04 · conditional · novelty 8.0

PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.

Characterizing the Expressivity of Local Attention in Transformers

cs.CL · 2026-05-01 · unverdicted · novelty 8.0

Local attention strictly enlarges the class of regular languages recognizable by fixed-precision transformers by adding a second past operator in linear temporal logic, with global and local attention being expressively complementary.

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack

cs.CR · 2026-05-01 · unverdicted · novelty 8.0

STARE uses step-wise RL to attack multimodal models, achieving 68% higher attack success rate while revealing that adversarial optimization concentrates conceptual toxicity early and detail toxicity late in the generation trajectory.

Qvine: Vine Structured Quantum Circuits for Loading High Dimensional Distributions

quant-ph · 2026-04-29 · unverdicted · novelty 8.0

Qvine uses vine copula-inspired quantum circuit structures to achieve linear or quadratic depth scaling for loading high-dimensional distributions with high approximation quality.

Neural Spectral Bias and Conformal Correlators I: Introduction and Applications

hep-th · 2026-04-20 · unverdicted · novelty 8.0

Neural networks optimized solely on crossing symmetry reconstruct CFT correlators from minimal input data to few-percent accuracy across generalized free fields, minimal models, Ising, N=4 SYM, and AdS diagrams.

MMGait: Towards Multi-Modal Gait Recognition

cs.CV · 2026-04-17 · conditional · novelty 8.0

MMGait provides a new multi-sensor gait dataset and OmniGait baseline to support single-modal, cross-modal, and unified multi-modal person identification from walking patterns.

Proton Structure from Neural Simulation-Based Inference at the LHC

hep-ph · 2026-04-14 · unverdicted · novelty 8.0

Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.

Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate

math.OC · 2026-04-09 · unverdicted · novelty 8.0

Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.

CMCC-ReID: Cross-Modality Clothing-Change Person Re-Identification

cs.CV · 2026-04-03 · unverdicted · novelty 8.0

The paper introduces the CMCC-ReID task, constructs the SYSU-CMCC benchmark dataset, and proposes the PIA network with disentangling and prototype modules that outperforms prior methods on combined modality and clothing variations.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

cs.LG · 2022-09-07 · unverdicted · novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

Offline Reinforcement Learning with Implicit Q-Learning

cs.LG · 2021-10-12 · unverdicted · novelty 8.0

IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.

Passage Re-ranking with BERT

cs.IR · 2019-01-13 · unverdicted · novelty 8.0

Fine-tuning BERT for query-passage relevance classification achieves state-of-the-art results on TREC-CAR and MS MARCO, with a 27% relative gain in MRR@10 over prior methods.

Density estimation using Real NVP

cs.LG · 2016-05-27 · accept · novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

Adaptive Computation Time for Recurrent Neural Networks

cs.NE · 2016-03-29 · accept · novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

cs.LG · 2015-11-19 · accept · novelty 8.0

DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.

citing papers explorer

Showing 50 of 691 citing papers.

ENSEMBITS: an alphabet of protein conformational ensembles cs.LG · 2026-05-13 · unverdicted · none · ref 10
Ensembits creates a discrete vocabulary for protein conformational ensembles that outperforms static tokenizers on dynamics prediction tasks and enables ensemble token prediction from single structures via distillation.
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations cs.CL · 2026-05-12 · unverdicted · none · ref 32
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
Online Learning-to-Defer with Varying Experts stat.ML · 2026-05-12 · unverdicted · none · ref 24
Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.
Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models cs.LG · 2026-05-09 · unverdicted · none · ref 95
In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to generative phenomena including double descent and out-of-equilibrium biases.
Convergent Stochastic Training of Attention and Understanding LoRA cs.LG · 2026-05-08 · unverdicted · none · ref 2
Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
SLayerGen: a Crystal Generative Model for all Space and Layer Groups cond-mat.mtrl-sci · 2026-05-07 · unverdicted · none · ref 58
SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting a prior loss inconsistency for hexagonal groups.
Random test functions, $H^{-1}$ norm equivalence, and stochastic variational physics-informed neural networks math.NA · 2026-05-05 · unverdicted · none · ref 36
H^{-1} norm equivalence to expected squared evaluations on domain-dependent random test functions enables SV-PINNs that recover accurate solutions to challenging second-order elliptic PDEs faster than standard PINNs.
A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(\epsilon^{-5/3})$ Global Rate math.OC · 2026-05-04 · conditional · none · ref 23
PF-AGD is the first parameter-free deterministic accelerated first-order method with Õ(ε^{-5/3} log(1/ε)) complexity for smooth non-convex optimization.
Characterizing the Expressivity of Local Attention in Transformers cs.CL · 2026-05-01 · unverdicted · none · ref 18
Local attention strictly enlarges the class of regular languages recognizable by fixed-precision transformers by adding a second past operator in linear temporal logic, with global and local attention being expressively complementary.
STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack cs.CR · 2026-05-01 · unverdicted · none · ref 4
STARE uses step-wise RL to attack multimodal models, achieving 68% higher attack success rate while revealing that adversarial optimization concentrates conceptual toxicity early and detail toxicity late in the generation trajectory.
Qvine: Vine Structured Quantum Circuits for Loading High Dimensional Distributions quant-ph · 2026-04-29 · unverdicted · none · ref 35
Qvine uses vine copula-inspired quantum circuit structures to achieve linear or quadratic depth scaling for loading high-dimensional distributions with high approximation quality.
Neural Spectral Bias and Conformal Correlators I: Introduction and Applications hep-th · 2026-04-20 · unverdicted · none · ref 18
Neural networks optimized solely on crossing symmetry reconstruct CFT correlators from minimal input data to few-percent accuracy across generalized free fields, minimal models, Ising, N=4 SYM, and AdS diagrams.
MMGait: Towards Multi-Modal Gait Recognition cs.CV · 2026-04-17 · conditional · none · ref 41
MMGait provides a new multi-sensor gait dataset and OmniGait baseline to support single-modal, cross-modal, and unified multi-modal person identification from walking patterns.
Proton Structure from Neural Simulation-Based Inference at the LHC hep-ph · 2026-04-14 · unverdicted · none · ref 130
Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.
Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate math.OC · 2026-04-09 · unverdicted · none · ref 1
Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
CMCC-ReID: Cross-Modality Clothing-Change Person Re-Identification cs.CV · 2026-04-03 · unverdicted · none · ref 22
The paper introduces the CMCC-ReID task, constructs the SYSU-CMCC benchmark dataset, and proposes the PIA network with disentangling and prototype modules that outperforms prior methods on combined modality and clothing variations.
Traces of Helium Detected in Type Ic Supernova 2014L astro-ph.HE · 2026-03-31 · accept · none · ref 54
Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning cs.AI · 2023-06-05 · conditional · none · ref 32
LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow cs.LG · 2022-09-07 · unverdicted · none · ref 31
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
Offline Reinforcement Learning with Implicit Q-Learning cs.LG · 2021-10-12 · unverdicted · none · ref 7
IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.
Passage Re-ranking with BERT cs.IR · 2019-01-13 · unverdicted · none · ref 7
Fine-tuning BERT for query-passage relevance classification achieves state-of-the-art results on TREC-CAR and MS MARCO, with a 27% relative gain in MRR@10 over prior methods.
Density estimation using Real NVP cs.LG · 2016-05-27 · accept · none · ref 33
Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
Adaptive Computation Time for Recurrent Neural Networks cs.NE · 2016-03-29 · accept · none · ref 18
ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks cs.LG · 2015-11-19 · accept · none · ref 9
DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
NICE: Non-linear Independent Components Estimation cs.LG · 2014-10-30 · accept · none · ref 17
NICE learns a composition of invertible neural-network layers that transform data into independent latent variables, enabling exact log-likelihood training and sampling for density estimation.
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling cs.LG · 2026-05-13 · unverdicted · none · ref 62
QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.
Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo cond-mat.str-el · 2026-05-13 · conditional · none · ref 48
PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.
Learning to Optimize Radiotherapy Plans via Fluence Maps Diffusion Model Generation and LSTM-based Optimization cs.CV · 2026-05-13 · unverdicted · none · ref 18
A distilled diffusion model generates clinically feasible fluence maps for VMAT and an LSTM-based optimizer refines them to meet dose objectives, improving efficiency and deliverability on prostate cancer data.
A Majorization-Minimization with Monte Carlo Approach for Hyperparameter Estimation math.NA · 2026-05-13 · unverdicted · none · ref 54
M³C replaces the hard hyperparameter optimization with a sequence of simpler problems using a majorant for the log-determinant approximated via Monte Carlo, with proven high-probability convergence to a critical point under assumptions.
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment cs.LG · 2026-05-13 · unverdicted · none · ref 9
Temperature adjustment on the reference model generalizes inference-time alignment to SLOP ensembles of reward models, with a calibration algorithm that improves robustness to reward hacking while preserving alignment performance.
VLTI/PIONIER imaging of post-AGB binaries. An INSPIRING hunt for inner rim substructures in circumbinary discs astro-ph.SR · 2026-05-13 · unverdicted · none · ref 108
High-resolution interferometric imaging of eight post-AGB circumbinary discs reveals diverse inner-rim substructures including azimuthal brightness enhancements and arc-like features not explained by inclination alone.
Beyond Oversquashing: Understanding Signal Propagation in GNNs Via Observables cs.LG · 2026-05-13 · unverdicted · none · ref 4
Quantum-inspired observables reveal poor signal routing in standard spectral GNNs and motivate Schrödinger GNNs with superior propagation capacity.
Spatial Competition for Low-Complexity Learned Image Compression eess.IV · 2026-05-13 · unverdicted · none · ref 25
Spatial competition among specialized neural codecs with a transmitted mode map achieves up to 14.5% rate savings over a single codec while matching HEVC performance at single-codec decoding complexity.
Backdoor Channels Hidden in Latent Space: Cryptographic Undetectability in Modern Neural Networks cs.CR · 2026-05-13 · unverdicted · none · ref 20
Backdoors can be realized as statistically natural latent directions in modern neural networks, achieving high attack success with negligible clean accuracy loss and resisting existing defenses.
STAR: Semantic-Temporal Adaptive Representation Learning for Few-Shot Action Recognition cs.CV · 2026-05-13 · conditional · none · ref 70
STAR improves 1-shot action recognition by up to 8.1% on SSv2-Full through semantic-temporal alignment and Mamba-based prototype refinement.
On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods stat.ML · 2026-05-13 · unverdicted · none · ref 35
Hallucinations in inverse problem reconstructions are fundamental to ill-posedness, with necessary and sufficient conditions plus computable bounds depending only on the forward model.
Identifying the nonlinear string dynamics with port-Hamiltonian neural networks cs.LG · 2026-05-12 · unverdicted · none · ref 30
Port-Hamiltonian neural networks extended to PDEs recover the Hamiltonian and dissipation of nonlinear string dynamics from data and outperform non-physics-informed baselines.
Spectral Energy Centroid: a Metric for Improving Performance and Analyzing Spectral Bias in Implicit Neural Representations cs.LG · 2026-05-12 · unverdicted · none · ref 4
Spectral Energy Centroid is a new metric that quantifies signal frequency and INR spectral bias, supporting better hyperparameter selection and cross-architecture analysis.
Newton methods beyond Hessian Lipschitz continuity: A nonlinear preconditioning approach math.OC · 2026-05-12 · unverdicted · none · ref 19
Nonlinear preconditioning extends Newton methods to objectives lacking Hessian Lipschitz continuity by analyzing a transformed mapping under a relaxed smoothness condition, with superlinear convergence and O(ε^{-3/2}) iteration complexity.
Revisiting Photometric Ambiguity for Accurate Gaussian-Splatting Surface Reconstruction cs.CV · 2026-05-12 · unverdicted · none · ref 87
AmbiSuR adds intrinsic photometric disambiguation and a self-indication module to Gaussian Splatting to resolve ambiguities and improve surface reconstruction accuracy.
SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation cs.CV · 2026-05-12 · unverdicted · none · ref 82
SEMIR replaces dense voxel computation with a learned topology-preserving graph minor that supports exact decoding and GNN-based inference for small-structure segmentation in large medical images.
Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale cs.LG · 2026-05-12 · unverdicted · none · ref 34
Local neural operators on 3x3x3 patches, composed via Schwarz iteration, solve large-scale nonlinear elasticity on arbitrary geometries without domain-specific retraining.
Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning cs.LG · 2026-05-12 · unverdicted · none · ref 22
SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.
Delightful Gradients Accelerate Corner Escape cs.LG · 2026-05-12 · unverdicted · none · ref 30
Delightful Policy Gradient removes exponential corner trapping in softmax policy optimization for bandits and tabular MDPs, achieving logarithmic escape times and global O(1/t) convergence.
AccLock: Unlocking Identity with Heartbeat Using In-Ear Accelerometers cs.CR · 2026-05-12 · unverdicted · none · ref 34
AccLock extracts user-specific features from in-ear ballistocardiogram signals via a disentanglement model and Siamese network to achieve average FAR of 3.13% and FRR of 2.99% in tests with 33 participants.
Gradient Clipping Beyond Vector Norms: A Spectral Approach for Matrix-Valued Parameters cs.LG · 2026-05-12 · unverdicted · none · ref 33
Spectral clipping of leading singular values in gradient matrices stabilizes SGD for non-convex problems with heavy-tailed noise and achieves the optimal convergence rate O(K^{(2-2α)/(3α-2)}).
Bin Latent Transformer (BiLT): A shift-invariant autoencoder for calibration-free spectral unmixing of turbid media physics.optics · 2026-05-12 · unverdicted · none · ref 22
The BiLT autoencoder recovers absorption and scattering spectra from integrating sphere data with high accuracy while remaining robust to wavelength shifts up to 10 bands and generalizing to different instrument line shapes without retraining.
Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data cs.LG · 2026-05-11 · unverdicted · none · ref 12
Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.
Variational predictive resampling stat.ME · 2026-05-11 · conditional · none · ref 37
Variational predictive resampling iteratively imputes data from a variational predictive to produce posterior samples that converge to the exact Bayesian posterior in Gaussian models where mean-field VI retains a gap.
Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models cs.LG · 2026-05-11 · unverdicted · none · ref 28
Spectra defines and controls effective capacity in graph embeddings via the Shannon effective rank of a trace-normalized kernel spectrum, making capacity a post-fit property rather than a pre-training hyperparameter.
