On the difficulty of training Recurrent Neural Networks
17 Pith papers cite this work. Polarity classification is still indexing.
abstract
There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
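The gradient norm clipping strategy named in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation: when the L2 norm of the gradient exceeds a threshold, the gradient is rescaled so that its norm equals the threshold, and its direction is left unchanged. The function name `clip_gradient_norm`, the flat-list representation of the gradient, and the threshold value are illustrative assumptions, not details taken from this page.

```python
import math


def clip_gradient_norm(grad, threshold):
    """Return a copy of grad rescaled so its L2 norm is at most threshold.

    grad is a flat list of floats (hypothetical representation);
    threshold is a positive float chosen by the user.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        # Shrink the whole vector by one factor, preserving its direction.
        scale = threshold / norm
        return [g * scale for g in grad]
    # Gradients already within the threshold are returned unchanged.
    return list(grad)


# Example: a gradient with norm 5 is rescaled to norm 1.
clipped = clip_gradient_norm([3.0, 4.0], threshold=1.0)
print(clipped)
```

Because the whole vector is scaled by a single factor, only the step size is limited. Clipping each component independently would also change the update direction.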
hub tools
citation-role summary
citation-polarity summary
roles
background 3
polarities
background 3
citing papers explorer
- AI-enabled gravitational-waves searches for binary neutron stars at optimal sensitivity
  Aframe neural network achieves matched-filter sensitivity for binary neutron star GW searches at lower computational cost using heterodyning and a single GPU.
- Coherent-State Propagation: A Computational Framework for Simulating Bosonic Quantum Systems
  Coherent-state propagation enables quasi-polynomial classical simulation of bosonic circuits with logarithmically many Kerr gates at exponentially small trace-distance error, with polynomial runtime in the weak-nonlinearity regime.
- In Defense of Information Leakage in Concept-based Models
  Concept-based models can use controlled 'benign' information leakage to remain accurate and intervenable under real-world concept incompleteness by reframing their training objective.
- Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States
  Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.
- Compact Spin-Charge Separated Neural Quantum States for Valence-Bond States
  A compact NQS architecture for VBS and doped sVBS states reaches high fidelity with fewer parameters than standard baselines by using solvable-point-guided designs and explicit spin-hole sector separation.
- Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization
  S-Adam modulates updates via an LGI-based damping term and proves almost-sure convergence to Clarke stationary points at O(1/sqrt(T)) while reporting accuracy gains on CIFAR-100 and TinyImageNet.
- Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks
  NEON provides uncertainty-aware operator learning for composite Bayesian optimization in function spaces using a single network, achieving claimed SOTA with orders of magnitude fewer parameters than ensembles.
- Adaptive Federated Optimization
  Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.
- PaLM: Scaling Language Modeling with Pathways
  PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
  GNMT deploys 8-layer LSTMs with attention, wordpieces, low-precision inference, and coverage-penalized beam search to match state-of-the-art on WMT'14 En-Fr and En-De while cutting translation errors by 60% in human evaluations.
- Physics-informed convolutional neural networks for fluid flow through porous media
  A physics-informed CNN predicts pore-scale velocity fields from geometry and serves as a warm-start to accelerate Lattice-Boltzmann solvers in over 90% of tested cases.
- Multimodal and Multi-view Models for Emotion Recognition
  Multimodal training with attention and contrastive multi-view learning improves both combined and acoustic-only emotion recognition on IEMOCAP over prior acoustic baselines.
- Inferring identified hadron production in $pp$ collisions with physics-informed machine learning at the LHC
  A physics-informed neural network infers pT spectra of pi, K, p, Lambda, and Ks in unmeasured rapidity regions from PYTHIA8 pp collisions at 13.6 TeV, achieving 1.5-5.83% yield uncertainties while reproducing yield ratios and freeze-out parameters.
- Preventing overfitting in deep learning using differential privacy
  Differential privacy techniques can help prevent overfitting and improve generalization in deep neural networks.
- Autoencoding sensory substitution
  Deep recurrent autoencoders convert images to shortened audio signals that incorporate hearing models, enabling above-chance hand posture discrimination and object reaching after a few hours of training instead of months.
- On Inductive Biases in Deep Reinforcement Learning
  Adaptive replacements for domain-specific components in deep RL agents can yield better learning on new tasks without additional tuning.
- A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
  A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.