Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Joan Bruna; Michael M. Bronstein; Petar Veličković; Taco Cohen

arXiv: 2104.13478 · v2 · submitted 2021-04-27 · cs.LG · cs.AI · cs.CG · cs.CV · stat.ML

Pith reviewed 2026-05-13 02:33 UTC · model grok-4.3

keywords: geometric deep learning · Erlangen program · symmetries · equivariance · neural architectures · CNN · GNN · unification

The pith

Geometric principles provide a unified framework for CNNs, RNNs, GNNs, and Transformers while enabling the design of new architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to show that the success of deep learning stems from its ability to capture geometric regularities in data rather than learning arbitrary high-dimensional functions. By drawing on ideas from Klein's Erlangen program, it organizes different neural network types around symmetries and structures such as grids, groups, graphs, geodesics, and gauges. This approach supplies both a retrospective explanation for why certain architectures work well and a forward method for building models that respect physical priors. A sympathetic reader would care because it promises to make model design less empirical and more principled, potentially accelerating progress in areas where data has known structure.

Core claim

The authors claim that a geometric unification in the spirit of the Erlangen program furnishes a common mathematical framework for studying successful neural network architectures including CNNs, RNNs, GNNs, and Transformers, while simultaneously offering a constructive procedure to embed prior physical knowledge into neural architectures and to create future ones in a principled manner.

What carries the argument

The argument rests on five kinds of geometric structure: grids, groups, graphs, geodesics, and gauges. Each encodes the symmetries and regularities of a data domain, and those symmetries determine which neural network operations are appropriate on that domain.
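To make "encoding symmetries" concrete, the two standard notions of invariance and equivariance can be written out. The notation here is chosen for illustration and may differ from the paper's own symbols: $\Omega$ is the data domain, $\mathcal{X}(\Omega)$ the space of signals on it, $G$ a symmetry group acting on signals through $\rho$ (and on outputs through $\rho'$).

```latex
% Invariance: the output ignores the symmetry transformation.
f\bigl(\rho(g)\,x\bigr) = f(x)
  \qquad \text{for all } g \in G,\; x \in \mathcal{X}(\Omega)

% Equivariance: the output transforms along with the input.
F\bigl(\rho(g)\,x\bigr) = \rho'(g)\,F(x)
  \qquad \text{for all } g \in G,\; x \in \mathcal{X}(\Omega)
```

Read this way, a typical design stacks equivariant layers $F$ and, when a single global prediction is needed, ends with an invariant map $f$ such as pooling over the domain.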

If this is right

CNNs on image grids are special cases of group-equivariant networks, with translations of the grid as the symmetry group.
Graph neural networks arise naturally when the data domain is a graph and the relevant symmetry is the permutation of its nodes.
Transformers can be viewed as operating on sets or sequences with permutation or other symmetries.
New models for data on manifolds or with gauge symmetries can be derived systematically rather than by trial and error.
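The first two implications above can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: `circular_conv` and `gnn_layer` are hypothetical helper names, the periodic 1-D grid stands in for an image grid, and the sum-aggregation layer is only one simple choice of message-passing layer.

```python
import numpy as np


def circular_conv(x, w):
    """Circular convolution of a 1-D periodic signal x with a short filter w."""
    n = len(x)
    return np.array(
        [sum(w[k] * x[(i - k) % n] for k in range(len(w))) for i in range(n)]
    )


def gnn_layer(A, X, W_self, W_nbr):
    """One sum-aggregation message-passing layer: ReLU(X W_self + A X W_nbr)."""
    return np.maximum(X @ W_self + A @ X @ W_nbr, 0.0)


rng = np.random.default_rng(0)

# Translation equivariance on a periodic grid:
# shifting the input and then convolving equals convolving and then shifting.
x = rng.normal(size=8)
w = rng.normal(size=3)
shift = 3
conv_equivariant = bool(
    np.allclose(
        circular_conv(np.roll(x, shift), w),
        np.roll(circular_conv(x, w), shift),
    )
)

# Permutation equivariance on a graph:
# relabelling the nodes (A -> P A P^T, X -> P X) relabels the output rows (P Y).
n, d = 5, 4
upper = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = upper + upper.T                      # symmetric adjacency, no self-loops
X = rng.normal(size=(n, d))              # node features
W_self = rng.normal(size=(d, d))
W_nbr = rng.normal(size=(d, d))
P = np.eye(n)[rng.permutation(n)]        # random permutation matrix
gnn_equivariant = bool(
    np.allclose(
        gnn_layer(P @ A @ P.T, P @ X, W_self, W_nbr),
        P @ gnn_layer(A, X, W_self, W_nbr),
    )
)

print("convolution commutes with shifts:", conv_equivariant)
print("GNN layer commutes with node permutations:", gnn_equivariant)
```

Both checks pass for any choice of filter and weights, because the property comes from the structure of the layer rather than from the learned parameters. That is the sense in which the symmetry is built into the architecture.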

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Applying this lens could help identify which tasks are currently under-served by existing architectures due to mismatched geometric assumptions.
It suggests that improvements in one domain, such as better group convolutions, might transfer to others through the shared framework.
Scientific applications in physics and biology might benefit most, as their data often has explicit geometric structure.
Over time, this could shift machine learning from architecture search to geometry-informed design.

Load-bearing premise

The majority of interesting learning tasks possess essential pre-defined regularities that originate from the low-dimensional structure of the physical world and that can be captured by geometric principles.

What would settle it

Demonstrating a task with strong physical structure where no geometric neural network architecture matches or exceeds the performance of a generic black-box model would challenge the claim that geometric unification is broadly useful.

Original abstract

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The manuscript is a survey articulating a geometric unification of deep learning architectures (CNNs on grids, GNNs on graphs, Transformers on sets and sequences, etc.) in the spirit of Klein's Erlangen Program. It argues that successful models exploit pre-defined regularities arising from the low-dimensional structure of the physical world, providing both a retrospective common mathematical framework for existing architectures and a constructive procedure for incorporating prior physical knowledge into new designs.

Significance. If the unifying geometric lens holds, the survey offers a significant organizing principle for the field by linking disparate architectures through group theory, differential geometry, and symmetry considerations. It synthesizes established literature without new empirical claims, supplies design heuristics grounded in physical priors, and could guide future architecture development; the absence of free parameters, invented entities, or circular derivations strengthens its value as a reference.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and insightful review, which accurately summarizes the manuscript's goals of providing a geometric unification of deep learning architectures in the spirit of Klein's Erlangen Program. We appreciate the recognition of its value as a reference and organizing principle for the field.

Circularity Check

0 steps flagged

No significant circularity; survey organizes external results

full rationale

The manuscript is a survey that retrospectively organizes CNNs, GNNs, Transformers and related architectures under an Erlangen-style geometric lens drawn from standard group theory and differential geometry. It states its motivating assumption about physical regularities explicitly and offers design heuristics rather than new theorems or fitted predictions. No load-bearing step reduces by construction to a quantity defined inside the paper or to a self-citation chain; all cited results come from independent external literature. The derivation chain is therefore free of circular dependence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that physical-world regularities are low-dimensional and geometrically expressible; no free parameters are fitted, no new entities are postulated, and the axioms invoked are standard results from geometry and group theory.

axioms (2)

domain assumption: Most tasks of interest come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world.
Invoked in the abstract and introduction as the motivation for geometric unification.
domain assumption: Geometric principles (grids, groups, graphs, geodesics, gauges) can be applied throughout a wide spectrum of applications to expose these regularities.
Core premise of the Erlangen-program-inspired framework stated in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

AlexanderDuality (for D=3 linking and invariance) alexander_duality_circle_linking echoes

echoes: This paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Symmetries, Representations, and Invariance... Isomorphisms and Automorphisms... Deformation Stability... Scale Separation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Disentanglement Beyond Generative Models with Riemannian ICA
cs.LG 2026-05 unverdicted novelty 8.0

RICA replaces ICA's global generative model with local Riemannian geometry, introducing a disentanglement tensor based on the Hessian of the log-likelihood and Ricci curvature to measure pointwise disentanglement, whi...
HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts
cs.LG 2026-05 unverdicted novelty 8.0

HodgeCover isolates the harmonic kernel of a simplicial Laplacian on an expert 2-complex to identify irreducible merge cycles and selects experts for aggressive compression, matching or exceeding baselines on open-wei...
The WidthWall: A Strict Expressivity Hierarchy for Hypergraph Neural Networks
cs.LG 2026-05 unverdicted novelty 8.0

Hypergraph neural networks obey a strict expressivity hierarchy indexed by hypertree width, creating a Width Wall that no fixed-depth model, hidden dimension, or training procedure can cross for wider patterns.
Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning
cs.LG 2026-05 unverdicted novelty 8.0

Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime.
Gradient-Based Program Synthesis with Neurally Interpreted Languages
cs.LG 2026-04 unverdicted novelty 8.0

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...
HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals
cs.LG 2025-06 conditional novelty 8.0

Authors release HSG-12M, a dataset of 16.7 million spatial multigraphs generated from non-Hermitian crystal energy spectra via the Poly2Graph pipeline, along with initial GNN benchmarks.
What are the Right Symmetries for Formal Theorem Proving?
cs.LG 2026-05 unverdicted novelty 7.0

Introduces rewriting categories to formalize proof equivariance and success invariance, shows LLM provers violate both, and demonstrates test-time aggregation recovers invariance and boosts performance.
Gaussian Sheaf Neural Networks
cs.LG 2026-05 unverdicted novelty 7.0

Gaussian Sheaf Neural Networks derive a sheaf Laplacian for Gaussian node features on graphs to preserve their geometric structure during message passing.
Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery
cs.LG 2026-05 unverdicted novelty 7.0 partial

A group-algebraic tensor framework delivers Eckart-Young optimal equivariant approximations and recovers physical selection rules from data alone via a Lean-formalized star_G algebra.
Physics-Aligned Canonical Equivariant Fourier Neural Operator under Symmetry-Induced Shifts
cs.LG 2026-05 conditional novelty 7.0

PACE-FNO reduces OOD relative error by up to 12x versus FNO with symmetry augmentation on Burgers, shallow-water, and Navier-Stokes equations by jointly training a frame estimator and operator under bounded symmetry p...
Geometric Observables for Financial Regime Detection
q-fin.ST 2026-05 unverdicted novelty 7.0

Geometric observables from spectral embeddings of stock returns detect financial regime shifts with competitive out-of-sample performance and fewer false alarms than supervised baselines.
Discretizing Group-Convolutional Neural Networks for 3D Geometry in Feature Space
cs.CV 2026-05 unverdicted novelty 7.0

Feature-space sampling in GCNNs preserves 3D classification accuracy with coarse discretization, enabling precomputation and faster training of equivariant models.
Cross-attention-based bipartite graph neural network for coupled nodal and elemental field prediction in large-deformation sheet material forming
cs.CE 2026-05 unverdicted novelty 7.0

A cross-attention-based bipartite GNN predicts coupled nodal displacement increments and elemental thinning directly on their native mesh domains for sheet material forming.
Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry
cs.LG 2026-05 unverdicted novelty 7.0

MSRL represents trajectory segments as PSD matrices to prove additive composition properties and bootstrap value functions for better transfer, reaching 0.73 AUC versus 0.57-0.65 baselines.
The Cartesian Shortcut: Re-evaluate Vision Reasoning in Polar Coordinate Space
cs.CV 2026-05 unverdicted novelty 7.0

MLLMs scoring 70-83% on Cartesian visual tasks drop to 31-39% on logically equivalent polar versions, exposing reliance on grid discretization shortcuts instead of topology-invariant reasoning.
TokaMind for Power Grid: Cross-Domain Transfer from Fusion Plasma
physics.plasm-ph 2026-05 unverdicted novelty 7.0

TokaMind, pre-trained on MAST tokamak data, transfers to power grid PMU data for severe event classification with F1 0.837, where difficulty depends on grid topology and CSD indicators boost early-warning performance ...
Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
stat.ML 2026-05 unverdicted novelty 7.0

Every fixed finite feedforward neural network definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting.
Operator-Guided Invariance Learning for Continuous Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.
Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves
cs.LG 2026-05 unverdicted novelty 7.0

HilbNets define convolutions via Hilbert bundle connection Laplacians, prove that sampled Hilbert cellular sheaf Laplacians converge to the continuous operator, and show that discretized networks are consistent and tr...
Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs
math.DS 2026-05 unverdicted novelty 7.0

Establishes well-posedness, compact global attractors, and delay-independent global stability for retarded functional differential equations modeling reentrant value fields as coupled reaction-diffusion systems on fin...
Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs
math.DS 2026-05 unverdicted novelty 7.0

A field theory of synthetic cognition is cast as a retarded functional differential equation on graphs, with proofs of well-posedness, compact global attractor existence, delay-independent stability under a coupling-s...
Cardiac Mesh Flow: One-Step Generation of 3D+t Cardiac Four-Chamber Meshes via Flow Matching
eess.IV 2026-05 unverdicted novelty 7.0

Cardiac Mesh Flow generates 3D+t four-chamber cardiac meshes with anatomical correspondence and volume conditioning via one-step flow matching on multi-scale deformation fields.
Data-driven discovery of polynomial ODEs with provably bounded solutions
math.DS 2026-04 unverdicted novelty 7.0

SILAS jointly optimizes polynomial ODE vector fields and polynomial Lyapunov functions from data to produce models with provably bounded trajectories via compact absorbing sets.
Complex-Valued GNNs for Distributed Basis-Invariant Control of Planar Systems
cs.LG 2026-04 unverdicted novelty 7.0

Complex-valued GNNs using phase-equivariant activations achieve global basis invariance for distributed planar control, outperforming real-valued baselines in data efficiency, tracking, and generalization on flocking.
Polarized Target Nuclear Magnetic Resonance Measurements with Deep Neural Networks
physics.ins-det 2026-03 unverdicted novelty 7.0

Deep neural networks reduce fitting uncertainties in CW-NMR polarization measurements for dynamically polarized targets.
Exact Verification of Graph Neural Networks with Incremental Constraint Solving
cs.LG 2025-08 unverdicted novelty 7.0

Develops an exact verification method for GNNs supporting sum, max and mean aggregations via incremental constraint solving with bound tightening for adversarial robustness on node and graph classification tasks.
HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals
cs.LG 2025-06 unverdicted novelty 7.0

HSG-12M is a large dataset of spatial multigraphs derived from non-Hermitian crystal energy spectra via the Poly2Graph pipeline, positioned as the first large-scale benchmark of this graph type.
Massive Activations in Large Language Models
cs.CL 2024-02 unverdicted novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
Riemannian geometry meets fMRI: the advantages of modeling correlation manifolds and eigenvector subspaces
cs.LG 2026-05 unverdicted novelty 6.0

Introduces Off-log metric for correlation matrices and Grassmannian subspace distances to improve sensitivity and classification in fMRI brain network analysis across clinical and ageing datasets.
Protein Fold Classification at Scale: Benchmarking and Pretraining
cs.LG 2026-05 unverdicted novelty 6.0

Introduces TEDBench benchmark and MiAE self-supervised framework that outperforms baselines for large-scale protein fold classification.
Neural Point-Forms
cs.LG 2026-05 unverdicted novelty 6.0

Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and comp...
Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves
cs.LG 2026-05 unverdicted novelty 6.0

HilbNets discretize Hilbert bundle convolutions through Hilbert Cellular Sheaves whose Laplacians converge to the continuous connection Laplacian, enabling consistent learning across samplings.
LINC: Decoupling Local Consequence Scoring from Hidden Matching in Constructive Neural Routing
cs.LG 2026-05 unverdicted novelty 6.0

LINC decouples local consequence scoring from hidden matching in constructive neural routing solvers, cutting CVRPTW gaps for PolyNet from 13.83%/38.15% to 7.26%/14.71% on Solomon/Homberger benchmarks.
Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA
cs.AI 2026-05 unverdicted novelty 6.0

Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is unstructured text-to-event mapping, addressed by a neuro-symbolic system with PIS that reaches 100% accuracy on benchmarks whe...
Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs
math.DS 2026-05 unverdicted novelty 6.0

Establishes well-posedness, compact global attractor existence, and delay-independent stability for a retarded functional differential equation coupling symbolic and geometric fields on graphs under fixed interfield o...
Symmetry-Protected Lyapunov Neutral Modes in Equivariant Recurrent Networks
cs.NE 2026-05 unverdicted novelty 6.0

Exact equivariance under a Lie group guarantees at least dim(G/H) zero Lyapunov exponents tangent to the group orbit on compact invariant sets with nondegenerate orbit bundles.
Geometric Quantum Physics Informed Neural Network
quant-ph 2026-05 unverdicted novelty 6.0

GQPINNs add symmetry awareness to quantum PINNs via equivariant circuits, yielding lower mean absolute error and fewer parameters than standard QPINNs on linear and nonlinear PDE benchmarks.
Leveraging Data Symmetries to Select an Optimal Subset of Training Data under Label Noise
cs.LG 2026-05 unverdicted novelty 6.0

Exploiting data symmetries boosts k-NN to select near-optimal low-noise subsets from noisy datasets, approaching Bayes-optimal performance in high dimensions, with learned representations aiding partial symmetry knowledge.
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
cs.LG 2026-05 unverdicted novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
Stability Enhanced Gaussian Process Variational Autoencoders
cs.LG 2026-04 unverdicted novelty 6.0

SEGP-VAE learns stable low-dimensional LTI systems from video data by deriving GP mean and covariance from LTI equations and using a complete unconstrained parametrization of semi-contracting systems.
Toward a universal foundation model for graph-structured data
cs.LG 2026-04 unverdicted novelty 6.0

A pretrained graph model using feature-agnostic structural prompts matches or exceeds supervised baselines and shows strong zero-shot and few-shot transfer on held-out biomedical graphs, with a 21.8% ROC-AUC gain on SagePPI.
LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces
cs.CL 2026-04 unverdicted novelty 6.0

LAG-XAI treats paraphrasing as affine flows in semantic manifolds using Lie-inspired approximations, achieving AUC 0.7713 on paraphrase detection and 95.3% hallucination detection on HaluEval.
Metriplector: From Field Theory to Neural Architecture
cs.AI 2026-03 unverdicted novelty 6.0

Metriplector treats neural computation as coupled metriplectic field dynamics whose stress-energy tensor readout achieves competitive results on vision, control, Sudoku, language modeling, and pathfinding with small p...
AI-enhanced tuning of quantum dot Hamiltonians toward Majorana modes
cond-mat.mes-hall 2026-01 unverdicted novelty 6.0

A vision-transformer neural network trained unsupervised on synthetic conductance data proposes Hamiltonian parameter updates that drive quantum dot chains into the topological phase with Majorana modes, often succeed...
Torch Geometric Pool: the PyTorch library for pooling in Graph Neural Networks
cs.LG 2025-12 accept novelty 6.0

A new open-source library standardizes 20 hierarchical graph pooling operations under one SRCL interface with uniform outputs and batch handling for PyTorch Geometric.
Generalized Spherical Neural Operators: Green's Function Formulation
cs.LG 2025-12 unverdicted novelty 6.0

GSNO uses position-dependent spherical Green's functions to create flexible neural operators that adapt to non-equivariant systems on spheres while keeping spectral efficiency and grid invariance.
Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power
cs.LG 2025-12 unverdicted novelty 6.0

Enforcing equivariance reduces expressive power in 2-layer ReLU networks but enlarging the model compensates with proven size bounds and yields lower hypothesis space dimensionality for better generalization.
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
cs.LG 2025-09 unverdicted novelty 6.0

Adaptive canonicalization selects input canonical forms by maximizing network predictive confidence to yield continuous symmetry-preserving models with universal approximation for equivariant geometric networks.
Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
stat.ML 2025-09 unverdicted novelty 6.0

GABI learns geometry-conditioned latent priors from multi-geometry physical response datasets for use in Bayesian inversion, yielding geometry-adapted posteriors via ABC sampling.
Universal Representation of Generalized Convex Functions and their Gradients
math.OC 2025-08 unverdicted novelty 6.0

A new differentiable layer with convex parameter space universally approximates generalized convex functions and their gradients, enabling single-level reformulations of bilevel problems in optimal transport and multi...
Resource-efficient equivariant quantum convolutional neural networks
quant-ph 2024-10 unverdicted novelty 6.0

Equivariant sp-QCNN encodes general symmetries with group theory, splits circuits at pooling layers to preserve symmetry while enabling parallel measurements, and shows improved efficiency and trainability over standa...
Quantum Convolutional Neural Networks are Effectively Classically Simulable
quant-ph 2024-08 unverdicted novelty 6.0

QCNNs are classically simulable via Pauli shadows on low-bodyness subspaces of locally-easy datasets, with explicit simulation demonstrated up to 1024 qubits for phases of matter classification.
Graph State-Space Models and Latent Relational Inference
cs.LG 2023-01 unverdicted novelty 6.0

Graph State-Space Models jointly learn state-space dynamics and latent relational graphs end-to-end from time series for forecasting and structure extraction.
Abstraction for Offline Goal-Conditioned Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 5.0

Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.
Stimulus symmetries can confound representational similarity analyses
q-bio.NC 2026-05 unverdicted novelty 5.0

Stimulus symmetries render many neural representations functionally equivalent yet produce qualitatively different RSMs, including drifting ones from SGD or regularization in image-encoding networks.
Axiomatizing Neural Networks via Pursuit of Subspaces
cs.LG 2026-05 unverdicted novelty 5.0

Authors introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic geometric framework that unifies explanations for representation, computation, and generalization in shallow and deep neural networks.
Dynamic Elliptical Graph Factor Models via Riemannian Optimization with Geodesic Temporal Regularization
cs.LG 2026-05 unverdicted novelty 5.0

DEGfM is a dynamic elliptical graph factor model that performs Riemannian optimization on the Grassmann manifold with geodesic temporal regularization to infer time-varying precision matrices.
Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification
cs.LG 2026-05 unverdicted novelty 5.0

Transductive Sharpening adds an entropy-minimization term on unlabeled-node predictions to the training objective for graph node classification.
Symmetry in the Wild: The Role of Equivariance in Neural Fluid Surrogates
cs.LG 2026-05 unverdicted novelty 5.0

Explicit E(3)-equivariance in neural CFD surrogates improves generalization on diverse-geometry hemodynamics benchmarks but degrades in-distribution performance on strongly aligned aerodynamics data, consistently beat...
PhysEDA: Physics-Aware Learning Framework for Efficient EDA With Manhattan Distance Decay
cs.LG 2026-05 unverdicted novelty 5.0

PhysEDA folds separable Manhattan-distance exponential decay into linear attention and potential-based rewards, cutting complexity to linear while improving zero-shot transfer and sparse-reward performance on decoupli...

Reference graph

Works this paper leans on

109 extracted references · 109 canonical work pages · cited by 67 Pith papers · 16 internal anchors

[1]

On the bottleneck of graph neural networks and its practical implications

Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications.arXiv:2006.05205,

work page arXiv 2006
[2]

Cormorant: Covariant Molecular Neur al Networks

BrandonAnderson,Truong-SonHy,andRisiKondor. Cormorant: Covariant molecular neural networks.arXiv:1906.04015,

work page arXiv 1906
[3]

Layer Normalization

JimmyLeiBa,JamieRyanKiros,andGeoﬀreyEHinton. Layernormalization. arXiv:1607.06450,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate.arXiv:1409.0473,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

Discovering transforms: A tutorial on circulant matrices, circular convolution, and the discrete fourier transform.arXiv:1805.05533,

Bassam Bamieh. Discovering transforms: A tutorial on circulant matrices, circular convolution, and the discrete fourier transform.arXiv:1805.05533,

work page arXiv
[6]

Interaction Networks for Learning about Objects, Relations and Physics

PeterWBattaglia,RazvanPascanu,MatthewLai,DaniloRezende,andKoray Kavukcuoglu. Interaction networks for learning about objects, relations and physics.arXiv:1612.00222,

work page Pith review arXiv
[7]

Relational inductive biases, deep learning, and graph networks

PeterWBattaglia,JessicaBHamrick,VictorBapst,AlvaroSanchez-Gonzalez, ViniciusZambaldi,MateuszMalinowski,AndreaTacchetti,DavidRaposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks.arXiv:1806.01261,

work page internal anchor Pith review Pith/arXiv arXiv
[8]

Directional graph networks

Dominique Beaini, Saro Passaro, Vincent Létourneau, William L Hamil- ton, Gabriele Corso, and Pietro Liò. Directional graph networks. arXiv:2010.02863,

work page arXiv 2010
[9]

Size-invariant graph representations for graph classiﬁcation extrapolations.arXiv:2103.05045,

Beatrice Bevilacqua, Yangze Zhou, and Bruno Ribeiro. Size-invariant graph representations for graph classiﬁcation extrapolations.arXiv:2103.05045,

work page arXiv
[10]

Weisfeiler and lehman go topo- logical: Message passing simplicial networks.arXiv:2103.03212,

Cristian Bodnar, Fabrizio Frasca, Yu Guang Wang, Nina Otter, Guido Mon- túfar, Pietro Liò, and Michael Bronstein. Weisfeiler and lehman go topo- logical: Message passing simplicial networks.arXiv:2103.03212,

work page arXiv
[11]

Learning shape correspondence with anisotropic convolutional neural networks

Davide Boscaini, Jonathan Masci, Emanuele Rodoià, and Michael Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. InNIPS, 2016a. Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Michael M Bronstein, and Daniel Cremers. Anisotropic diﬀusion descriptors.Computer Graphics Forum, 35(2):431–441, 2016b. Sébastien Bougleux, ...

work page arXiv
[12]

Improving graph neural network expressivity via subgraph isomorphism counting

Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M Bronstein. Improving graph neural network expressivity via subgraph isomorphism counting. arXiv:2006.09252,

work page arXiv 2006
[13]

Language Models are Few-Shot Learners

Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv:2005.14165,

work page internal anchor Pith review Pith/arXiv arXiv 2005
[14]

Combinatorial optimization and reasoning with graph neural networks

Quentin Cappart, Didier Chételat, Elias Khalil, Andrea Lodi, Christopher Morris, and Petar Veličković. Combinatorial optimization and reasoning with graph neural networks. arXiv:2102.09544,

work page arXiv
[15]

Neural Ordinary Differential Equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. arXiv:1806.07366,

work page internal anchor Pith review arXiv
[16]

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078,

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Spherical CNNs

Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical cnns. arXiv:1801.10130,

work page Pith review arXiv
[18]

Recurrent Batch Normalization

Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, and Aaron Courville. Recurrent batch normalization. arXiv:1603.09025,

work page Pith review arXiv
[19]

Principal neighbourhood aggregation for graph nets

Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Veličković. Principal neighbourhood aggregation for graph nets. arXiv:2004.05718,

work page arXiv 2004
[20]

Lagrangian neural networks

Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, and Shirley Ho. Lagrangian neural networks. arXiv:2003.04630,

work page arXiv 2003
[21]

Learning symbolic physics with graph networks

Miles D Cranmer, Rui Xu, Peter Battaglia, and Shirley Ho. Learning symbolic physics with graph networks. arXiv:1909.05862,

work page arXiv 1909
[22]

XLVIN: eXecuted Latent Value Iteration Nets

Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, and Mladen Nikolić. Xlvin: executed latent value iteration nets. arXiv:2010.13146,

work page arXiv 2010
[23]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805,

work page internal anchor Pith review Pith/arXiv arXiv
[24]

A generalization of transformer networks to graphs

Vijay Prakash Dwivedi and Xavier Bresson. A generalization of transformer networks to graphs. arXiv:2012.09699,

work page arXiv 2012
[25]

Spin-weighted spherical CNNs

Carlos Esteves, Ameesh Makadia, and Kostas Daniilidis. Spin-weighted spherical CNNs. arXiv:2006.10731,

work page arXiv 2006
[26]

Hierarchical inter-message passing for learning on molecular graphs

Matthias Fey, Jan-Gin Yuen, and Frank Weichert. Hierarchical inter-message passing for learning on molecular graphs. arXiv:2006.12179,

work page arXiv 2006
[27]

Neural shuffle-exchange networks – sequence processing in O(n log n) time

Kārlis Freivalds, Emīls Ozoliņš, and Agris Šostaks. Neural shuffle-exchange networks – sequence processing in O(n log n) time. arXiv:1907.07897,

work page arXiv 1907
[28]

SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks

Fabian B Fuchs, Daniel E Worrall, Volker Fischer, and Max Welling. SE(3)-transformers: 3D roto-translation equivariant attention networks. arXiv:2006.10503,

work page arXiv 2006
[29]

Learning graph representations with embedding propagation

Alberto García-Durán and Mathias Niepert. Learning graph representations with embedding propagation. arXiv:1710.03059,

work page arXiv
[30]

Texture synthesis using convolutional neural networks

Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. arXiv preprint arXiv:1505.07376, 2015. Thomas Gaudelet, Ben Day, Arian R Jamasb, Jyothish Soman, Cristian Regep, Gertrude Liu, Jeremy BR Hayter, Richard Vickers, Charles Roberts, Jian Tang, et al. Utilising graph machine learning within drug discovery and develop...

work page arXiv 2015
[31]

Neural Message Passing for Quantum Chemistry

Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. arXiv:1704.01212,

work page Pith review arXiv
[32]

Generative Adversarial Networks

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. arXiv:1406.2661,

work page internal anchor Pith review arXiv
[33]

Generating Sequences With Recurrent Neural Networks

Alex Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850,

work page Pith review arXiv
[34]

Neural Turing Machines

Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv:1410.5401,

work page internal anchor Pith review arXiv
[35]

Bootstrap your own latent: A new approach to self-supervised learning

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. arXiv:2006.07733,

work page arXiv 2006
[36]

Network medicine framework for identifying drug repurposing opportunities for COVID-19

Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Helia Sanchez, Rebecca Marlene Baron, Dina Ghiassian, Joseph Loscalzo, et al. Network medicine framework for identifying drug repurposing opportunities for COVID-19. arXiv:2004.07229,

work page arXiv 2004
[37]

Identity matters in deep learning

Moritz Hardt and Tengyu Ma. Identity matters in deep learning. arXiv:1611.04231,

work page arXiv
[38]

Vain: Attentional multi-agent predictive modeling

Yedid Hoshen. Vain: Attentional multi-agent predictive modeling. arXiv:1706.06122,

work page arXiv
[39]

LieTransformer: Equivariant self-attention for Lie groups

Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, and Hyunjik Kim. LieTransformer: Equivariant self-attention for Lie groups. arXiv:2012.10885,

work page arXiv 2012
[40]

Combining anatomical and functional networks for neuropathology identification: A case study on autism spectrum disorder

URL https://doi.org/10.5281/zenodo.2526396. Sarah Itani and Dorina Thanou. Combining anatomical and functional networks for neuropathology identification: A case study on autism spectrum disorder. Medical Image Analysis, 69:101986,

work page doi:10.5281/zenodo.2526396
[41]

Neural GPUs Learn Algorithms

Łukasz Kaiser and Ilya Sutskever. Neural GPUs learn algorithms. arXiv:1511.08228,

work page Pith review arXiv
[42]

Neural Machine Translation in Linear Time

Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. Neural machine translation in linear time. arXiv:1610.10099,

work page Pith review arXiv
[43]

Differentiable graph module (DGM) graph convolutional networks

Anees Kazi, Luca Cosmo, Nassir Navab, and Michael Bronstein. Differentiable graph module (DGM) graph convolutional networks. arXiv:2002.04999,

work page arXiv 2002
[44]

Interpretable stability bounds for spectral graph filters

Henry Kenlay, Dorina Thanou, and Xiaowen Dong. Interpretable stability bounds for spectral graph filters. arXiv:2102.09587,

work page arXiv
[45]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980,

work page internal anchor Pith review Pith/arXiv arXiv
[46]

Auto-Encoding Variational Bayes

Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv:1312.6114,

work page internal anchor Pith review Pith/arXiv arXiv
[47]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016a. Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv:1611.07308, 2016b. Dmitry B Kireev. Chemnet: a novel neural network based method for graph/property mapping. J. Chemical Information and Computer Sciences, 35(...

work page internal anchor Pith review Pith/arXiv arXiv
[48]

Directional message passing for molecular graphs

Johannes Klicpera, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. arXiv:2003.03123,

work page arXiv 2003
[49]

Energy flow networks: deep sets for particle jets

Patrick T Komiske, Eric M Metodiev, and Jesse Thaler. Energy flow networks: deep sets for particle jets. Journal of High Energy Physics, 2019(1):121,

work page 2019
[50]

Neural random-access machines

Karol Kurach, Marcin Andrychowicz, and Ilya Sutskever. Neural random-access machines. arXiv:1511.06392,

work page arXiv
[51]

Gated Graph Sequence Neural Networks

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv:1511.05493,

work page Pith review arXiv
[52]

Neural arithmetic units

Andreas Madsen and Alexander Rosenberg Johansen. Neural arithmetic units. arXiv:2001.05016,

work page arXiv 2001
[53]

3d facial matching by spiral convolutional metric learning and a biometric fusion-net of demographic properties

Soha Sadat Mahdi, Nele Nauwelaers, Philip Joris, Giorgos Bouritsas, Shunwang Gong, Sergiy Bokhnyak, Susan Walsh, Mark Shriver, Michael Bronstein, and Peter Claes. 3D facial matching by spiral convolutional metric learning and a biometric fusion-net of demographic properties. arXiv:2009.04746,

work page arXiv 2009
[54]

Learning representations of missing data for predicting patient outcomes

Brandon Malone, Alberto Garcia-Duran, and Mathias Niepert. Learning representations of missing data for predicting patient outcomes. arXiv:1811.04752,

work page arXiv
[55]

Invariant and Equivariant Graph Networks

Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. arXiv:1812.09902,

work page Pith review arXiv
[56]

Provably powerful graph networks

Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. arXiv:1905.11136,

work page arXiv 1905
[57]

Scattering networks on the sphere for scalable and rotationally equivariant spherical CNNs

Jason D McEwen, Christopher GR Wallis, and Augustine N Mavor-Parker. Scattering networks on the sphere for scalable and rotationally equivariant spherical CNNs. arXiv:2102.02828,

work page arXiv
[58]

Learning with invariances in random features and kernel models

Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Learning with invariances in random features and kernel models. arXiv:2102.13219,

work page arXiv
[59]

Representation learning via invariant causal mechanisms

Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, and Charles Blundell. Representation learning via invariant causal mechanisms. arXiv:2010.07922,

work page arXiv 2010
[60]

Fake News Detection on Social Media using Geometric Deep Learning

Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M Bronstein. Fake news detection on social media using geometric deep learning. arXiv:1902.06673,

work page Pith review arXiv 1902
[61]

Loopy Belief Propagation for Approximate Inference: An Empirical Study

Kevin Murphy, Yair Weiss, and Michael I Jordan. Loopy belief propagation for approximate inference: An empirical study. arXiv:1301.6725,

work page Pith review arXiv
[62]

Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs

Ryan L Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs. arXiv:1811.01900,

work page Pith review arXiv
[63]

Fourier-based and rational graph filters for spectral processing

Giuseppe Patanè. Fourier-based and rational graph filters for spectral processing. arXiv:2011.04055,

work page arXiv 2011
[64]

Learning mesh-based simulation with graph networks

Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia. Learning mesh-based simulation with graph networks. arXiv:2010.03409,

work page arXiv 2010
[65]

ParticleNet: jet tagging via particle clouds

H Qu and L Gouskos. ParticleNet: jet tagging via particle clouds. arXiv:1902.08570,

work page arXiv 1902
[66]

Implicit regularization in deep learning may not be explainable by norms

Noam Razin and Nadav Cohen. Implicit regularization in deep learning may not be explainable by norms. arXiv:2005.06398,

work page arXiv 2005
[67]

Neural Programmer-Interpreters

Scott Reed and Nando De Freitas. Neural programmer-interpreters. arXiv:1511.06279,

work page Pith review arXiv
[68]

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv:1506.01497,

work page Pith review arXiv
[69]

Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit

Emma Rocheteau, Pietro Liò, and Stephanie Hyland. Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit. arXiv:2007.09483,

work page arXiv 2007
[70]

Predicting patient outcomes with graph representation learning

Emma Rocheteau, Catherine Tong, Petar Veličković, Nicholas Lane, and Pietro Liò. Predicting patient outcomes with graph representation learning. arXiv:2101.03940,

work page arXiv
[71]

Temporal Graph Networks for Deep Learning on Dynamic Graphs

Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958. Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs. arXiv:2006.10637,

work page internal anchor Pith review arXiv 1958
[72]

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Tim Salimans and Diederik P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. arXiv:1602.07868,

work page Pith review arXiv
[73]

Hamiltonian graph networks with ODE integrators

Alvaro Sanchez-Gonzalez, Victor Bapst, Kyle Cranmer, and Peter Battaglia. Hamiltonian graph networks with ODE integrators. arXiv:1909.12790,

work page arXiv 1909
[74]

Relational recurrent neural networks

Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy Lillicrap. Relational recurrent neural networks. arXiv:1806.01822,

work page arXiv
[75]

How Does Batch Normalization Help Optimization?

Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How does batch normalization help optimization? arXiv:1805.11604,

work page Pith review arXiv
[76]

Random features strengthen graph neural networks

Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random features strengthen graph neural networks. arXiv:2002.03155,

work page arXiv 2002
[77]

E(n) equivariant graph neural networks

Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. arXiv:2102.09844,

work page arXiv
[78]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv:1707.06347,

work page internal anchor Pith review Pith/arXiv arXiv
[79]

Implicit regularization in ReLU networks with the square loss

Ohad Shamir and Gal Vardi. Implicit regularization in ReLU networks with the square loss. arXiv:2012.05156,

work page arXiv 2012
[80]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556,

work page internal anchor Pith review Pith/arXiv arXiv

Showing first 80 references.

[1] [1]

On the bottleneck of graph neural networks and its practical implications

Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications.arXiv:2006.05205,

work page arXiv 2006

[2] [2]

Cormorant: Covariant Molecular Neur al Networks

BrandonAnderson,Truong-SonHy,andRisiKondor. Cormorant: Covariant molecular neural networks.arXiv:1906.04015,

work page arXiv 1906

[3] [3]

Layer Normalization

JimmyLeiBa,JamieRyanKiros,andGeoﬀreyEHinton. Layernormalization. arXiv:1607.06450,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate.arXiv:1409.0473,

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

Discovering transforms: A tutorial on circulant matrices, circular convolution, and the discrete fourier transform.arXiv:1805.05533,

Bassam Bamieh. Discovering transforms: A tutorial on circulant matrices, circular convolution, and the discrete fourier transform.arXiv:1805.05533,

work page arXiv

[6] [6]

Interaction Networks for Learning about Objects, Relations and Physics

PeterWBattaglia,RazvanPascanu,MatthewLai,DaniloRezende,andKoray Kavukcuoglu. Interaction networks for learning about objects, relations and physics.arXiv:1612.00222,

work page Pith review arXiv

[7] [7]

Relational inductive biases, deep learning, and graph networks

PeterWBattaglia,JessicaBHamrick,VictorBapst,AlvaroSanchez-Gonzalez, ViniciusZambaldi,MateuszMalinowski,AndreaTacchetti,DavidRaposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks.arXiv:1806.01261,

work page internal anchor Pith review Pith/arXiv arXiv

[8] [8]

Directional graph networks

Dominique Beaini, Saro Passaro, Vincent Létourneau, William L Hamil- ton, Gabriele Corso, and Pietro Liò. Directional graph networks. arXiv:2010.02863,

work page arXiv 2010

[9] [9]

Size-invariant graph representations for graph classiﬁcation extrapolations.arXiv:2103.05045,

Beatrice Bevilacqua, Yangze Zhou, and Bruno Ribeiro. Size-invariant graph representations for graph classiﬁcation extrapolations.arXiv:2103.05045,

work page arXiv

[10] [10]

Weisfeiler and lehman go topo- logical: Message passing simplicial networks.arXiv:2103.03212,

Cristian Bodnar, Fabrizio Frasca, Yu Guang Wang, Nina Otter, Guido Mon- túfar, Pietro Liò, and Michael Bronstein. Weisfeiler and lehman go topo- logical: Message passing simplicial networks.arXiv:2103.03212,

work page arXiv

[11] [11]

Learning shape correspondence with anisotropic convolutional neural networks

Davide Boscaini, Jonathan Masci, Emanuele Rodoià, and Michael Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. InNIPS, 2016a. Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Michael M Bronstein, and Daniel Cremers. Anisotropic diﬀusion descriptors.Computer Graphics Forum, 35(2):431–441, 2016b. Sébastien Bougleux, ...

work page arXiv

[12] [12]

Improving graph neural network expressivity via subgraph isomor- phism counting.arXiv:2006.09252,

Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M Bron- stein. Improving graph neural network expressivity via subgraph isomor- phism counting.arXiv:2006.09252,

work page arXiv 2006

[13] [13]

Language Models are Few-Shot Learners

Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Ka- plan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv:2005.14165,

work page internal anchor Pith review Pith/arXiv arXiv 2005

[14] [14]

Combinatorial optimization and reasoning with graph neural networks.arXiv:2102.09544,

Quentin Cappart, Didier Chételat, Elias Khalil, Andrea Lodi, Christopher Morris, and Petar VeliŁković. Combinatorial optimization and reasoning with graph neural networks.arXiv:2102.09544,

work page arXiv

[15] [15]

Neural Ordinary Differential Equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary diﬀerential equations.arXiv:1806.07366,

work page internal anchor Pith review arXiv

[16] [16]

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bah- danau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078,

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

Spherical CNNs

Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical cnns. arXiv:1801.10130,

work page Pith review arXiv

[18] [18]

Recurrent Batch Normalization

Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, and Aaron Courville. Recurrent batch normalization.arXiv:1603.09025,

work page Pith review arXiv

[19] [19]

Principal neighbourhood aggregation for graph nets

Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar VeliŁković. Principal neighbourhood aggregation for graph nets. arXiv:2004.05718,

work page arXiv 2004

[20] [20]

Lagrangian neural networks,

Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, and Shirley Ho. Lagrangian neural networks.arXiv:2003.04630,

work page arXiv 2003

[21] [21]

Learning symbolic physics with graph networks,

MilesDCranmer,RuiXu,PeterBattaglia,andShirleyHo. Learningsymbolic physics with graph networks.arXiv:1909.05862,

work page arXiv 1909

[22] [22]

2020 , month = dec, number =

BIBLIOGRAPHY 135 Andreea Deac, Petar VeliŁković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, and Mladen Nikolić. Xlvin: executed latent value iteration nets. arXiv:2010.13146,

work page arXiv 2010

[23] [23]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understand- ing. arXiv:1810.04805,

work page internal anchor Pith review Pith/arXiv arXiv

[24] [24]

A generalization of transformer networks to graphs

Vijay Prakash Dwivedi and Xavier Bresson. A generalization of transformer networks to graphs.arXiv:2012.09699,

work page arXiv 2012

[25] [25]

Spin-weighted spherical CNNs.arXiv:2006.10731,

Carlos Esteves, Ameesh Makadia, and Kostas Daniilidis. Spin-weighted spherical CNNs.arXiv:2006.10731,

work page arXiv 2006

[26] [26]

Hierarchical inter-message passing for learning on molecular graphs.arXiv:2006.12179,

Matthias Fey, Jan-Gin Yuen, and Frank Weichert. Hierarchical inter-message passing for learning on molecular graphs.arXiv:2006.12179,

work page arXiv 2006

[27] [27]

Neural shuﬄe-exchange networks–sequence processing in o (n log n) time.arXiv:1907.07897,

K¯arlis Freivalds, Em¯ıls Ozolin,š, and Agris ’ostaks. Neural shuﬄe-exchange networks–sequence processing in o (n log n) time.arXiv:1907.07897,

work page arXiv 1907

[28] [28]

Large-scale density and velocity field reconstructions with neural networks

Fabian B Fuchs, Daniel E Worrall, Volker Fischer, and Max Welling. SE(3)-transformers: 3D roto-translation equivariant attention networks. arXiv:2006.10503,

work page arXiv 2006

[29] [29]

Learninggraphrepresentations with embedding propagation.arXiv:1710.03059,

AlbertoGarcía-DuránandMathiasNiepert. Learninggraphrepresentations with embedding propagation.arXiv:1710.03059,

work page arXiv

[30] [30]

Texture synthesis usingconvolutionalneuralnetworks

Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Texture synthesis usingconvolutionalneuralnetworks. arXivpreprintarXiv:1505.07376 ,2015. ThomasGaudelet,BenDay,ArianRJamasb,JyothishSoman,CristianRegep, Gertrude Liu, Jeremy BR Hayter, Richard Vickers, Charles Roberts, Jian Tang, et al. Utilising graph machine learning within drug discovery and develop...

work page arXiv 2015

[31] [31]

Neural Message Passing for Quantum Chemistry

Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. arXiv:1704.01212,

work page Pith review arXiv

[32] [32]

Generative Adversarial Networks

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde- Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.arXiv:1406.2661,

work page internal anchor Pith review arXiv

[33] [33]

Generating Sequences With Recurrent Neural Networks

Alex Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850,

work page Pith review arXiv

[34] [34]

Neural Turing Machines

Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv:1410.5401,

work page internal anchor Pith review arXiv

[35] [35]

Bootstrap your own latent: A new approach to self-supervised learn- ing

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H Richemond,ElenaBuchatskaya,CarlDoersch,BernardoAvilaPires,Zhao- han Daniel Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning.arXiv:2006.07733,

work page arXiv 2006

[36] [36]

Network medicine framework for identifying drug repurposing opportunities for COVID-19.arXiv:2004.07229,

Deisy Morselli Gysi, Ítalo Do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Helia Sanchez, Rebecca Marlene Baron, Dina Ghiassian, Joseph Loscalzo, et al. Network medicine framework for identifying drug repurposing opportunities for COVID-19.arXiv:2004.07229,

work page arXiv 2004

[37] [37]

Identity matters in deep learning

Moritz Hardt and Tengyu Ma. Identity matters in deep learning. arXiv:1611.04231,

work page arXiv

[38] [38]

Vain: Attentional multi-agent predictive modeling

Yedid Hoshen. Vain: Attentional multi-agent predictive modeling. arXiv:1706.06122,

work page arXiv

[39] [39]

LieTransformer: Equivariant self- attention for Lie groups.arXiv:2012.10885,

Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, and Hyunjik Kim. LieTransformer: Equivariant self- attention for Lie groups.arXiv:2012.10885,

work page arXiv 2012

[40] [40]

Sarah Itani and Dorina Thanou

URLhttps: //doi.org/10.5281/zenodo.2526396. Sarah Itani and Dorina Thanou. Combining anatomical and functional net- worksforneuropathologyidentiﬁcation: Acasestudyonautismspectrum disorder. Medical Image Analysis, 69:101986,

work page doi:10.5281/zenodo.2526396

[41] [41]

Neural GPUs Learn Algorithms

Šukasz Kaiser and Ilya Sutskever. Neural GPUs learn algorithms. arXiv:1511.08228,

work page Pith review arXiv

[42] [42]

Neural Machine Translation in Linear Time

Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. Neural machine translation in linear time.arXiv:1610.10099,

work page Pith review arXiv

[43] [43]

Dif- ferentiable graph module (DGM) graph convolutional networks

Anees Kazi, Luca Cosmo, Nassir Navab, and Michael Bronstein. Dif- ferentiable graph module (DGM) graph convolutional networks. arXiv:2002.04999,

work page arXiv 2002

[44] [44]

Interpretable stability bounds for spectral graph ﬁlters.arXiv:2102.09587,

Henry Kenlay, Dorina Thanou, and Xiaowen Dong. Interpretable stability bounds for spectral graph ﬁlters.arXiv:2102.09587,

work page arXiv

[45] [45]

Adam: A Method for Stochastic Optimization

DiederikPKingmaandJimmyBa. Adam: Amethodforstochasticoptimiza- tion. arXiv:1412.6980,

work page internal anchor Pith review Pith/arXiv arXiv

[46] [46]

Auto-Encoding Variational Bayes

Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv:1312.6114,

work page internal anchor Pith review Pith/arXiv arXiv

[47] [47]

Semi-Supervised Classification with Graph Convolutional Networks

ThomasNKipfandMaxWelling. Semi-supervisedclassiﬁcationwithgraph convolutional networks.arXiv:1609.02907, 2016a. Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv:1611.07308, 2016b. BIBLIOGRAPHY 141 Dmitry B Kireev. Chemnet: a novel neural network based method for graph/property mapping.J. Chemical Information and Computer Sciences, 35(...

work page internal anchor Pith review Pith/arXiv arXiv

[48] [48]

arXiv preprint arXiv:2003.03123 , year=

Johannes Klicpera, Janek Groß, and Stephan Günnemann. Directional mes- sage passing for molecular graphs.arXiv:2003.03123,

work page arXiv 2003

[49] [49]

Energyﬂownetworks: deep sets for particle jets.Journal of High Energy Physics, 2019(1):121,

PatrickTKomiske,EricMMetodiev,andJesseThaler. Energyﬂownetworks: deep sets for particle jets.Journal of High Energy Physics, 2019(1):121,

work page 2019

[50] [50]

Neural random- access machines.arXiv:1511.06392,

Karol Kurach, Marcin Andrychowicz, and Ilya Sutskever. Neural random- access machines.arXiv:1511.06392,

work page arXiv

[51] [51]

Gated Graph Sequence Neural Networks

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks.arXiv:1511.05493,

work page Pith review arXiv

[52] [52]

Neural arithmetic units

Andreas Madsen and Alexander Rosenberg Johansen. Neural arithmetic units. arXiv:2001.05016,

work page arXiv 2001

[53] [53]

3d facial matching by spiral convolutional metric learning and a biometric fusion-net of demographic properties

Soha Sadat Mahdi, Nele Nauwelaers, Philip Joris, Giorgos Bouritsas, Shun- wang Gong, Sergiy Bokhnyak, Susan Walsh, Mark Shriver, Michael Bronstein, and Peter Claes. 3d facial matching by spiral convolutional metric learning and a biometric fusion-net of demographic properties. arXiv:2009.04746,

work page arXiv 2009

[54] [54]

Learn- ing representations of missing data for predicting patient outcomes

BIBLIOGRAPHY 143 Brandon Malone, Alberto Garcia-Duran, and Mathias Niepert. Learn- ing representations of missing data for predicting patient outcomes. arXiv:1811.04752,

work page arXiv

[55] [55]

Invariant and Equivariant Graph Networks

HaggaiMaron,HeliBen-Hamu,NadavShamir,andYaronLipman. Invariant and equivariant graph networks.arXiv:1812.09902,

work page Pith review arXiv

[56] [56]

Prov- ably powerful graph networks.arXiv:1905.11136,

HaggaiMaron,HeliBen-Hamu,HadarServiansky,andYaronLipman. Prov- ably powerful graph networks.arXiv:1905.11136,

work page arXiv 1905

[57] [57]

Scatteringnetworksonthesphereforscalableandrotationallyequivariant spherical cnns.arXiv:2102.02828,

Jason D McEwen, Christopher GR Wallis, and Augustine N Mavor-Parker. Scatteringnetworksonthesphereforscalableandrotationallyequivariant spherical cnns.arXiv:2102.02828,

work page arXiv

[58] [58]

Learning with invariances in random features and kernel models.arXiv:2102.13219,

Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Learning with invariances in random features and kernel models.arXiv:2102.13219,

[59]

Representation learning via invariant causal mechanisms

Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, and Charles Blundell. Representation learning via invariant causal mechanisms. arXiv:2010.07922,

[60]

Fake News Detection on Social Media using Geometric Deep Learning

Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M Bronstein. Fake news detection on social media using geometric deep learning. arXiv:1902.06673,

[61]

Loopy Belief Propagation for Approximate Inference: An Empirical Study

Kevin Murphy, Yair Weiss, and Michael I Jordan. Loopy belief propagation for approximate inference: An empirical study. arXiv:1301.6725,

[62]

Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs

Ryan L Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs. arXiv:1811.01900,

[63]

Fourier-based and rational graph filters for spectral processing

Giuseppe Patanè. Fourier-based and rational graph filters for spectral processing. arXiv:2011.04055,

[64]

Learning mesh-based simulation with graph networks

Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia. Learning mesh-based simulation with graph networks. arXiv:2010.03409,

[65]

ParticleNet: jet tagging via particle clouds

H Qu and L Gouskos. ParticleNet: jet tagging via particle clouds. arXiv:1902.08570,

[66]

Implicit regularization in deep learning may not be explainable by norms

Noam Razin and Nadav Cohen. Implicit regularization in deep learning may not be explainable by norms. arXiv:2005.06398,

[67]

Neural Programmer-Interpreters

Scott Reed and Nando De Freitas. Neural programmer-interpreters. arXiv:1511.06279,

[68]

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv:1506.01497,

[69]

Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit

Emma Rocheteau, Pietro Liò, and Stephanie Hyland. Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit. arXiv:2007.09483,

[70]

Predicting patient outcomes with graph representation learning

Emma Rocheteau, Catherine Tong, Petar Veličković, Nicholas Lane, and Pietro Liò. Predicting patient outcomes with graph representation learning. arXiv:2101.03940,

[71]

Temporal Graph Networks for Deep Learning on Dynamic Graphs

Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.

Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs. arXiv:2006.10637,

[72]

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Tim Salimans and Diederik P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. arXiv:1602.07868,

[73]

Hamiltonian graph networks with ODE integrators

Alvaro Sanchez-Gonzalez, Victor Bapst, Kyle Cranmer, and Peter Battaglia. Hamiltonian graph networks with ODE integrators. arXiv:1909.12790,

[74]

Relational recurrent neural networks

Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy Lillicrap. Relational recurrent neural networks. arXiv:1806.01822,

[75]

How Does Batch Normalization Help Optimization?

Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How does batch normalization help optimization? arXiv:1805.11604,

[76]

Random features strengthen graph neural networks

Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random features strengthen graph neural networks. arXiv:2002.03155,

[77]

E(n) equivariant graph neural networks

Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. arXiv:2102.09844,

[78]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv:1707.06347,

[79]

Implicit regularization in ReLU networks with the square loss

Ohad Shamir and Gal Vardi. Implicit regularization in ReLU networks with the square loss. arXiv:2012.05156,

[80]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556,
