Relational inductive biases, deep learning, and graph networks

Adam Santoro; Alvaro Sanchez-Gonzalez; Andrea Tacchetti; Andrew Ballard; Ashish Vaswani; Caglar Gulcehre; Charles Nash; Chris Dyer; Daan Wierstra; David Raposo

arxiv: 1806.01261 · v3 · submitted 2018-06-04 · cs.LG · cs.AI · stat.ML

Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, Razvan Pascanu

Pith reviewed 2026-05-12 23:47 UTC · model grok-4.3

keywords: relational inductive bias · graph networks · combinatorial generalization · structured representations · deep learning · relational reasoning · neural networks on graphs

The pith

Graph networks unify neural approaches on graphs to embed relational structure and support combinatorial generalization in AI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that achieving human-like generalization requires prioritizing combinatorial generalization, which current deep learning lacks due to weak relational inductive biases. It rejects a strict choice between hand-engineered structure and pure end-to-end learning, instead showing how relational biases can be incorporated into architectures to handle entities, relations, and composition rules. The central proposal is the graph network as a new modular building block that extends existing graph neural network methods into a general framework for operating on structured data. This setup allows models to manipulate structured knowledge and produce structured outputs while learning from data. The authors position this as a foundation for more interpretable and flexible reasoning systems.

Core claim

The paper presents graph networks as a general-purpose building block that generalizes and extends neural networks operating on graphs. A graph network takes a graph with nodes, edges, and global attributes as input and updates them through learned functions that respect relational structure, enabling the model to reason about entities and their relations in a way that supports combinatorial generalization beyond the training distribution.
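One pass of such a block can be sketched in plain Python. This is a minimal illustration, not the companion library's API. It assumes the usual edge, then node, then global update order, and summation as the aggregation. The names `gn_block`, `vec_sum`, `phi_e`, `phi_v`, and `phi_u` are hypothetical, and the three update functions are fixed toy functions standing in for learned networks.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def vec_sum(vectors: Sequence[Vector], dim: int) -> Vector:
    """Elementwise sum; a zero vector of length `dim` if `vectors` is empty."""
    total = [0.0] * dim
    for vec in vectors:
        for j, x in enumerate(vec):
            total[j] += x
    return total


def gn_block(
    nodes: List[Vector],
    edges: List[Vector],
    senders: List[int],
    receivers: List[int],
    u: Vector,
    phi_e: Callable[[Vector, Vector, Vector, Vector], Vector],
    phi_v: Callable[[Vector, Vector, Vector], Vector],
    phi_u: Callable[[Vector, Vector, Vector], Vector],
) -> Tuple[List[Vector], List[Vector], Vector]:
    """One pass over a graph; the topology (senders/receivers) is left unchanged."""
    # 1. Update every edge from its attribute, its receiver, its sender, and the global.
    new_edges = [
        phi_e(e, nodes[r], nodes[s], u)
        for e, s, r in zip(edges, senders, receivers)
    ]
    edge_dim = len(new_edges[0]) if new_edges else 0

    # 2. Update every node from the sum of its incoming updated edges.
    new_nodes = []
    for i, v in enumerate(nodes):
        incoming = [e for e, r in zip(new_edges, receivers) if r == i]
        new_nodes.append(phi_v(vec_sum(incoming, edge_dim), v, u))
    node_dim = len(new_nodes[0]) if new_nodes else 0

    # 3. Update the global attribute from all updated edges and nodes.
    new_u = phi_u(vec_sum(new_edges, edge_dim), vec_sum(new_nodes, node_dim), u)
    return new_nodes, new_edges, new_u


# Toy stand-ins for learned update functions (assumption: 1-d attributes).
def phi_e(e: Vector, v_r: Vector, v_s: Vector, u: Vector) -> Vector:
    return [e[0] + v_r[0] + v_s[0] + u[0]]


def phi_v(agg: Vector, v: Vector, u: Vector) -> Vector:
    return [agg[0] + v[0] + u[0]]


def phi_u(e_agg: Vector, v_agg: Vector, u: Vector) -> Vector:
    return [e_agg[0] + v_agg[0] + u[0]]


# A three-node chain: 0 -> 1 -> 2.
new_nodes, new_edges, new_u = gn_block(
    nodes=[[1.0], [2.0], [3.0]],
    edges=[[0.5], [0.25]],
    senders=[0, 1],
    receivers=[1, 2],
    u=[10.0],
    phi_e=phi_e, phi_v=phi_v, phi_u=phi_u,
)
print(new_edges, new_nodes, new_u)
```

Because the same `phi_e` and `phi_v` are reused for every edge and node, the block runs unchanged on graphs of any size or shape, which is the property the combinatorial generalization argument relies on.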

What carries the argument

The graph network, a modular component that performs relational updates on graph-structured inputs by applying learned functions to nodes, edges, and global features while preserving the graph topology.

If this is right

Graph networks provide a direct interface for injecting structured knowledge into learning systems without sacrificing end-to-end trainability.
They enable models to learn and apply rules for composing entities and relations, supporting more systematic reasoning.
The framework unifies prior graph-based neural methods and extends them to handle global attributes and flexible message passing.
This approach can improve interpretability by making the relational computations explicit in the model's structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid systems could combine graph networks with symbolic rule engines to handle both learned patterns and explicit constraints.
Tasks in planning and causal reasoning might benefit from the built-in ability to represent and update relations dynamically.
Scaling laws for data efficiency could shift if relational biases reduce the need for exhaustive examples of combinations.

Load-bearing premise

That adding explicit relational inductive biases through structured graph representations will reliably produce combinatorial generalization where current deep learning architectures fall short.

What would settle it

An experiment showing that graph networks achieve no better generalization than standard feed-forward or recurrent networks on a task designed to test combinatorial generalization, such as extrapolating to novel combinations of objects and relations.
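One way to set up such a test is to hold out specific object-relation-object combinations while making sure every individual object and relation still appears in training. The sketch below is illustrative only. The function names and the toy vocabulary are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def combinatorial_split(
    objects: Sequence[str],
    relations: Sequence[str],
    held_out: Iterable[Triple],
) -> Tuple[List[Triple], List[Triple]]:
    """Split all (object, relation, object) triples into train and held-out test sets."""
    held = set(held_out)
    all_triples = [
        (a, r, b) for a, r, b in product(objects, relations, objects) if a != b
    ]
    train = [t for t in all_triples if t not in held]
    test = [t for t in all_triples if t in held]
    return train, test


def covers_all_parts(
    train: Sequence[Triple], objects: Sequence[str], relations: Sequence[str]
) -> bool:
    """True if every object and every relation occurs somewhere in training."""
    seen_objects = {x for a, _, b in train for x in (a, b)}
    seen_relations = {r for _, r, _ in train}
    return seen_objects >= set(objects) and seen_relations >= set(relations)


objects = ["cube", "sphere", "cone"]
relations = ["left_of", "above"]
train, test = combinatorial_split(
    objects,
    relations,
    held_out=[("cube", "above", "sphere"), ("cone", "left_of", "cube")],
)
print(len(train), len(test), covers_all_parts(train, objects, relations))
```

A model that scores well on `train` but no better than a feed-forward or recurrent baseline on `test` would count against the premise, since the test triples are built only from parts seen during training.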

Original abstract

Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript is part position paper, part review, and part unification. It argues that combinatorial generalization is a top priority for achieving human-like AI capabilities and that relational inductive biases implemented via structured representations and computations are essential to this goal. The authors reject a strict dichotomy between hand-engineering and end-to-end learning, review existing graph-based neural network approaches, and introduce the graph network (GN) framework as a general building block that unifies and extends them while providing an interface for manipulating structured knowledge. They discuss applications to relational reasoning and release an open-source software library with demonstrations.

Significance. If the proposed framework is adopted, the work could have substantial significance by offering a flexible, extensible architecture for incorporating relational structure into deep learning models, potentially improving generalization on tasks involving entities, relations, and rules. The explicit release of an open-source library with practical demonstrations is a notable strength that supports reproducibility and further experimentation. The synthesis of inductive bias ideas provides a clear conceptual foundation that could guide subsequent research on structured reasoning.

minor comments (3)

[Abstract] Abstract: the claim that the GN 'generalizes and extends various approaches for neural networks that operate on graphs' is central to the unification argument but is not accompanied by an explicit mapping or comparison table; adding a brief enumeration of the covered prior methods would strengthen the abstract.
[§3] §3 (Graph networks): the update functions (e.g., edge, node, and global updates) are defined clearly, but the notation and variable choices could be cross-referenced more explicitly to the specific prior works they generalize to improve traceability for readers familiar with earlier GNN formulations.
Throughout: while the open-source library is highlighted as a companion resource, the main text contains no inline code snippet or minimal worked example of a GN forward pass; including one would make the 'straightforward interface' claim more concrete without lengthening the paper substantially.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, assessment of significance, and recommendation for minor revision. We appreciate the recognition of the graph network framework's potential to support relational reasoning and the value of the accompanying open-source library.

Circularity Check

0 steps flagged

No significant circularity; definitional framework independent of inputs

The paper is explicitly a position/review/unification piece rather than a derivation with predictions or fitted results. It defines graph networks in §3 as a general interface that generalizes prior graph neural network approaches via explicit construction of nodes, edges, and global attributes with update functions; this definition does not reduce to any self-referential equation, fitted parameter, or author-only prior result. Claims about relational inductive biases and combinatorial generalization are presented as motivating hypotheses supported by literature synthesis and design rationale, not as outputs forced by the framework itself. No load-bearing self-citation chain or ansatz smuggling is used to justify uniqueness or force conclusions. The central proposal remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper rests on domain assumptions about the nature of intelligence and the sufficiency of relational biases, plus the introduction of graph networks as a conceptual primitive without independent falsifiable evidence supplied in the abstract.

axioms (2)

domain assumption: Combinatorial generalization is a defining characteristic of human intelligence that current deep learning lacks.
Stated explicitly in the opening paragraphs as the core challenge to be solved.
domain assumption: Structured representations and relational inductive biases are necessary and sufficient to achieve combinatorial generalization.
Central thesis of the position paper; no alternative mechanisms are seriously considered.

invented entities (1)

Graph network (no independent evidence)
purpose: A general-purpose building block that encodes strong relational inductive biases for operating on entities and relations.
Introduced as the main technical contribution; defined by generalizing prior graph NN approaches but without new data or proofs in the abstract.

pith-pipeline@v0.9.0 · 5709 in / 1434 out tokens · 40756 ms · 2026-05-12T23:47:15.518754+00:00 · methodology

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Dynamic Stability Landscapes in Synchronization Networks
cs.LG 2026-05 unverdicted novelty 7.0

Introduces graph-to-image prediction of per-node dynamic stability landscapes in oscillator networks from topology, releases two 10k-graph datasets, and shows GNN-CNN models achieve good accuracy with cross-size gener...
A mathematical theory of balancing relational generalization and memorization
cs.LG 2026-05 unverdicted novelty 7.0

Introduces transitive inference with exceptions task and analytically shows kernel ridge regression balances relational generalization and memorization depending on representational geometry, with validation in finetu...
Can Graphs Help Vision SSMs See Better?
cs.CV 2026-05 unverdicted novelty 7.0

GraphScan replaces geometric or coordinate-based scanning in Vision SSMs with learned local semantic graph routing, yielding SOTA results among such models on classification and segmentation tasks.
Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning
cs.CV 2026-05 unverdicted novelty 7.0

Hilbert-Geo introduces a unified formal language framework with CDL predicates and theorem bank for solid geometry, using a Parse2Reason pipeline to achieve SOTA accuracy on new solid and plane geometry datasets.
Accelerating 3D Non-LTE Synthesis with Graph Neural Networks
astro-ph.SR 2026-05 unverdicted novelty 7.0

Graph neural networks can approximate full 3D non-LTE Ca II populations in solar models with correlations above 0.99 and extreme computational efficiency.
Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs
math.DS 2026-05 unverdicted novelty 7.0

A field theory of synthetic cognition is cast as a retarded functional differential equation on graphs, with proofs of well-posedness, compact global attractor existence, delay-independent stability under a coupling-s...
Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs
math.DS 2026-05 unverdicted novelty 7.0

Establishes well-posedness, compact global attractors, and delay-independent global stability for retarded functional differential equations modeling reentrant value fields as coupled reaction-diffusion systems on fin...
Graph World Models: Concepts, Taxonomy, and Future Directions
cs.AI 2026-04 unverdicted novelty 7.0

The paper unifies emerging graph-based world models under a new paradigm and proposes a taxonomy organized by spatial, physical, and logical relational inductive biases.
PiGGO: Physics-Guided Learnable Graph Kalman Filters for Virtual Sensing of Nonlinear Dynamic Structures under Uncertainty
cs.LG 2026-04 unverdicted novelty 7.0

PiGGO integrates a learned graph neural ODE as the continuous-time dynamics model within an extended Kalman filter to enable online virtual sensing and uncertainty-aware state estimation for nonlinear dynamic systems ...
One Scale at a Time: Scale-Autoregressive Modeling for Fluid Flow Distributions
cs.CE 2026-04 conditional novelty 7.0

Scale-autoregressive modeling (SAR) samples fluid flow distributions hierarchically from coarse to fine resolutions on meshes, achieving lower distributional error and 2-7x faster runtime than diffusion or flow-matchi...
Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems
cs.LG 2026-04 unverdicted novelty 7.0

A self-supervised multimodal alignment step plus equivariant GNN-based MARL yields over twofold sensing accuracy and 50% performance gains in decentralized V2I rate maximization.
Fast Wasserstein rates for estimating probability distributions of probabilistic graphical models
math.ST 2025-10 unverdicted novelty 7.0

Smoothness assumptions on graphical model kernels produce Wasserstein estimation rates determined by local graph structure rather than ambient dimension.
Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem
cs.LG 2025-07 unverdicted novelty 7.0

Causal Process Models reframe dynamic causal graph discovery as multi-agent reinforcement learning to build sparse time-varying graphs only at active interactions, outperforming dense baselines on physical prediction.
Relational reasoning and inductive bias in transformers and large language models
cs.LG 2025-06 unverdicted novelty 7.0

In-weights learning induces linear embeddings enabling transitive inference in transformers, whereas in-context learning defaults to match-and-copy unless pre-trained on linear tasks or prompted with linear mental maps.
Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs
cs.LG 2025-05 unverdicted novelty 7.0

Unsupervised GNN model learns local updates for approximate MaxIS on dynamic graphs, achieving competitive ratios on 200-1000 node instances and 1.00-1.18x larger solutions than other unsupervised models when generali...
Temporal Graph Networks for Deep Learning on Dynamic Graphs
cs.LG 2020-06 unverdicted novelty 7.0

Temporal Graph Networks combine memory modules and graph operators to learn on dynamic graphs as timed event sequences, outperforming prior methods on transductive and inductive tasks while unifying earlier models as ...
Neural Operator: Graph Kernel Network for Partial Differential Equations
cs.LG 2020-03 unverdicted novelty 7.0

Graph Kernel Networks learn PDE solution operators that generalize across discretization methods and grid resolutions using graph-based kernel integration.
Language Models as Knowledge Bases?
cs.CL 2019-09 accept novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.
Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
cs.LG 2019-06 unverdicted novelty 7.0

Placeto learns generalizable RL policies for device placement via iterative improvements and graph embeddings, needing up to 6.1x fewer steps than prior methods and applying to unseen graphs without retraining.
Fast Graph Representation Learning with PyTorch Geometric
cs.LG 2019-03 accept novelty 7.0

PyTorch Geometric is a PyTorch library that delivers fast graph neural network training through sparse GPU kernels and variable-size mini-batching.
Learning Altruistic Collaboration in Heterogeneous Multi-Team Systems
cs.RO 2026-05 unverdicted novelty 6.0

A graph neural network learns to approximate altruistic robot transfers across heterogeneous teams using Hamilton's rule, achieving near-optimal allocation in simulated firefighting scenarios.
GOAL: Graph-based Objective-Aligned Diffusion Solvers for Dynamic Multi-Objective Optimization
cs.NE 2026-05 unverdicted novelty 6.0

GOAL uses conditioned diffusion on relational graphs with typed edges to produce feasible multi-objective solutions for scheduling problems, reporting 100% feasibility and sub-0.2% MAPE on FSP, JSP, and FJSP up to 20 jobs.
Universal Graph Backdoor Defense: A Feature-based Homophily Perspective
cs.CR 2026-05 unverdicted novelty 6.0

The paper proposes a universal defense against subgraph-based and feature-based graph backdoor attacks on GNNs by exploiting lower feature-based homophily in backdoored nodes via neighbor-aware reconstruction loss and...
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
cs.LG 2026-05 unverdicted novelty 6.0

Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
Neural Point-Forms
cs.LG 2026-05 unverdicted novelty 6.0

Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and comp...
SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
cs.LG 2026-05 conditional novelty 6.0

SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.
SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

SACHI uses graph transformer convolutions on inter-agent coordination graphs to enrich partial-observation agents with content-dependent teammate information, yielding statistically significant gains over baselines in...
LINC: Decoupling Local Consequence Scoring from Hidden Matching in Constructive Neural Routing
cs.LG 2026-05 unverdicted novelty 6.0

LINC decouples local consequence scoring from hidden matching in constructive neural routing solvers, cutting CVRPTW gaps for PolyNet from 13.83%/38.15% to 7.26%/14.71% on Solomon/Homberger benchmarks.
Sheet as Token: A Graph-Enhanced Representation for Multi-Sheet Spreadsheet Understanding
cs.AI 2026-05 unverdicted novelty 6.0

Sheet as Token represents each worksheet as a single dense token and uses a multi-channel graph retriever to improve retrieval of supporting sheets in multi-sheet workbooks.
Deep Wave Network for Modeling Multi-Scale Physical Dynamics
cs.LG 2026-05 unverdicted novelty 6.0

DW-Net improves the accuracy versus computational cost Pareto front over standard U-Nets for 2D and 3D multi-scale flow benchmarks by stacking multiple waves while keeping training settings identical.
Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs
math.DS 2026-05 unverdicted novelty 6.0

Establishes well-posedness, compact global attractor existence, and delay-independent stability for a retarded functional differential equation coupling symbolic and geometric fields on graphs under fixed interfield o...
Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
Mesh Field Theory: Port-Hamiltonian Formulation of Mesh-Based Physics
cs.LG 2026-05 unverdicted novelty 6.0

Mesh Field Theory reduces mesh-based physics to port-Hamiltonian form with topology fixing interconnections and metrics entering only via constitutive relations, enabling MeshFT-Net to achieve near-zero energy drift, ...
Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework
cs.LG 2026-04 unverdicted novelty 6.0

ST-PT turns transformers into explicit factor graphs for time series, enabling structural injection of symbolic priors, per-sample conditional generation, and principled latent autoregressive forecasting via MFVI iterations.
Scalable Production Scheduling: Linear Complexity via Unified Homogeneous Graphs
cs.LG 2026-04 unverdicted novelty 6.0

A unified homogeneous graph framework with feature homogenization enables linear-complexity RL policies for job shop scheduling that generalize zero-shot via structural saturation at balanced job-machine ratios.
TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering
cs.LG 2026-04 unverdicted novelty 6.0

TransXion supplies a 3-million-transaction graph benchmark with profile-aware normal activity and stochastic illicit subgraphs that produces lower detection scores than prior AML datasets.
Cluster Attention for Graph Machine Learning
cs.LG 2026-04 unverdicted novelty 6.0

Cluster attention uses off-the-shelf community detection to define attention scopes within graph clusters, augmenting MPNNs and Graph Transformers to achieve larger receptive fields with preserved structural inductive...
The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
cs.LG 2026-04 unverdicted novelty 6.0

LLMs discover latent planning strategies up to five steps during training and execute them up to eight steps at test time, with larger models reaching seven under few-shot prompting, revealing a dissociation between d...
Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation
eess.AS 2026-04 unverdicted novelty 6.0

The CDMA speech depression model generalizes across languages, favors emotional speech, and aligns with EEG markers of emotional dysregulation.
Metriplector: From Field Theory to Neural Architecture
cs.AI 2026-03 unverdicted novelty 6.0

Metriplector treats neural computation as coupled metriplectic field dynamics whose stress-energy tensor readout achieves competitive results on vision, control, Sudoku, language modeling, and pathfinding with small p...
Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs
cs.LG 2026-03 unverdicted novelty 6.0

DLDMF disentangles latent dynamics for parameterized PDEs by feeding parameters into a latent embedding that initializes a parameter-conditioned Neural ODE, then uses dynamic manifold fusion with a shared decoder to r...
HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
cs.AI 2026-03 unverdicted novelty 6.0

HiMAC decomposes LLM agent tasks into macro planning and micro execution using critic-free hierarchical RL and iterative co-evolution, outperforming baselines on ALFWorld, WebShop, and Sokoban.
Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility
cs.SI 2026-02 unverdicted novelty 6.0

LLM simulations of misinformation susceptibility overstate attitudinal associations and largely ignore personal network characteristics compared to human survey data.
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web
cs.AI 2026-01 unverdicted novelty 6.0

Holos is a five-layer LLM-based multi-agent system architecture using the Nuwa engine for agent generation, a market-driven Orchestrator for coordination, and an endogenous value cycle for incentive-compatible persist...
Parameter-Efficient Conditioning for Material Generalization in Graph-Based Simulators
cs.LG 2025-11 unverdicted novelty 6.0

FiLM conditioning targeted at early message-passing layers lets pretrained GNS models generalize to new material properties using only 12 trajectories, a 5-fold data reduction versus multi-task baselines.
Graph-Based Alternatives to LLMs for Human Simulation
cs.CL 2025-11 conditional novelty 6.0

GEMS formulates close-ended human-behavior simulation as link prediction on a heterogeneous graph and matches or exceeds LLM performance with three orders of magnitude fewer parameters across three datasets and three ...
Learning to accelerate distributed ADMM using graph neural networks
cs.LG 2025-09 conditional novelty 6.0

A GNN is trained to predict adaptive step sizes and weights for distributed ADMM by unrolling a fixed number of iterations and minimizing solution error on a problem class.
Pretrained Event Classification Model for High Energy Physics Analysis
hep-ph 2024-12 unverdicted novelty 6.0

A GNN pretrained on 120M simulated HEP events generalizes to unseen processes and ATLAS data; fine-tuning boosts accuracy especially with small datasets, with CKA showing preserved encoders but altered intermediate layers.
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
cs.CV 2023-08 unverdicted novelty 6.0

DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
cs.LG 2021-04 accept novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks
cs.LG 2019-09 unverdicted novelty 6.0

DGL is a graph-centric library that optimizes GNNs via generalized sparse tensor operations, transparent graph-based optimizations, and framework-neutral design, claiming superior speed and memory use over other GNN f...
Representation Learning for Classical Planning from Partially Observed Traces
cs.AI 2019-07 unverdicted novelty 6.0

LP-GNN learns vectorized planning domain models via GNNs from partial traces and outperforms the ARMS learner on solving problems across five classical domains.
Graph Neural Based End-to-end Data Association Framework for Online Multiple-Object Tracking
cs.CV 2019-07 unverdicted novelty 6.0

A graph neural network framework learns affinities from appearance and motion then solves bipartite matching for online multiple-object tracking.
Graph-based Knowledge Distillation by Multi-head Attention Network
cs.LG 2019-07 unverdicted novelty 6.0

Multi-head attention constructs a graph of dataset relations from the teacher embedding procedure and transfers it to the student via multi-task learning, yielding 7.05% higher CIFAR-100 accuracy than the student alon...
WaveGraphNet: Physics-Consistent Guided-Wave Damage Localization through Coupled Inverse-Forward Graph Learning
cs.LG 2026-05 unverdicted novelty 5.0

WaveGraphNet is a graph-based coupled inverse-forward model that localizes damage in CFRP plates from sparse guided-wave measurements with improved extrapolation to unseen locations.
Physics-Informed Graph Neural Network Surrogates for Turbulent Nanoparticle Dispersion in Dental Clinical Environments
cs.LG 2026-05 unverdicted novelty 5.0

ELGIN is a graph-based physics-informed surrogate model that predicts carrier flow and polydisperse particle motion in dental aerosol scenarios, achieving lower tracking errors and 37x speedup versus full OpenFOAM CFD...
Attention-based graph neural networks: a survey
cs.SI 2026-05 unverdicted novelty 5.0

The survey groups attention-based GNNs into three stages—graph recurrent attention networks, graph attention networks, and graph transformers—while reviewing architectures and future directions.
Mesh Based Simulations with Spatial and Temporal awareness
cs.LG 2026-05 unverdicted novelty 5.0

A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention corre...
Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning
cs.LG 2026-04 unverdicted novelty 5.0

Inductive subgraphs serve as shortcuts in heterophilic graphs, and CD-GNN disentangles spurious from causal subgraphs by blocking non-causal paths to improve robustness and accuracy.
Extracting Money Laundering Transactions from Quasi-Temporal Graph Representation
cs.LG 2026-04 unverdicted novelty 5.0

ExSTraQt uses quasi-temporal graph representations and supervised learning to detect suspicious transactions, achieving F1 score uplifts of up to 1% on real data and over 8% on synthetic datasets compared to prior AML models.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · cited by 67 Pith papers · 1 internal anchor

[1]

Allamanis, M., Brockschmidt, M., and Khademi, M. (2018). Learning to represent programs with graphs. In Proceedings of the International Conference on Learning Representations (ICLR). Allamanis, M., Chanthirasegaran, P., Kohli, P., and Sutton, C. (2017). Learning continuous semantic representations of symbolic expressions. In Proceedings of the Internati...

[2]

Special Issue 1-2: On Connectionist Symbol Processing

Elsevier Science Publishers Ltd., Essex, UK. Special Issue 1-2: On Connectionist Symbol Processing. Bojchevski, A., Shchur, O., Zügner, D., and Günnemann, S. (2018). Netgan: Generating graphs via random walks. arXiv preprint arXiv:1803.00816. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. (2013). Translating embeddings fo...

[3]

and Tenenbaum, J

Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31):10687–10692. Kipf, T., Fetaya, E., Wang, K.-C., Welling, M., and Zemel, R. (2018). Neural relational inference for interacting systems. In Proceedings of the International Conference on Machine Learning (ICML). Kipf, T. N. and ...

work page arXiv 2008
[4]

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551. Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2016). Gated graph ...

work page Pith review arXiv 2015
[5]

Learning a SAT Solver from Single-Bit Supervision

MIT Press. Russell, S. J. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd Edition). Pearson. Sabour, S., Frosst, N., and Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pages 3859–3869. Sanchez-Gonzalez, A., Heess, N., Springenberg, J. T., Merel, J., Riedmiller, M., Hads...

work page Pith review arXiv 2009
[6]

S., Socher, R., and Manning, C

Tai, K. S., Socher, R., and Manning, C. D. (2015). Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. (2015). Line: Large-scale information network embedding. In Proc...

work page 2015
[7]

Learning to reinforcement learn

Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D., and Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763. Wang, T., Liao, R., Ba, J., and Fidler, S. (2018b). Nervenet: Learning structured policy with graph neural networks. In Proceedings of the International Conf...

work page Pith review arXiv 2016
[8]

Interaction networks Interaction Networks (Battaglia et al., 2016; Watters et al.,

work page 2016
[9]

and the Neural Physics Engine Chang et al. (2017) use a full GN but for the absence of the global to update the edge properties:
$\phi^e(e_k, v_{r_k}, v_{s_k}, u) := f^e(e_k, v_{r_k}, v_{s_k}) = \mathrm{NN}_e([e_k, v_{r_k}, v_{s_k}])$
$\phi^v(\bar{e}'_i, v_i, u) := f^v(\bar{e}'_i, v_i, u) = \mathrm{NN}_v([\bar{e}'_i, v_i, u])$
$\rho^{e \to v}(E'_i) := \sum_{\{k:\, r_k = i\}} e'_k$
That work also included an extension to the above formulation wh...

work page 2017
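
The edge, aggregation, and node updates quoted in entry [9] can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `concat` and `interaction_step`, the edge tuple layout, and the toy stand-ins for NN_e and NN_v are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical edge layout: (attributes e_k, receiver index r_k, sender index s_k)
Edge = Tuple[Vector, int, int]


def concat(*parts: Sequence[float]) -> Vector:
    """Concatenate vectors, mirroring the [a, b, c] notation in the updates."""
    out: Vector = []
    for part in parts:
        out.extend(part)
    return out


def interaction_step(
    nodes: List[Vector],
    edges: List[Edge],
    u: Vector,
    nn_e: Callable[[Vector], Vector],
    nn_v: Callable[[Vector], Vector],
) -> Tuple[List[Vector], List[Vector]]:
    """One pass of the three updates; nn_e and nn_v stand in for NN_e and NN_v."""
    # Edge update: e'_k = NN_e([e_k, v_{r_k}, v_{s_k}])  (the global u is not used)
    new_edges = [nn_e(concat(e, nodes[r], nodes[s])) for e, r, s in edges]

    # Aggregation: sum e'_k over all edges k whose receiver r_k equals i
    edge_dim = len(new_edges[0]) if new_edges else 0
    agg = [[0.0] * edge_dim for _ in nodes]
    for (_, r, _), e_new in zip(edges, new_edges):
        agg[r] = [a + b for a, b in zip(agg[r], e_new)]

    # Node update: v'_i = NN_v([aggregated e'_i, v_i, u])
    new_nodes = [nn_v(concat(agg[i], v, u)) for i, v in enumerate(nodes)]
    return new_nodes, new_edges


if __name__ == "__main__":
    # Toy "networks" that just sum their inputs (assumption, for illustration only)
    toy = lambda x: [sum(x)]
    nodes = [[1.0], [2.0], [3.0]]
    edges = [([0.5], 1, 0), ([0.5], 1, 2)]  # node 1 receives from nodes 0 and 2
    print(interaction_step(nodes, edges, [10.0], toy, toy))
```

In practice the two callables would be learned networks; passing them as arguments keeps the sketch independent of any particular framework.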
[10]

Here each $\mathrm{NN}_{e,t_k}$ is a neural network with specific parameters

use a slightly generalized formulation where each edge has an attached type $t_k \in \{1, \ldots, T\}$, and the updates are:
$\phi^e((e_k, t_k), v_{r_k}, v_{s_k}, u) := f^e(e_k, v_{s_k}) = \mathrm{NN}_{e,t_k}(v_{s_k})$
$\phi^v(\bar{e}'_i, v_i, u) := f^v(\bar{e}'_i, v_i) = \mathrm{NN}_v([\bar{e}'_i, v_i])$
$\rho^{e \to v}(E'_i) := \sum_{\{k:\, r_k = i\}} e'_k$
These updates are applied recurrently (the $\mathrm{NN}_v$ is a GRU (Cho et al., 2014)), followed by a glo...

work page 2014
[11]

They also use a multi-headed version which computes $N_h$ parallel $\bar{e}'^{h}_i$ using different $\mathrm{NN}_{\alpha^{\mathrm{query}}_h}$, $\mathrm{NN}_{\alpha^{\mathrm{key}}_h}$, $\mathrm{NN}_{\beta_h}$, where $h$ indexes the different parameters

(in the slightly more general form described by (Hoshen, 2017)) uses:
$\phi^e(e_k, v_{r_k}, v_{s_k}, u) := f^e(v_{s_k}) = \mathrm{NN}_e(v_{s_k})$
$\phi^v(\bar{e}'_i, v_i, u) := f^v(\bar{e}'_i, v_i) = \mathrm{NN}_v([\bar{e}'_i, \mathrm{NN}_{v'}(v_i)])$
$\rho^{e \to v}(E'_i) := \frac{1}{|E'_i|} \sum_{\{k:\, r_k = i\}} e'_k$
Attention-based approaches. The various attention-based approaches use a $\phi^e$ which is factored into a scalar pairwise-interaction f...

work page 2017
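
The first set of updates in entry [11] differs from a sum-aggregating scheme mainly in that messages depend only on the sender and are averaged per receiver. A minimal plain-Python sketch under stated assumptions follows: the name `mean_message_step`, the `(receiver, sender)` edge layout, and the choice to give nodes with no incoming edges a zero aggregate are all introduced here for illustration and are not taken from the source.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def mean_message_step(
    nodes: List[Vector],
    edges: List[Tuple[int, int]],  # hypothetical layout: (receiver r_k, sender s_k)
    nn_e: Callable[[Vector], Vector],
    nn_v: Callable[[Vector], Vector],
    nn_v_prime: Callable[[Vector], Vector],
) -> List[Vector]:
    """Sender-only messages, mean aggregation, then a node update."""
    # Edge update: e'_k = NN_e(v_{s_k}), a function of the sender only
    messages = [nn_e(nodes[s]) for _, s in edges]

    # Aggregation: mean of e'_k over edges k with receiver r_k = i
    dim = len(messages[0]) if messages else 0
    totals = [[0.0] * dim for _ in nodes]
    counts = [0] * len(nodes)
    for (r, _), msg in zip(edges, messages):
        totals[r] = [a + b for a, b in zip(totals[r], msg)]
        counts[r] += 1
    # Assumption: a node with no incoming edges keeps an all-zero aggregate
    means = [
        [x / counts[i] for x in total] if counts[i] else total
        for i, total in enumerate(totals)
    ]

    # Node update: v'_i = NN_v([mean e'_i, NN_v'(v_i)])
    return [nn_v(means[i] + nn_v_prime(v)) for i, v in enumerate(nodes)]


if __name__ == "__main__":
    nodes = [[2.0], [4.0], [6.0]]
    edges = [(1, 0), (1, 2)]  # node 1 receives from nodes 0 and 2
    out = mean_message_step(
        nodes,
        edges,
        nn_e=lambda v: list(v),                # toy stand-in for NN_e
        nn_v=lambda x: [sum(x)],               # toy stand-in for NN_v
        nn_v_prime=lambda v: [2.0 * x for x in v],  # toy stand-in for NN_v'
    )
    print(out)
```

Dividing by the number of incoming edges keeps the aggregate on the same scale regardless of how many neighbours a node has, which is the practical difference from summation.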
[12]

Relative

are also similar to multi-headed SA, but use a neural network as the attentional similarity metric, with shared parameters across the attention inputs’ embeddings:
$\alpha^e(v_{r_k}, v_{s_k}) = \exp(\mathrm{NN}_{\alpha'}([\mathrm{NN}_\alpha(v_{r_k}), \mathrm{NN}_\alpha(v_{s_k})]))$
$\beta^e(v_{s_k}) = \mathrm{NN}_\beta(v_{s_k})$
$\phi^v(\bar{e}'_i, v_i, u) := f^v(\{\bar{e}'^{h}_i\}_{h=1 \ldots N_h}) = \mathrm{NN}_v([\bar{e}'^{1}_i, \ldots, \bar{e}'^{N_h}_i])$
Stretching beyond the specific non-lo...

work page 2018
