Incorporating Deep Learning Design in Database Queries

Benny Kimelfeld; Boaz Berger; Dean Light; Shunit Agmon; Yuval Lev Lubarsky

arxiv: 2605.24207 · v1 · pith:C6MUN2E2new · submitted 2026-05-22 · 💻 cs.DB · cs.LG

Yuval Lev Lubarsky, Dean Light, Boaz Berger, Shunit Agmon, Benny Kimelfeld

Pith reviewed 2026-06-30 14:13 UTC · model grok-4.3

keywords: relational deep learning · tuple embeddings · query lifting · graph neural networks · declarative machine learning · database integration

The pith

Database queries can be lifted to jointly handle relational data and learnable tuple embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Deep learning on relational data has required exporting tables to graph formats and running separate neural network code. The paper observes that the way these networks combine embeddings follows the structure of joins and other relational operations. By attaching a learnable embedding vector to each tuple and extending the query operators to transform these vectors, the same computations can be expressed inside the database. This makes relational deep learning declarative and integrated with existing database infrastructure, as shown by an implementation that reproduces several graph models with simple queries.

Core claim

By representing tuple provenance as learnable vector embeddings and lifting relational algebra operators to act on both the data and these embeddings, queries can directly realize the computations performed by graph neural networks over relational data.

What carries the argument

Lifted relational queries that propagate and aggregate tuple embeddings according to the query structure.
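As a rough illustration of what such lifted operators might look like, the Python sketch below pairs every tuple with an embedding and lifts a join and a projection over those pairs. The relation contents, the helpers `lifted_join` and `lifted_project`, and the choice of element-wise addition for combining embeddings are all assumptions made for this example; the paper's actual operators and RelaNN's data model may differ.

```python
from collections import defaultdict

# An "embedded relation" is a list of (row, embedding) pairs, where row is a
# dict of attribute values. Hypothetical representation, for illustration only.


def vec_add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def lifted_join(left, right, key):
    """Equi-join on `key`; each output tuple combines both input embeddings.

    Combining by addition is an assumption, not the paper's definition.
    """
    index = defaultdict(list)
    for row, emb in right:
        index[row[key]].append((row, emb))
    out = []
    for lrow, lemb in left:
        for rrow, remb in index.get(lrow[key], []):
            out.append(({**lrow, **rrow}, vec_add(lemb, remb)))
    return out


def lifted_project(rel, attrs):
    """Project onto `attrs`; tuples that collapse together sum their embeddings."""
    groups = {}
    for row, emb in rel:
        k = tuple(row[a] for a in attrs)
        groups[k] = vec_add(groups[k], emb) if k in groups else list(emb)
    return [(dict(zip(attrs, k)), emb) for k, emb in groups.items()]


# Toy data with made-up names and values.
drivers = [({"driver": "d1"}, [1.0, 0.0]), ({"driver": "d2"}, [0.0, 1.0])]
results = [
    ({"driver": "d1", "race": "r1"}, [0.5, 0.5]),
    ({"driver": "d1", "race": "r2"}, [0.25, 0.25]),
    ({"driver": "d2", "race": "r1"}, [2.0, 2.0]),
]

joined = lifted_join(drivers, results, "driver")
per_driver = lifted_project(joined, ["driver"])
```

Here the join plays the role of message construction and the projection plays the role of aggregation, which is the correspondence the pith describes. A learned variant would replace the fixed addition with parameterized transformations.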

If this is right

Graph neural network models become expressible as standard database queries.
The engineering overhead of data export to external ML systems is eliminated.
Database optimizations can be applied directly to neural computations.
Models including graph convolutional networks, heterogeneous graph transformers, and hypergraph networks can be implemented this way.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may enable training and inference entirely inside the database without data movement.
It could generalize to other types of neural architectures that operate on relational structures.
Query planners might automatically optimize the embedding computations for better performance.

Load-bearing premise

The interactions induced by relational joins are fully captured by the manipulations that graph neural networks perform on tuple embeddings.

What would settle it

Finding a relational deep learning task where no lifted query reproduces the output of the corresponding graph neural network on the same input data and embeddings.
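A test of this kind could be prototyped on a toy instance before being run against a real model. The sketch below is one assumed setup: it computes a plain sum aggregation twice, once as an edge-list join with group-by and once as a dense adjacency product, and reports the largest absolute difference. The function names and data are hypothetical, and the aggregation is far simpler than any of the models the paper covers.

```python
def aggregate_relational(edges, h):
    """Join Edge(src, dst) with node embeddings on src, group by dst, and sum."""
    dim = len(next(iter(h.values())))
    out = {node: [0.0] * dim for node in h}
    for src, dst in edges:
        out[dst] = [a + b for a, b in zip(out[dst], h[src])]
    return out


def aggregate_dense(edges, h):
    """Reference computation: the same sum written as a dense product A @ H."""
    nodes = list(h)
    dim = len(h[nodes[0]])
    # adj[v][u] counts the edges u -> v, so repeated edges are counted twice.
    adj = {v: {u: 0.0 for u in nodes} for v in nodes}
    for src, dst in edges:
        adj[dst][src] += 1.0
    return {
        v: [sum(adj[v][u] * h[u][k] for u in nodes) for k in range(dim)]
        for v in nodes
    }


def max_abs_diff(x, y):
    """Largest absolute coordinate difference between two embedding maps."""
    return max(abs(a - b) for node in x for a, b in zip(x[node], y[node]))


h = {"a": [1.0, 2.0], "b": [0.5, -1.0], "c": [3.0, 0.25]}
edges = [("a", "b"), ("c", "b"), ("b", "a"), ("a", "b")]  # one edge repeated on purpose
gap = max_abs_diff(aggregate_relational(edges, h), aggregate_dense(edges, h))
```

A gap that stays at zero on every input supports the correspondence for this operator; a single input with a nonzero gap would be the counterexample the section asks for.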

Figures

Figures reproduced from arXiv: 2605.24207 by Benny Kimelfeld, Boaz Berger, Dean Light, Shunit Agmon, Yuval Lev Lubarsky.

**Figure 2.** Term graph (logical plan) for DriverAgg. Leaves are database relations and inner nodes are NRA operators.

… that execute joins, projections, and learned transformations end-to-end on GPU. Data loaders then connect the corresponding database relations to the physical plan via SQL queries. RelaNN is a Python-embedded DSL parsed with Lark [46]. Transformation operators such as Linear and ReLU are resolved by n…

**Figure 4.** Equation-to-rule correspondence for the HGT at …

Original abstract

Deep learning over relational databases is conventionally realized by translating data into graph representations and applying graph-based neural networks within external frameworks. This round-trip between the database and external machine learning (ML) systems introduces non-trivial engineering overhead. In effect, these graph neural networks operate on tuple embeddings and manipulate them in ways that capture the interactions induced by relational joins. Given this natural correspondence, there is no fundamental reason why specifying a neural network over relational data should be substantially harder than querying it. We propose an approach that naturally integrates deep learning with database queries. The key idea is to associate each tuple with provenance, represented as a vector embedding with learnable parameters. Queries are lifted to operate jointly on data and embeddings, mapping input relations with embedded tuples to output relations with embedded tuples. This approach provides a declarative foundation for relational deep learning, facilitating integration with database systems, optimization, and wide adoption. We describe RelaNN, a proof-of-concept implementation of this approach built on top of PyTorch and cuDF. We illustrate the utility of RelaNN by implementing various graph-learning models, including graph convolutional networks, heterogeneous graph transformers, hypergraph neural networks and deep homomorphism networks. The simplicity of the programs and their competitive runtime performance demonstrate a concrete path toward making the implementation of state-of-the-art neural networks over databases as simple as writing a query.
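To make the abstract's correspondence between joins and message passing more concrete, here is a minimal Python sketch of one message-passing round written as "join, group, transform". It uses mean aggregation followed by a linear map and ReLU, which is a simplification and not the normalized graph convolution of the cited GCN paper. The function names, the dictionary-based relations, and the weight matrix are assumptions for illustration and are not RelaNN syntax.

```python
from collections import defaultdict


def linear(vec, weight):
    """Apply a weight matrix given as a list of output rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def relu(vec):
    return [max(0.0, x) for x in vec]


def message_passing_layer(node_emb, edges, weight):
    """One round: join edges with source embeddings, average per target, transform.

    node_emb maps node id -> embedding; edges is a list of (src, dst) tuples.
    Nodes without incoming edges do not appear in the output.
    """
    # Join Edge(src, dst) with Node(src): every edge carries its source embedding.
    messages = [(dst, node_emb[src]) for src, dst in edges if src in node_emb]
    # Group by dst and accumulate sums and counts for the mean.
    sums, counts = {}, defaultdict(int)
    for dst, emb in messages:
        sums[dst] = [a + b for a, b in zip(sums[dst], emb)] if dst in sums else list(emb)
        counts[dst] += 1
    return {
        dst: relu(linear([x / counts[dst] for x in total], weight))
        for dst, total in sums.items()
    }


node_emb = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [-5.0, 0.0]}
edges = [("a", "c"), ("b", "c"), ("c", "a")]
weight = [[1.0, 0.0], [0.0, -1.0]]  # fixed here; learnable in a real model
updated = message_passing_layer(node_emb, edges, weight)
```

In a trainable setting the weight matrix and the input embeddings would be parameters updated by gradient descent, which the abstract says RelaNN delegates to PyTorch.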

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper lifts relational queries over learnable tuple embeddings to let GNN models be written as DB queries, but provides no checks that the lifted operators match standard GNN semantics.

The main point is that they associate tuples with learnable embedding vectors as provenance and lift queries so that input relations with embeddings map to output relations with embeddings. This is meant to let you express things like graph convolutional networks or heterogeneous graph transformers directly inside the database without shipping data to an external framework.

They built RelaNN on PyTorch and cuDF and used it to implement several models including GCNs, heterogeneous graph transformers, hypergraph NNs, and deep homomorphism networks. The programs are short and the reported runtimes are competitive, which shows a practical path for reducing engineering overhead in relational deep learning.

The soft spot is the missing verification. The abstract claims the models were implemented and runtimes are competitive, but it gives no accuracy numbers, no embedding comparisons, and no equivalence tests against reference implementations. If the lifted join and aggregation steps handle multi-hop message passing, normalization, or heterogeneous edges differently from the original GNN code, the results would not be semantically equivalent even if the syntax is simpler. That gap is central to the claim and is not addressed in the provided text.

This is for researchers and engineers working on database-ML integration who want a declarative route for graph learning over relational data. A reader who cares about systems-level simplicity would find the implementation sketch useful.

It deserves peer review because the core idea is distinct and the proof-of-concept exists, even though the correctness argument needs strengthening.

Referee Report

2 major / 0 minor

Summary. The paper proposes lifting relational database queries to jointly operate on data tuples and associated learnable provenance embeddings, enabling declarative specification of deep learning models (such as GCNs, heterogeneous graph transformers, and hypergraph NNs) directly over relational data without external graph frameworks. It presents RelaNN, a PyTorch/cuDF proof-of-concept, and claims that the natural correspondence between relational joins and GNN operations on embeddings allows simple query-based implementations with competitive runtime.

Significance. If the lifted operators are shown to be semantically equivalent to reference GNN implementations (including multi-hop aggregation, normalization, and heterogeneous edge handling), the work could meaningfully reduce engineering overhead in relational deep learning and support tighter DB-ML integration. The absence of accuracy results, embedding comparisons, or equivalence verification in the provided description limits the assessed significance to a promising but unvalidated direction.

major comments (2)

[Abstract] Abstract: the central claim that queries can 'faithfully reproduce GNN message passing, aggregation, and update steps' for models like heterogeneous graph transformers rests on an unverified natural correspondence; the manuscript reports only that models 'were implemented' and runtime is competitive, with no accuracy numbers, embedding comparisons, or output-equivalence checks against reference implementations.
[Abstract] The description of RelaNN and the lifted operators provides no derivation, formal semantics, or proof that the embedding manipulations preserve the exact aggregation/normalization behavior of the target GNNs (e.g., attention in heterogeneous transformers or hyperedge aggregation); without this, the declarative foundation claim cannot be evaluated.
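The behavior the second comment asks about can be made concrete with a small example. The Python sketch below shows a generic scaled dot-product attention aggregation applied per target tuple, with a softmax over each group. It is an assumed, simplified stand-in: it is not the heterogeneous graph transformer's exact formulation and not RelaNN's operator, and all names are hypothetical. The point is that the normalization constant depends on the whole group, which is exactly the kind of detail a formal semantics for lifted aggregation would need to pin down.

```python
import math
from collections import defaultdict


def attention_aggregate(messages, queries):
    """Softmax-weighted sum of value vectors, computed separately per target.

    messages: list of (dst, key_vec, value_vec); queries: {dst: query_vec}.
    """
    groups = defaultdict(list)
    for dst, key, value in messages:
        score = sum(q * k for q, k in zip(queries[dst], key)) / math.sqrt(len(key))
        groups[dst].append((score, value))

    out = {}
    for dst, items in groups.items():
        top = max(score for score, _ in items)  # subtract the max for stability
        weights = [math.exp(score - top) for score, _ in items]
        total = sum(weights)
        dim = len(items[0][1])
        out[dst] = [
            sum(w * value[i] for w, (_, value) in zip(weights, items)) / total
            for i in range(dim)
        ]
    return out


messages = [
    ("n", [1.0, 0.0], [2.0, 0.0]),
    ("n", [1.0, 0.0], [0.0, 4.0]),
]
queries = {"n": [1.0, 1.0]}
attended = attention_aggregate(messages, queries)
```

With identical keys the two messages receive equal weight, so the result is the plain mean of the value vectors.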

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the need for stronger verification of the claimed correspondence between lifted relational operators and GNN computations. We address the two major comments below and will incorporate revisions to provide the requested evidence and formal details.

Referee: [Abstract] Abstract: the central claim that queries can 'faithfully reproduce GNN message passing, aggregation, and update steps' for models like heterogeneous graph transformers rests on an unverified natural correspondence; the manuscript reports only that models 'were implemented' and runtime is competitive, with no accuracy numbers, embedding comparisons, or output-equivalence checks against reference implementations.

Authors: The manuscript grounds the claim in the natural structural correspondence between relational joins and the multi-hop neighborhood aggregations performed by GNNs, which is illustrated through the concrete RelaNN implementations of GCNs, heterogeneous graph transformers, and hypergraph NNs. We agree, however, that the abstract and evaluation sections would be strengthened by explicit verification. We will add a new subsection reporting (i) output-equivalence checks (element-wise L2 distance and cosine similarity on embeddings) against reference implementations in PyTorch Geometric and (ii) end-to-end accuracy on standard node-classification benchmarks for each model.

Revision: yes

Referee: [Abstract] The description of RelaNN and the lifted operators provides no derivation, formal semantics, or proof that the embedding manipulations preserve the exact aggregation/normalization behavior of the target GNNs (e.g., attention in heterogeneous transformers or hyperedge aggregation); without this, the declarative foundation claim cannot be evaluated.

Authors: The current text presents the lifting via an intuitive mapping from join-induced interactions to embedding operations but does not supply a formal semantics or equivalence proof for the more involved cases (attention coefficients, normalization constants, hyperedge pooling). We will revise the manuscript by inserting a new section that (a) defines the lifted relational operators with precise algebraic semantics and (b) sketches the equivalence arguments for the supported GNN families, including the handling of heterogeneous attention and hyperedge aggregation.

Revision: yes
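The two metrics named in the first response are simple to state. The sketch below gives plain-Python versions of an L2 distance and a cosine similarity between two embedding vectors; the function names and the zero-vector convention are assumptions for this example, and a real comparison against PyTorch Geometric would more likely use tensor operations.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up embeddings standing in for a lifted query and a reference model.
lifted = [0.0, 0.0]
reference = [3.0, 4.0]
distance = l2_distance(lifted, reference)
```

The two measures answer different questions: the L2 distance detects any numerical deviation, while cosine similarity ignores uniform rescaling, so a missing normalization constant could pass a cosine check and still fail the distance check.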

Circularity Check

0 steps flagged

No circularity: new lifted-query machinery introduced without reduction to fitted inputs or self-citations

The paper proposes associating tuples with learnable vector embeddings and lifting relational queries to operate jointly on data and embeddings. This is presented as a new declarative foundation rather than a derivation from prior fitted quantities. No equations define a target quantity in terms of itself, no parameters are fitted on a subset and then renamed as predictions, and no load-bearing self-citations or uniqueness theorems from the authors' prior work are invoked. The implementation of RelaNN and example models (GCN, HGT, hypergraph NNs) serves as direct evidence of utility, keeping the central claim self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central idea rests on the assumption that relational operations can be extended to vector embeddings while preserving the semantics needed for graph neural networks; this is postulated rather than derived from external benchmarks.

free parameters (1)

embedding dimension and parameters
Learnable vector parameters attached to each tuple; dimension and initialization not specified in abstract.

axioms (1)

domain assumption: Relational joins induce interactions that can be captured by operations on tuple embeddings
Invoked to justify why lifting queries implements graph neural networks.

invented entities (1)

tuple provenance embeddings (no independent evidence)
purpose: Represent learnable parameters that capture relational interactions
New representation introduced to enable the lifted queries.
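Since the ledger notes that the embedding dimension and initialization are not specified in the abstract, the sketch below shows one assumed way to attach a learnable vector to each tuple. The class `TupleEmbeddings`, the Gaussian initialization, the `(relation, key)` identifier, and the plain SGD update are all hypothetical choices for illustration, not details taken from the paper.

```python
import random


class TupleEmbeddings:
    """Lazily created embedding per tuple, identified by (relation name, key)."""

    def __init__(self, dim, seed=0, scale=0.1):
        self.dim = dim
        self._rng = random.Random(seed)  # seeded so runs are reproducible
        self._scale = scale
        self._table = {}

    def get(self, relation, key):
        """Return the tuple's embedding, initializing it on first access."""
        tuple_id = (relation, key)
        if tuple_id not in self._table:
            self._table[tuple_id] = [
                self._rng.gauss(0.0, self._scale) for _ in range(self.dim)
            ]
        return self._table[tuple_id]

    def sgd_step(self, relation, key, grad, lr=0.01):
        """Update the embedding in place with one gradient-descent step."""
        emb = self.get(relation, key)
        for i, g in enumerate(grad):
            emb[i] -= lr * g


store = TupleEmbeddings(dim=4)
first = store.get("Driver", 1)
snapshot = list(first)
store.sgd_step("Driver", 1, [1.0, 1.0, 1.0, 1.0], lr=0.5)
```

Keying the table by relation and primary key keeps the parameters addressable from query results, which is the property a lifted query would rely on.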

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Neuro-Relational Programs: Unifying Queries and Neural Computation over Structured Data
cs.DB · 2026-06 · unverdicted · novelty 6.0

NRPs extend Datalog with embedding operations to create a single formalism readable as both query plans with trainable parts and neural architectures with relational structure.

Reference graph

Works this paper leans on

60 extracted references · 11 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi. 2018. Research Directions for Principles ...

[2] Linus Bao, Emily Jin, Michael M. Bronstein, İsmail İlkan Ceylan, and Matthias Lanzinger. 2025. Homomorphism Counts as Structural Encodings for Graph Learning. In ICLR. OpenReview.net.

[3] Thomas Bonald, Nathan de Lara, Quentin Lutz, and Bertrand Charpentier. 2020. Scikit-network: Graph Analysis in Python. Journal of Machine Learning Research 21, 185 (2020), 1–6. http://jmlr.org/papers/v21/20-412.html

[4] Rajesh Bordawekar and Oded Shmueli. 2017. Using Word Embedding to Enable Semantic Queries in Relational Databases. In DEEM@SIGMOD. ACM, 5:1–5:4.

[5] Riccardo Cappuzzo, Paolo Papotti, and Saravanan Thirumuruganathan. 2020. Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks. In SIGMOD Conference. ACM, 1335–1349.

[6] Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel. 2017. Towards Linear Algebra over Normalized Data. Proc. VLDB Endow. 10, 11 (2017), 1214–1225.

[7] Tianlang Chen, Charilaos Kanatsoulis, and Jure Leskovec. 2025. RelGNN: Composite Message Passing for Relational Deep Learning. In ICML.

[8] Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. ACM, 257–266. doi:10.1145/3292500.3330925

[9] E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (1970), 377–387.

[10] Tamara Cucumides and Floris Geerts. 2025. From Features to Structure: Task-Aware Graph Construction for Relational and Tabular Learning with GNNs. In Tabular Data Analysis Workshop (TaDA) at VLDB.

[11] Alexis Cvetkov-Iliev, Alexandre Allauzen, and Gaël Varoquaux. 2023. Relational data embeddings for feature enrichment with background information. Mach. Learn. 112, 2 (2023), 687–720.

[12] Pedro Domingos. 2025. Tensor Logic: The Language of AI. arXiv preprint arXiv:2510.12269 (2025).

[13] Matthias Fey. 2019. PyTorch Scatter: Optimized Scatter Operations for PyTorch. https://github.com/rusty1s/pytorch_scatter. GPU-native scatter_add, scatter_mean, scatter_max with autograd support.

[14] Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. 2024. Position: Relational Deep Learning - Graph Representation Learning on Relational Databases. In ICML. OpenReview.net.

[15] Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. 2024. RelBench: A Benchmark for Deep Learning on Relational Databases. In Advances in Neural Information Processing Systems 37 (NeurIPS), Datasets and Benchmarks Track. https://arxiv.org/abs/2407.20060

[16] Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.

[17] Billy Joe Franks, Moshe Eliasof, Semih Cantürk, Guy Wolf, Carola-Bibiane Schönlieb, Sophie Fellenz, and Marius Kloft. 2025. Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings. Trans. Mach. Learn. Res. 2025 (2025).

[18] Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. 2020. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In WWW. ACM / IW3C2, 2331–2341.

[19] Boris Glavic. 2021. Data Provenance - Origins, Applications, Algorithms, and Models. Foundations and Trends® in Databases 9, 3-4 (2021), 209–441. doi:10.1561/1900000068

[20] Todd J. Green, Gregory Karvounarakis, and Val Tannen. 2007. Provenance semirings. In PODS. ACM, 31–40.

[21] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035.

[22] Joseph M. Hellerstein, Christoper Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar. 2012. The MADlib analytics library: or MAD skills, the SQL. Proc. VLDB Endow. 5, 12 (Aug. 2012), 1700–1711. doi:10.14778/2367502.2367510

[23] Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. 2020. Heterogeneous Graph Transformer. In Proceedings of The Web Conference 2020 (WWW). 2704–. https://arxiv.org/abs/2003.01332

[24] Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. 2026. KumoRFM-2: Scaling Foundation Models for Relational Learning. CoRR abs/2604.12596 (2026).

[25] Hasan M. Jamil. 2024. Toward a Declarative Query Language for Machine Learning. In VLDB Workshops. https://api.semanticscholar.org/CorpusID:273878548

[26] Matthias Jasny, Tobias Ziegler, Tim Kraska, Uwe Roehm, and Carsten Binnig. 2020. DB4ML - An In-Memory Database Kernel with Machine Learning Support. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 159–173. doi:10.1145/3318464.3380575

[27] Fahim Shahriar Khan and Ashraf Aboulnaga. 2025. A Vision for SQL-Based Relational Deep Learning. In VLDB 2025 Workshop: Tabular Data Analysis (TaDA).

[28] Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).

[29] Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel. 2015. Learning Generalized Linear Models Over Normalized Data. In SIGMOD Conference. ACM, 1969–1984.

[30] Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Sebastian Breß, Tilmann Rabl, and Volker Markl. 2019. An Intermediate Representation for Optimizing Machine Learning Pipelines. Proc. VLDB Endow. 12, 11 (2019), 1553–1567.

[31] Guoliang Li, Ji Sun, Lijie Xu, Shifu Li, Jiang Wang, and Wen Nie. 2024. Gaussml: An end-to-end in-database machine learning system. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 5198–5210.

[32] Xupeng Li, Bin Cui, Yiru Chen, Wentao Wu, and Ce Zhang. 2017. MLog: Towards Declarative In-Database Machine Learning. Proc. VLDB Endow. 10, 12 (2017), 1933–1936.

[33] Yuval Lev Lubarsky, Jan Tönshoff, Martin Grohe, and Benny Kimelfeld. 2023. Selecting Walk Schemes for Database Embedding. In CIKM. ACM, 1677–1686.

[34] Takanori Maehara and Hoang NT. 2024. Deep Homomorphism Networks. In Advances in Neural Information Processing Systems 37 (NeurIPS). https://proceedings.neurips.cc/paper_files/paper/2024/file/65f54fdf62cd5614dc5715ae7ece4ef6-Paper-Conference.pdf

[35] Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. 2019. Provably Powerful Graph Networks. In Advances in Neural Information Processing Systems 32 (NeurIPS).

[36] Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, and Vijay Raghavendra. 2018. Deep Learning for Entity Matching: A Design Space Exploration. In SIGMOD Conference. ACM, 19–34.

[37] Luis Müller, Mikhail Galkin, Christopher Morris, and Ladislav Rampásek. 2024. Attending to Graph Transformers. Trans. Mach. Learn. Res. 2024 (2024).

[38] NVIDIA. 2024. RAPIDS cuDF: GPU DataFrame Library. https://github.com/rapidsai/cudf. Pandas-compatible API with GPU acceleration.

[39] Dan Olteanu. 2020. The Relational Data Borg is Learning. Proc. VLDB Endow. 13, 12 (2020), 3502–3515.

[40] Paolo Papotti and Carsten Binnig. 2025. Panel on Neural Relational Data: Tabular Foundation Models, LLMs... or both? Proc. VLDB Endow. 18 (2025), 5513–5515. https://api.semanticscholar.org/CorpusID:281247089

[41] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. https://api.semanticscholar.org/CorpusID:40027675

[42] Khaled Mohammed Saifuddin, Briana Bumgardner, Farhan Tanvir, and Esra Akbas. 2023. HyGNN: Drug-Drug Interaction Prediction via Hypergraph Neural Network. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). 1503–1516. doi:10.1109/ICDE55515.2023.00119

[43] Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling Relational Data with Graph Convolutional Networks. In The Semantic Web – 15th International Conference (ESWC) (Lecture Notes in Computer Science, Vol. 10843). Springer, 593–607. https://arxiv.org/abs/1703.06103

[44] Maximilian E. Schüle, Matthias Bungeroth, Alfons Kemper, Stephan Günnemann, and Thomas Neumann. 2019. MLearn: A Declarative Machine Learning Language for Database Systems. In DEEM@SIGMOD.

[45] Maximilian E. Schüle, Matthias Bungeroth, Dimitri Vorona, Alfons Kemper, Stephan Günnemann, and Thomas Neumann. 2019. ML2SQL - Compiling a Declarative Machine Learning Language to SQL and Python. In International Conference on Extending Database Technology. https://api.semanticscholar.org/CorpusID:81990872

[46] Erez Shinan. 2017. Lark: A Parsing Library for Python. https://github.com/lark-parser/lark. Accessed: 2026.

[47] Thiviyan Thanapalasingam, Lucas van Berkel, Peter Bloem, and Paul Groth. 2022. Relational Graph Convolutional Networks: A Closer Look. PeerJ Computer Science 8 (2022), e1073. doi:10.7717/peerj-cs.1073

[48] Jan Tönshoff, Neta Friedman, Martin Grohe, and Benny Kimelfeld. 2023. Stable Tuple Embeddings for Dynamic Databases. In ICDE. IEEE, 1286–1299.

[49] Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander J. Smola, and Zheng Zhang. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. ICLR Workshop on Representation Learning on Graphs and Manifo...

[50] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. 2019. Heterogeneous Graph Attention Network. In WWW.

[51] Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang. 2025. Griffin: Towards a Graph-Centric Relational Database Foundation Model. In ICML (Proceedings of Machine Learning Research). PMLR / OpenReview.net.

[52] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S. Yu Philip. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32, 1 (2020), 4–24.

[53] Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor K. Prasanna. 2019. GraphSAINT: Graph Sampling Based Inductive Learning Method. ArXiv abs/1907.04931 (2019). https://api.semanticscholar.org/CorpusID:195886159

[54] Jianan Zhao, Xiao Wang, Chuan Shi, Binbin Hu, Guojie Song, and Yanfang Ye. Heterogeneous Graph Structure Learning for Graph Neural Networks. In AAAI. AAAI Press, 4697–4705.

[1] [1]

Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Cal- vanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi. 2018. Research Directions for Principles ...

2018

[2] [2]

Bronstein, İsmail İlkan Ceylan, and Matthias Lanzinger

Linus Bao, Emily Jin, Michael M. Bronstein, İsmail İlkan Ceylan, and Matthias Lanzinger. 2025. Homomorphism Counts as Structural Encodings for Graph Learning. InICLR. OpenReview.net

2025

[3] [3]

Thomas Bonald, Nathan de Lara, Quentin Lutz, and Bertrand Charpentier. 2020. Scikit-network: Graph Analysis in Python.Journal of Machine Learning Research 21, 185 (2020), 1–6. http://jmlr.org/papers/v21/20-412.html

2020

[4] [4]

Rajesh Bordawekar and Oded Shmueli. 2017. Using Word Embedding to Enable Semantic Queries in Relational Databases. InDEEM@SIGMOD. ACM, 5:1–5:4

2017

[5] [5]

Riccardo Cappuzzo, Paolo Papotti, and Saravanan Thirumuruganathan. 2020. Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks. InSIGMOD Conference. ACM, 1335–1349

2020

[6] [6]

Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel. 2017. To- wards Linear Algebra over Normalized Data.Proc. VLDB Endow.10, 11 (2017), 1214–1225

2017

[7] [7]

Tianlang Chen, Charilaos Kanatsoulis, and Jure Leskovec. 2025. RelGNN: Com- posite Message Passing for Relational Deep Learning. InICML

2025

[8] [8]

Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh

[9] [9]

InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. ACM, 257–266. doi:10.1145/3292500.3330925

work page doi:10.1145/3292500.3330925 2019

[10] [10]

E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM13, 6 (1970), 377–387

1970

[11] [11]

Tamara Cucumides and Floris Geerts. 2025. From Features to Structure: Task- Aware Graph Construction for Relational and Tabular Learning with GNNs. In Tabular Data Analysis Workshop (TaDA) at VLDB

2025

[12] [12]

Alexis Cvetkov-Iliev, Alexandre Allauzen, and Gaël Varoquaux. 2023. Relational data embeddings for feature enrichment with background information.Mach. Learn.112, 2 (2023), 687–720

2023

[13] [13]

Pedro Domingos. 2025. Tensor Logic: The Language of AI.arXiv preprint arXiv:2510.12269(2025)

work page arXiv 2025

[14] [14]

Matthias Fey. 2019. PyTorch Scatter: Optimized Scatter Operations for Py- Torch. https://github.com/rusty1s/pytorch_scatter. GPU-native scatter_add, scatter_mean, scatter_max with autograd support

2019

[15] [15]

Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. 2024. Position: Relational Deep Learning - Graph Representation Learning on Relational Databases. In ICML. OpenReview.net

2024

[16] [16]

Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. 2024. RelBench: A Benchmark for Deep Learning on Relational Databases. InAdvances in Neural Information Processing Systems 37 (NeurIPS), Datasets and Benchmarks Track. https://arxiv.org/abs/2407.20060

work page arXiv 2024

[17] [17]

Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. InICLR Workshop on Representation Learning on Graphs and Manifolds

2019

[18] [18]

Billy Joe Franks, Moshe Eliasof, Semih Cantürk, Guy Wolf, Carola-Bibiane Schön- lieb, Sophie Fellenz, and Marius Kloft. 2025. Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings.Trans. Mach. Learn. Res.2025 (2025)

2025

[19] [19]

Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. 2020. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In WWW. ACM / IW3C2, 2331–2341

2020

[20] [20]

Boris Glavic. 2021. Data Provenance - Origins, Applications, Algorithms, and Models.Foundations and Trends®in Databases9, 3-4 (2021), 209–441. doi:10. 1561/1900000068

2021

[21] [21]

Green, Gregory Karvounarakis, and Val Tannen

Todd J. Green, Gregory Karvounarakis, and Val Tannen. 2007. Provenance semirings. InPODS. ACM, 31–40

[22]

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035

[23]

Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar. 2012. The MADlib analytics library: or MAD skills, the SQL. Proc. VLDB Endow. 5, 12 (Aug. 2012), 1700–1711. doi:10.14778/2367502.2367510

[24]

Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. 2020. Heterogeneous Graph Transformer. In Proceedings of The Web Conference 2020 (WWW). 2704–. https://arxiv.org/abs/2003.01332

[26]

Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. 2026. KumoRFM-2: Scaling Foundation Models for Relational Learning. CoRR abs/2604.12596 (2026)

[27]

Hasan M. Jamil. 2024. Toward a Declarative Query Language for Machine Learning. In VLDB Workshops. https://api.semanticscholar.org/CorpusID:273878548

[28]

Matthias Jasny, Tobias Ziegler, Tim Kraska, Uwe Roehm, and Carsten Binnig. 2020. DB4ML - An In-Memory Database Kernel with Machine Learning Support. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 159–173. doi:10.1145/3318464.3380575

[30]

Fahim Shahriar Khan and Ashraf Aboulnaga. 2025. A Vision for SQL-Based Relational Deep Learning. In VLDB 2025 Workshop: Tabular Data Analysis (TaDA)

[31]

Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR)

[32]

Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel. 2015. Learning Generalized Linear Models Over Normalized Data. In SIGMOD Conference. ACM, 1969–1984

[33]

Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Sebastian Breß, Tilmann Rabl, and Volker Markl. 2019. An Intermediate Representation for Optimizing Machine Learning Pipelines. Proc. VLDB Endow. 12, 11 (2019), 1553–1567

[34]

Guoliang Li, Ji Sun, Lijie Xu, Shifu Li, Jiang Wang, and Wen Nie. 2024. Gaussml: An end-to-end in-database machine learning system. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 5198–5210

[35]

Xupeng Li, Bin Cui, Yiru Chen, Wentao Wu, and Ce Zhang. 2017. MLog: Towards Declarative In-Database Machine Learning. Proc. VLDB Endow. 10, 12 (2017), 1933–1936

[36]

Yuval Lev Lubarsky, Jan Tönshoff, Martin Grohe, and Benny Kimelfeld. 2023. Selecting Walk Schemes for Database Embedding. In CIKM. ACM, 1677–1686

[37]

Takanori Maehara and Hoang NT. 2024. Deep Homomorphism Networks. In Advances in Neural Information Processing Systems 37 (NeurIPS). https://proceedings.neurips.cc/paper_files/paper/2024/file/65f54fdf62cd5614dc5715ae7ece4ef6-Paper-Conference.pdf

[38]

Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. 2019. Provably Powerful Graph Networks. In Advances in Neural Information Processing Systems 32 (NeurIPS)

[39]

Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, and Vijay Raghavendra. 2018. Deep Learning for Entity Matching: A Design Space Exploration. In SIGMOD Conference. ACM, 19–34

[40]

Luis Müller, Mikhail Galkin, Christopher Morris, and Ladislav Rampásek. 2024. Attending to Graph Transformers. Trans. Mach. Learn. Res. 2024 (2024)

[41]

NVIDIA. 2024. RAPIDS cuDF: GPU DataFrame Library. https://github.com/rapidsai/cudf. Pandas-compatible API with GPU acceleration

[42]

Dan Olteanu. 2020. The Relational Data Borg is Learning. Proc. VLDB Endow. 13, 12 (2020), 3502–3515

[43]

Paolo Papotti and Carsten Binnig. 2025. Panel on Neural Relational Data: Tabular Foundation Models, LLMs... or both? Proc. VLDB Endow. 18 (2025), 5513–5515. https://api.semanticscholar.org/CorpusID:281247089

[44]

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. https://api.semanticscholar.org/CorpusID:40027675

[46]

Khaled Mohammed Saifuddin, Briana Bumgardner, Farhan Tanvir, and Esra Akbas. 2023. HyGNN: Drug-Drug Interaction Prediction via Hypergraph Neural Network. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). 1503–1516. doi:10.1109/ICDE55515.2023.00119

[47]

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling Relational Data with Graph Convolutional Networks. In The Semantic Web – 15th International Conference (ESWC) (Lecture Notes in Computer Science, Vol. 10843). Springer, 593–607. https://arxiv.org/abs/1703.06103

[48]

Maximilian E. Schüle, Matthias Bungeroth, Alfons Kemper, Stephan Günnemann, and Thomas Neumann. 2019. MLearn: A Declarative Machine Learning Language for Database Systems. In DEEM@SIGMOD

[49]

Maximilian E. Schüle, Matthias Bungeroth, Dimitri Vorona, Alfons Kemper, Stephan Günnemann, and Thomas Neumann. 2019. ML2SQL - Compiling a Declarative Machine Learning Language to SQL and Python. In International Conference on Extending Database Technology. https://api.semanticscholar.org/CorpusID:81990872

[50]

Erez Shinan. 2017. Lark: A Parsing Library for Python. https://github.com/lark-parser/lark. Accessed: 2026

[51]

Thiviyan Thanapalasingam, Lucas van Berkel, Peter Bloem, and Paul Groth. 2022. Relational Graph Convolutional Networks: A Closer Look. PeerJ Computer Science 8 (2022), e1073. doi:10.7717/peerj-cs.1073

[53]

Jan Tönshoff, Neta Friedman, Martin Grohe, and Benny Kimelfeld. 2023. Stable Tuple Embeddings for Dynamic Databases. In ICDE. IEEE, 1286–1299

[54]

Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang, Haibin Lin, Junbo Zhao, Jinyang Li, Alexander J. Smola, and Zheng Zhang. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. ICLR Workshop on Representation Learning on Graphs and Manifo...

[55]

Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. 2019. Heterogeneous Graph Attention Network. In WWW

[56]

Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang. 2025. Griffin: Towards a Graph-Centric Relational Database Foundation Model. In ICML (Proceedings of Machine Learning Research). PMLR / OpenReview.net

[57]

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32, 1 (2020), 4–24

[58]

Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor K. Prasanna. 2019. GraphSAINT: Graph Sampling Based Inductive Learning Method. ArXiv abs/1907.04931 (2019). https://api.semanticscholar.org/CorpusID:195886159

[59]

Jianan Zhao, Xiao Wang, Chuan Shi, Binbin Hu, Guojie Song, and Yanfang Ye. Heterogeneous Graph Structure Learning for Graph Neural Networks. In AAAI. AAAI Press, 4697–4705