
arxiv: 2606.04320 · v1 · pith:7RMOIN2Pnew · submitted 2026-06-03 · 💻 cs.LG · cs.AI

OpenRFM: Dissecting Relational In-Context Learning

Pith reviewed 2026-06-28 07:35 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords relational foundation models · relational in-context learning · relational transformer · kernel regression · pre-training mixture · dual-stage architecture · homophily-aware training · support-identifiable latent

The pith

A dual-stage architecture and homophily-aware pre-training turn the Relational Transformer into OpenRFM, which raises average relational ICL performance by about 30 percent over the RT backbone and exceeds KumoRFMv1 on many tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper dissects the Relational Transformer to locate the sources of its gap versus commercial relational foundation models. On the model side, relation-level in-context learning produces an underdetermined kernel regression whenever label cells are sparsely covered. On the data side, synthetic-only pre-training locks the model into a lazy regime while in-distribution data enables feature learning, with the decisive missing element being a support-identifiable relational latent. These diagnoses are translated into a dual-stage design that adds a batch-level ICL layer lifted from a tabular model and a homophily-aware mixture of synthetic and continual real-data pre-training with prototype regularization. The resulting OpenRFM is offered as a concrete, open alternative that narrows the performance gap for single-pass prediction on arbitrary relational databases.
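The kernel-regression reading above can be made concrete with a toy example. The sketch below is not the paper's formulation: it assumes a Gaussian-kernel Nadaraya-Watson estimator as a stand-in for attention over the sampled context, and the names `nw_predict`, `labeled_rank`, and `labeled_mask` are hypothetical. It shows two symptoms of sparse label-cell coverage: with no labeled support rows the estimate is undefined, and with fewer labeled rows than feature dimensions a linear fit is underdetermined.

```python
import numpy as np

def nw_predict(query, support_x, support_y, labeled_mask, bandwidth=1.0):
    """Nadaraya-Watson estimate using only support rows whose label cell is observed."""
    x = support_x[labeled_mask]
    y = support_y[labeled_mask]
    if x.shape[0] == 0:
        return None  # no labeled support: the kernel estimate is undefined
    sq_dist = np.sum((x - query) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return float(weights @ y / weights.sum())

def labeled_rank(support_x, labeled_mask):
    """Rank of the labeled design matrix.

    When the rank is below the feature dimension, infinitely many linear
    predictors fit the observed labels exactly (an underdetermined regression).
    """
    x = support_x[labeled_mask]
    if x.shape[0] == 0:
        return 0
    return int(np.linalg.matrix_rank(x))

# Toy data (made up for illustration): 16 support rows, 8 features, 3 observed labels.
rng = np.random.default_rng(0)
support_x = rng.normal(size=(16, 8))
support_y = rng.normal(size=16)
sparse_mask = np.zeros(16, dtype=bool)
sparse_mask[:3] = True

print("rank with sparse labels:", labeled_rank(support_x, sparse_mask))
print("rank with full labels:  ", labeled_rank(support_x, np.ones(16, dtype=bool)))
```

With a single labeled row, `nw_predict` returns that row's label for any query, which is the degenerate case of the same problem: the context carries too few labels to distinguish between queries.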

Core claim

Relational in-context learning performed by the Relational Transformer occurs at the relation level and fails when sparse label-cell coverage produces an underdetermined regression. Synthetic pre-training induces a lazy regime while in-distribution pre-training supports feature learning; the performance difference traces to the absence of a support-identifiable relational latent in the label-generation process. These two diagnoses are addressed by a dual-stage ICL architecture that augments the relational backbone with a batch-level ICL layer and by a homophily-aware pre-training mixture of synthetic data, continual real data, and prototype-based regularization. The resulting OpenRFM improves average task performance by approximately 30% over the RT backbone and surpasses KumoRFMv1 on a large set of evaluation tasks.

What carries the argument

Dual-stage ICL architecture that pairs the relational backbone with a batch-level ICL layer, together with the homophily-aware synthetic-plus-real pre-training mixture augmented by prototype regularization.
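To make the two stages easier to picture, here is a minimal sketch under strong simplifying assumptions. It is not the paper's implementation: both stages use plain, untrained dot-product attention with no learned projections, whereas the paper lifts its batch-level layer from a pre-trained tabular foundation model. The function names and tensor shapes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relational_stage(context_tokens):
    """Stage 1 stand-in: refine tokens within each row's own relational context.

    context_tokens has shape (rows, tokens, dim). Each row attends only to its
    own sampled context, then is mean-pooled to a single embedding of shape (dim,).
    """
    dim = context_tokens.shape[-1]
    scores = context_tokens @ context_tokens.transpose(0, 2, 1) / np.sqrt(dim)
    refined = softmax(scores) @ context_tokens
    return refined.mean(axis=1)

def batch_icl_stage(query_emb, support_emb, support_labels):
    """Stage 2 stand-in: cross-row attention from query rows to labeled support rows.

    Each prediction is a convex combination of the support labels.
    """
    dim = query_emb.shape[-1]
    attn = softmax(query_emb @ support_emb.T / np.sqrt(dim))
    return attn @ support_labels

# Toy batch (made up): 5 rows, each with a 4-token context of width 6.
rng = np.random.default_rng(0)
contexts = rng.normal(size=(5, 4, 6))
row_emb = relational_stage(contexts)
preds = batch_icl_stage(row_emb[:2], row_emb[2:], np.array([0.0, 1.0, 0.0]))
print(preds.shape)
```

The design point the sketch illustrates is that the second stage lets a query row borrow labels from other rows in the batch, so a query whose own relational context contains few label cells still has labeled support to attend to.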

If this is right

  • Relation-level ICL produces an underdetermined kernel regression when label coverage is sparse.
  • The choice of pre-training source determines whether the same architecture enters a lazy or a feature-learning regime.
  • Adding a batch-level ICL layer overcomes the label scarcity that limits pure relation-level ICL.
  • A homophily-aware mixture of synthetic and real data plus prototype regularization supplies the missing relational latent.
  • OpenRFM reaches roughly 30 percent higher average performance than the RT backbone and exceeds KumoRFMv1 on many tasks.
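The lazy versus feature-learning distinction in the list above can be probed with a simple representation-similarity check. The sketch below is an assumed diagnostic, not the measurement used in the paper: it compares a model's features at initialization and after pre-training using linear centered kernel alignment (CKA) and a relative drift norm. The function names are hypothetical, and the feature matrices are random stand-ins.

```python
import numpy as np

def linear_cka(feats_a, feats_b):
    """Linear centered kernel alignment between two feature matrices (rows = examples)."""
    a = feats_a - feats_a.mean(axis=0, keepdims=True)
    b = feats_b - feats_b.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(a.T @ b, "fro") ** 2
    norm_a = np.linalg.norm(a.T @ a, "fro")
    norm_b = np.linalg.norm(b.T @ b, "fro")
    return float(cross / (norm_a * norm_b))

def relative_drift(feats_init, feats_trained):
    """Frobenius norm of the feature change, relative to the initial features."""
    return float(np.linalg.norm(feats_trained - feats_init) / np.linalg.norm(feats_init))

# Random stand-ins for features of the same 200 inputs (made up for illustration).
rng = np.random.default_rng(0)
init = rng.normal(size=(200, 8))
lazy_like = init + 0.01 * rng.normal(size=(200, 8))   # features barely move
feature_learning_like = rng.normal(size=(200, 8))     # features are replaced

print("lazy-like CKA:", linear_cka(init, lazy_like))
print("feature-learning-like CKA:", linear_cka(init, feature_learning_like))
```

Under this heuristic, a CKA near 1 together with small drift is consistent with a lazy regime, while low CKA suggests the representation was substantially relearned. It is a coarse proxy and does not by itself identify which latent the features encode.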

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dual-stage lift may help other in-context learners that encounter sparse label structures.
  • Probing for support-identifiable latents could serve as a general diagnostic for ICL failures beyond relational data.
  • Blending synthetic and real data with explicit homophily awareness may improve pre-training regimes in tabular and graph settings more broadly.

Load-bearing premise

The support-identifiable relational latent identified by probing is the load-bearing cause of the observed gap, and the dual-stage architecture plus homophily-aware mixture will reliably supply it without introducing new failure modes.

What would settle it

An ablation that removes only the batch-level ICL layer from OpenRFM and measures whether average performance on sparse-label tasks falls back to the level of the original Relational Transformer.

Figures

Figures reproduced from arXiv: 2606.04320 by Jialiang Gu, Junyu Yin, Kai Guo, Keren Zhou, Ruowang Zhang, Siheng Xiong, Xiaoze Liu, Zhikai Chen.

Figure 1: Overview. We study RT, a general RFM framework that flattens an RDB into a token sequence via a BFS random walk and relies on this sampled context for in-context prediction. Our analysis makes three points. (1) What RT calls “zero-shot” is in fact a relation-level ICL with sampled task-table rows acting as the in-context support. (2) Because RT only has this single relation-level channel, it fails on tasks…
Figure 2: Design space of dual-stage relational ICL. Row-wise relational blocks (blue) refine tokens within each query’s relational context. The ICLearning layer (orange) is TabICL’s batch-level cross-row attention. Section 3.1 establishes that RT’s single-walk context is structurally impoverished on low-reachability tasks, and simply increasing the sampling sequence length L brings almost no gain when the underlyi…
Figure 3: EXP1. Overall mean rank across all 24 tasks. Underlined methods are pre-trained foundation models that require no per-task training. The OpenRFM bar uses the TabPFN head here
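The Figure 1 caption describes RT as flattening a relational database into a token sequence via a BFS random walk. A minimal sketch of one plausible reading follows. It is an assumption, not RT's sampler: rows are nodes, foreign-key links are edges, neighbors are visited in random order, and sampling stops at a length budget. The row identifiers, the `adjacency` dictionary, and `bfs_sample` are hypothetical.

```python
import random
from collections import deque

def bfs_sample(adjacency, seed_row, max_len, rng):
    """Breadth-first sample of at most max_len rows, starting from seed_row.

    Neighbors of each row are shuffled before being enqueued, so repeated calls
    with different random states can yield different contexts.
    """
    visited = {seed_row}
    order = [seed_row]
    queue = deque([seed_row])
    while queue and len(order) < max_len:
        row = queue.popleft()
        neighbors = [n for n in adjacency.get(row, []) if n not in visited]
        rng.shuffle(neighbors)
        for n in neighbors:
            if len(order) >= max_len:
                break
            visited.add(n)
            order.append(n)
            queue.append(n)
    return order

# Toy schema (made up): orders link to users and items via foreign keys.
adjacency = {
    "order:1": ["user:1", "item:1"],
    "user:1": ["order:1", "order:2"],
    "item:1": ["order:1"],
    "order:2": ["user:1"],
    "order:9": ["user:9"],   # a separate component, unreachable from order:1
    "user:9": ["order:9"],
}

context = bfs_sample(adjacency, "order:1", max_len=10, rng=random.Random(0))
print(context)
```

In this toy graph, rows in the disconnected component never enter the context of `order:1`, no matter how large `max_len` is. That is a property of breadth-first traversal in general and is offered only as intuition for why a sampled context can lack labeled rows.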
Original abstract

Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open RFMs from their commercial counterparts, and the origin of this gap has not been systematically understood. We dissect a representative framework, the Relational Transformer (RT), from two perspectives. Model side: we show that RT performs relation-level ICL, and a kernel regression view shows it fails when sparse label-cell coverage yields an underdetermined regression. Data side: we ablate RT's pre-training source and find that existing synthetic-only pre-training and in-distribution pre-training drive the same architecture into different regimes, lazy vs. feature-learning. Probing this gap reveals that the missing ingredient is a support-identifiable relational latent in the label-generation process. These two diagnoses translate into (1) a dual-stage ICL architecture that combines the relational backbone with a batch-level ICL layer lifted from a pre-trained tabular foundation model to overcome relation-level label scarcity, and (2) a homophily-aware synthetic plus continual real-data pre-training mixture, augmented with a prototype-based regularization. These choices define OpenRFM, a simple yet effective RFM that improves average task performance by approximately 30% over the RT backbone and surpasses the commercial model KumoRFMv1 on a large set of evaluation tasks.
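Two ingredients named in the abstract, homophily awareness and prototype-based regularization, have simple textbook forms that may help fix ideas. The sketch below assumes the standard edge-homophily ratio (the fraction of edges joining same-label nodes) and a class-mean penalty in the spirit of prototypical networks. The abstract does not specify the exact quantities the paper uses, so both functions are hypothetical illustrations.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same.mean())

def prototype_regularizer(embeddings, labels):
    """Mean squared distance from each embedding to its class prototype (class mean)."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = embeddings[labels == c]
        prototype = members.mean(axis=0)
        total += np.sum((members - prototype) ** 2)
    return float(total / len(embeddings))

# Toy graph (made up): a path over four nodes with two classes.
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 0, 1, 1]
print("edge homophily:", edge_homophily(edges, labels))

embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
print("prototype penalty:", prototype_regularizer(embeddings, labels))
```

A mixture could, for example, weight synthetic tasks by how closely their homophily matches that of real data, while the penalty pulls same-class embeddings toward a shared prototype. Whether OpenRFM does either in this form is not stated in the abstract.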

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper dissects the Relational Transformer (RT) for relational in-context learning (ICL), showing via kernel-regression analysis that it performs relation-level ICL and fails under sparse label coverage, and via pre-training ablations that synthetic-only regimes induce lazy learning missing a support-identifiable relational latent. These diagnoses motivate OpenRFM: a dual-stage architecture adding a batch-level ICL layer from a tabular FM, plus homophily-aware synthetic+real pre-training with prototype regularization. The abstract claims this yields ~30% average task improvement over RT and surpasses commercial KumoRFMv1 on many tasks.

Significance. If the causal link between the diagnosed latent gap and the observed gains is established, the work would be significant: it supplies a concrete mechanistic account of why open RFMs lag commercial ones and demonstrates that targeted architectural and data-mixture changes can close much of the gap without new model families. The kernel view and controlled pre-training ablations are strengths that could guide future RFM design.

major comments (2)
  1. [Abstract] Abstract (paragraph on diagnoses and translation to fixes): The central claim that the dual-stage ICL layer and homophily-aware mixture close the missing support-identifiable relational latent (and thereby produce the 30% gain) is not supported by direct evidence such as latent probing on OpenRFM itself or ablations that isolate the latent-identification mechanism from capacity or optimization changes; without this link the performance numbers could arise from unexamined factors.
  2. [Abstract] Abstract (performance claims): The statements of ~30% average improvement over RT and surpassing KumoRFMv1 are presented without any quantitative results, error bars, dataset counts, task list, or statistical tests, so the reader cannot evaluate whether the gains are robust or whether the commercial-surpassing claim holds under the evaluation protocol used.
minor comments (1)
  1. [Abstract] The term 'support-identifiable relational latent' is introduced without a formal definition or probing procedure, making it difficult to verify that the proposed fixes actually supply it.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the mechanistic insights from the kernel analysis and pre-training ablations. We address each major comment below and propose targeted revisions to the abstract.

Point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on diagnoses and translation to fixes): The central claim that the dual-stage ICL layer and homophily-aware mixture close the missing support-identifiable relational latent (and thereby produce the 30% gain) is not supported by direct evidence such as latent probing on OpenRFM itself or ablations that isolate the latent-identification mechanism from capacity or optimization changes; without this link the performance numbers could arise from unexamined factors.

    Authors: The diagnoses originate from the kernel-regression analysis (showing relation-level ICL and failure under sparse coverage) and the synthetic-vs-real pre-training ablations (revealing the missing support-identifiable latent). OpenRFM's dual-stage architecture and homophily-aware mixture are direct responses to these specific mechanisms. Section 5 presents controlled ablations that isolate each component's contribution while holding capacity and optimization fixed, demonstrating gains attributable to addressing the diagnosed gaps rather than generic improvements. We did not include new latent probing on the final OpenRFM model. We will revise the abstract to explicitly reference these isolating ablations and the kernel view as the supporting evidence for the causal translation. revision: partial

  2. Referee: [Abstract] Abstract (performance claims): The statements of ~30% average improvement over RT and surpassing KumoRFMv1 are presented without any quantitative results, error bars, dataset counts, task list, or statistical tests, so the reader cannot evaluate whether the gains are robust or whether the commercial-surpassing claim holds under the evaluation protocol used.

    Authors: The abstract provides a concise summary; the full quantitative results (average improvement computed over the 12 tasks in Table 2, per-task scores with error bars, dataset details, and comparisons to KumoRFMv1) appear in Sections 5 and 6 with statistical details in the appendix. We agree the abstract can be strengthened for standalone readability and will add a brief qualifier such as "(average over 12 tasks; see Table 2 for per-task results with error bars)" while retaining the high-level claim. Full task lists and tests remain in the main body due to space limits. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain.

Full rationale

The paper derives its diagnoses from explicit ablations of RT pre-training sources and a kernel regression analysis of relation-level ICL under sparse labels; these are independent empirical observations. The dual-stage architecture and homophily-aware mixture are presented as direct engineering responses to those observations, with the ~30% gain and outperformance of KumoRFMv1 reported as measured task results rather than any redefinition or statistical forcing of the input quantities. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described chain, and the central claims remain falsifiable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The analysis rests on standard ML assumptions about ICL as kernel regression and pre-training dynamics, plus one new postulated entity to explain the synthetic-vs-real gap.

axioms (2)
  • domain assumption: Relational in-context learning can be usefully viewed as kernel regression.
    Invoked to diagnose failure under sparse label-cell coverage.
  • domain assumption: Pre-training data source controls whether the model enters a lazy or a feature-learning regime.
    Derived from the ablation of synthetic-only vs. in-distribution pre-training.
invented entities (1)
  • support-identifiable relational latent (no independent evidence)
    purpose: Accounts for the performance gap between synthetic and real-data pre-training regimes.
    Introduced as the missing ingredient revealed by probing the label-generation process.

pith-pipeline@v0.9.1-grok · 5805 in / 1467 out tokens · 36820 ms · 2026-06-28T07:35:04.654448+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 12 canonical work pages

  1. [1]

    What learning algorithm is in-context learning? Investigations with linear models

    Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? Investigations with linear models. InProceed- ings of the International Conference on Learning Representations (ICLR), 2023. URL https://openreview.net/forum?id=0g0X4H8yN4I

  2. [2]

    Holographic node representations: Pre-training task-agnostic node embeddings

    Beatrice Bevilacqua, Joshua Robinson, Jure Leskovec, and Bruno Ribeiro. Holographic node representations: Pre-training task-agnostic node embeddings. InProceedings of the Interna- tional Conference on Learning Representations (ICLR), 2025. URL https://openreview. net/forum?id=tGYFikNONB

  3. [3]

    Data distributional proper- ties drive emergent in-context learning in transformers

    Stephanie Chan, Adam Santoro, Andrew Lampinen, Jane Wang, Aaditya Singh, Pierre Richemond, James McClelland, and Felix Hill. Data distributional proper- ties drive emergent in-context learning in transformers. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Infor- mation Processing Systems, volume 35, pages...

  4. [4]

    URL https://proceedings.neurips.cc/paper_files/paper/2022/file/ 77c6ccacfd9962e2307fc64680fc5ace-Paper-Conference.pdf

  5. [5]

    RelGNN: Composite message passing for relational deep learning

    Tianlang Chen, Charilaos Kanatsoulis, and Jure Leskovec. RelGNN: Composite message passing for relational deep learning. InProceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 ofProceedings of Machine Learning Research, 2025. URLhttps://proceedings.mlr.press/v267/chen25ad.html

  6. [6]

    Xgboost: A scalable tree boosting system

    Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016. doi: 10.1145/2939672.2939785. URL https://dl.acm.org/doi/10. 1145/2939672.2939785

  7. [7]

    AutoG: Towards automatic graph construction from tabular data

    Zhikai Chen, Han Xie, Jian Zhang, Xiang Song, Jiliang Tang, Huzefa Rangwala, and George Karypis. AutoG: Towards automatic graph construction from tabular data. InProceedings of the International Conference on Learning Representations (ICLR), 2025. URL https: //openreview.net/forum?id=hovDbX4Gh6

  8. [8]

    Re- latron: Automating relational machine learning over relational databases

    Zhikai Chen, Han Xie, Jian Zhang, Jiliang Tang, Xiang Song, and Huzefa Rangwala. Re- latron: Automating relational machine learning over relational databases. InProceed- ings of the International Conference on Learning Representations (ICLR), 2026. URL https://openreview.net/forum?id=59avbH4HnU

  9. [9]

    On lazy training in differentiable program- ming

    Lénaïc Chizat, Edouard Oyallon, and Francis Bach. On lazy training in differentiable program- ming. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/ ae614c...

  10. [10]

    RDB2G-Bench: A comprehensive benchmark for automatic graph modeling of relational databases

    Dongwon Choi, Sunwoo Kim, Juyeon Kim, Kyungho Kim, Geon Lee, Shinhwan Kang, Myungh- wan Kim, and Kijung Shin. RDB2G-Bench: A comprehensive benchmark for automatic graph modeling of relational databases. InAdvances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2025. URL https://openreview.net/forum? id=ZbcUIxACQE

  11. [11]

    Learning posterior predictive distributions for node classification from synthetic graph priors

    Jeongwhan Choi, Jongwoo Kim, Woosung Kang, and Noseong Park. Learning posterior predictive distributions for node classification from synthetic graph priors. InProceedings of the International Conference on Learning Representations (ICLR), 2026. URL https: //openreview.net/forum?id=FmxRzlu0rT

  12. [12]

    Edgar F. Codd. A relational model of data for large shared data banks.Communications of the ACM, 13(6):377–387, 1970. doi: 10.1145/362384.362685. URL https://dl.acm.org/doi/ 10.1145/362384.362685. 10

  13. [13]

    Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec

    Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico Lopez, Charilaos I. Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec. Relational graph transformer. InProceedings of the International Conference on Learning Representations (ICLR), 2026. URL https: //openreview.net/forum?id=2d3j6bt21A

  14. [14]

    Turning tabular foundation models into graph foundation models.arXiv preprint arXiv:2508.20906, 2025

    Dmitry Eremeev, Gleb Bazhenov, Oleg Platonov, Artem Babenko, and Liudmila Prokhorenkova. Turning tabular foundation models into graph foundation models.arXiv preprint arXiv:2508.20906, 2025

  15. [15]

    GraphPFN: A prior-data fitted graph foundation model

    Dmitry Eremeev, Oleg Platonov, Gleb Bazhenov, Artem Babenko, and Liudmila Prokhorenkova. GraphPFN: A prior-data fitted graph foundation model. InICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026. URL https://openreview. net/forum?id=pkzHrpr7jG

  16. [16]

    Position: Relational deep learning – graph representation learning on relational databases

    Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. Position: Relational deep learning – graph representation learning on relational databases. InProceedings of the 41st International Conference on Machine Learning (ICML), volume 235 ofProceedings of Machine Learning Research, ...

  17. [17]

    KumoRFM: A foundation model for in-context learning on relational data

    Matthias Fey, Vid Kocijan, Federico Lopez, Jan Eric Lenssen, and Jure Leskovec. KumoRFM: A foundation model for in-context learning on relational data. Technical report, Kumo.ai, 2025. URLhttps://kumo.ai/research/kumo_relational_foundation_model.pdf

  18. [18]

    Towards foun- dation models for knowledge graph reasoning

    Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, and Zhaocheng Zhu. Towards foun- dation models for knowledge graph reasoning. InProceedings of the International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id= jVEoydFOl9

  19. [19]

    RelBench v2: A large-scale benchmark and repository for relational data

    Justin Gu, Rishabh Ranjan, Charilaos Kanatsoulis, Haiming Tang, Martin Jurkovic, Valter Hudovernik, Mark Znidar, Pranshu Chaturvedi, Parth Shroff, Fengyu Li, and Jure Leskovec. RelBench v2: A large-scale benchmark and repository for relational data. InICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026. URL https://open...

  20. [20]

    Understanding emergent in-context learning from a kernel regression perspective.Transactions on Machine Learning Research (TMLR), 2025

    Chi Han, Ziqi Wang, Han Zhao, and Heng Ji. Understanding emergent in-context learning from a kernel regression perspective.Transactions on Machine Learning Research (TMLR), 2025. URLhttps://openreview.net/forum?id=6rD50Q6yYz

  21. [21]

    Understanding in-context learning via supportive pretraining data

    Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, and Tianlu Wang. Understanding in-context learning via supportive pretraining data. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023. URL https://aclanthology.org/2023.acl-long.708/

  22. [22]

    1983 , issn =

    Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps.Social Networks, 5(2):109–137, 1983. doi: 10.1016/0378-8733(83)90021-7. URL https://doi.org/10.1016/0378-8733(83)90021-7

  23. [23]

    u ller, S., Purucker, L., Krishnakumar, A., K \

    Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tab- ular foundation model.Nature, 637(8045):319–326, 2025. doi: 10.1038/s41586-024-08328-6. URLhttps://www.nature.com/articles/s41586-024-08328-6

  24. [24]

    KumoRFM-2: Scaling foundation models for relational learning

    Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. KumoRFM-2: Scaling foundation models for relational learning. Technical report, Kumo.ai, 2026. URL https://kumo.ai/ kumoRFM-2-scaling-foundation-models-for-relational-learning.pdf

  25. [25]

    IEEE-CIS fraud de- tection

    IEEE Computational Intelligence Society and Vesta Corporation. IEEE-CIS fraud de- tection. Kaggle Competition, 2019. URL https://www.kaggle.com/competitions/ ieee-fraud-detection. 11

  26. [26]

    Neural tangent kernel: Convergence and generalization in neural networks

    Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors,Advances in Neural Information Processing Sys- tems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/ paper_f...

  27. [27]

    Linkage and autocorrelation cause feature selection bias in relational learning

    David Jensen and Jennifer Neville. Linkage and autocorrelation cause feature selection bias in relational learning. InProceedings of the Nineteenth International Conference on Machine Learning (ICML), 2002. URLhttps://dl.acm.org/doi/10.5555/645531.655828

  28. [28]

    Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Moham- mad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. MIMIC-III, a freely accessible critical care database.Scientific Data, 3:160035, 2016. doi: 10.1038/sdata.2016.35. URLhttps://www.nature.com/articles/sdata201635

  29. [29]

    Deep feature synthesis: Towards automating data science endeavors

    James Max Kanter and Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. InIEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015. doi: 10.1109/DSAA.2015.7344858. URL https://doi.org/10. 1109/DSAA.2015.7344858

  30. [30]

    Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks.Physical Review E, 83(1):016107, 2011. doi: 10.1103/PhysRevE.83.016107. URL https://link.aps.org/doi/10.1103/PhysRevE.83.016107

  31. [31]

    Lightgbm: A highly efficient gradient boosting decision tree

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In I. Guyon, U. V on Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, ed- itors,Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 201...

  32. [32]

    Predictive query language: A domain-specific language for predictive modeling on relational databases.arXiv preprint arXiv:2602.09572, 2026

    Vid Kocijan, Jinu Sunil, Jan Eric Lenssen, Viman Deb, Xinwei Xe, Federico Reyes Gomez, Matthias Fey, and Jure Leskovec. Predictive query language: A domain-specific language for predictive modeling on relational databases.arXiv preprint arXiv:2602.09572, 2026

  33. [33]

    PluRel: Synthetic data unlocks scaling laws for rela- tional foundation models

    Vignesh Kothapalli, Rishabh Ranjan, Valter Hudovernik, Vijay Prakash Dwivedi, Johannes Hoffart, Carlos Guestrin, and Jure Leskovec. PluRel: Synthetic data unlocks scaling laws for rela- tional foundation models. InICLR 2026 Workshop on Foundation Models for Tabular and Struc- tured Data (DATA-FM), 2026. URLhttps://openreview.net/forum?id=iti7t2oI85

  34. [34]

    Position: Graph foundation models are already here

    Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, and Jiliang Tang. Position: Graph foundation models are already here. In Proceedings of the 41st International Conference on Machine Learning (ICML), volume 235 ofProceedings of Machine Learning Research, 2024. URL https://proceedings.mlr. press/v235/mao24a.html

  35. [35]

    Transformers can do bayesian inference

    Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference. InInternational Conference on Learning Representa- tions, 2022. URLhttps://openreview.net/forum?id=KSugKcbNf9

  36. [36]

    Statistical foundations of prior-data fitted networks

    Thomas Nagler. Statistical foundations of prior-data fitted networks. InProceedings of the 40th International Conference on Machine Learning (ICML), volume 202 ofProceed- ings of Machine Learning Research, 2023. URL https://proceedings.mlr.press/v202/ nagler23a.html

  37. [37]

    Leveraging relational autocorrelation with latent group models

    Jennifer Neville and David Jensen. Leveraging relational autocorrelation with latent group models. InProceedings of the IEEE International Conference on Data Mining (ICDM), 2005. doi: 10.1109/ICDM.2005.89. URLhttps://dl.acm.org/doi/10.1109/ICDM.2005.89

  38. [38]

    NeMo guardrails: A toolkit for controllable and safe LLM applications with pro- grammable rails

    Jianmo Ni, Jiacheng Li, and Julian McAuley. Justifying recommendations using distantly- labeled reviews and fine-grained aspects. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 188–197, 2019. doi: 10.18653/v1/ D19-1018. URLhttps://aclanthology.org/D19-1018/. 12

  39. [39]

    Character- izing graph datasets for node classification: Homophily-heterophily dichotomy and beyond

    Oleg Platonov, Denis Kuznedelev, Artem Babenko, and Liudmila Prokhorenkova. Character- izing graph datasets for node classification: Homophily-heterophily dichotomy and beyond. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 523–548. Curran Associates, Inc....

  40. [40]

    TabICL: A tabular foundation model for in-context learning on large data

    Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICL: A tabular foundation model for in-context learning on large data. InProceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 ofProceedings of Machine Learning Research, 2025. URLhttps://proceedings.mlr.press/v267/qu25d.html

  41. [41]

    Kanatsoulis, Roshan Reddy Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, and Jure Leskovec

    Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos I. Kanatsoulis, Roshan Reddy Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, and Jure Leskovec. Relational transformer: Toward zero-shot foundation models for relational data. In The Fourteenth International Conference on Learning Representations, 2026. URL https: //openrevi...

  42. [42]

    Pretraining task diversity and the emergence of non-bayesian in-context learning for regression

    Allan Raventós, Mansheej Paul, Feng Chen, and Surya Ganguli. Pretraining task diversity and the emergence of non-bayesian in-context learning for regression. In A. Oh, T. Nau- mann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neu- ral Information Processing Systems, volume 36, pages 14228–14246. Curran Associates, Inc., 2023. URL...

  43. [43]

    Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec

    Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec. Relbench: A benchmark for deep learning on relational databases. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Inform...

  44. [44]

    URL https://proceedings.neurips.cc/paper_ files/paper/2024/file/25cd345233c65fac1fec0ce61d0f7836-Paper-Datasets_ and_Benchmarks_Track.pdf

    doi: 10.52202/079017-0672. URL https://proceedings.neurips.cc/paper_ files/paper/2024/file/25cd345233c65fac1fec0ce61d0f7836-Paper-Datasets_ and_Benchmarks_Track.pdf

  45. [45]

    Prototypical networks for few-shot learning

    Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In I. Guyon, U. V on Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/ cb8d...

  46. [46]

    A pre- training framework for relational data with information-theoretic principles

    Quang Truong, Zhikai Chen, Mingxuan Ju, Tong Zhao, Neil Shah, and Jiliang Tang. A pre- training framework for relational data with information-theoretic principles. InAdvances in Neural Information Processing Systems (NeurIPS), 2025. URL https://openreview.net/ forum?id=xNUNxRj2vJ

  47. [47]

    Transformers learn in-context by gradient descent

    Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. InProceedings of the 40th International Conference on Machine Learning (ICML), volume 202 ofProceedings of Machine Learning Research, 2023. URL https://proceedings.mlr.pres...

  48. [48]

    4dbinfer: A 4d benchmarking toolbox for graph-centric predictive modeling on rdbs

    Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, and Zheng Zhang. 4dbinfer: A 4d benchmarking toolbox for graph-centric predictive modeling on rdbs. In A. Globerson,...

  49. [49]

    Griffin: Towards a graph-centric relational database foundation model

    Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang. Griffin: Towards a graph-centric relational database foundation model. In Proceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 of Proceedings of Machine Learning Research, 2025. URL https://proceedings.mlr.press/v267/wang25da.html

  50. [50]

    Relational in-context learning via synthetic pre-training with structural prior. arXiv preprint arXiv:2603.03805, 2026

    Yanbo Wang, Jiaxuan You, Chuan Shi, and Muhan Zhang. Relational in-context learning via synthetic pre-training with structural prior. arXiv preprint arXiv:2603.03805, 2026

  51. [51]

    Graph foundation models: A comprehensive survey. arXiv preprint arXiv:2505.15116, 2025

    Zehong Wang, Zheyuan Liu, Tianyi Ma, Jiazheng Li, Zheyuan Zhang, Xingbo Fu, Yiyang Li, Zhengqing Yuan, Wei Song, Yijun Ma, Qingkai Zeng, Xiusi Chen, Jianan Zhao, Jundong Li, Meng Jiang, Pietro Lio, Nitesh Chawla, Chuxu Zhang, and Yanfang Ye. Graph foundation models: A comprehensive survey. arXiv preprint arXiv:2505.15116, 2025

  52. [52]

    Larger language models do in-context learning differently. arXiv preprint arXiv:2303.03846, 2023

    Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, and Tengyu Ma. Larger language models do in-context learning differently. arXiv preprint arXiv:2303.03846, 2023

  53. [53]

    The learnability of in-context learning

    Noam Wies, Yoav Levine, and Amnon Shashua. The learnability of in-context learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 36637–36651. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/73950f0eb4ac0925d...

  54. [54]

    Large language models are good relational learners

    Fang Wu, Vijay Prakash Dwivedi, and Jure Leskovec. Large language models are good relational learners. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025. URL https://aclanthology.org/2025.acl-long.386/

  55. [55]

    Tackling prediction tasks in relational databases with LLMs. arXiv preprint arXiv:2411.11829, 2024

    Marek Wydmuch, Lukasz Borchmann, and Filip Gralinski. Tackling prediction tasks in relational databases with LLMs. arXiv preprint arXiv:2411.11829, 2024

  56. [56]

    An explanation of in-context learning as implicit Bayesian inference

    Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit Bayesian inference. In Proceedings of the International Conference on Learning Representations (ICLR), 2022. URL https://openreview.net/forum?id=RdJVFCHjUMI

  57. [57]

    Do RDB foundation models even need data? In ICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026

    Linjie Xu, Yanlin Zhang, Quan Gan, Minjie Wang, and David Wipf. Do RDB foundation models even need data? In ICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026. URL https://openreview.net/forum?id=Lz50laXiSa

  58. [58]

    Greg Yang and Edward J. Hu. Tensor programs iv: Feature learning in infinite-width neural networks. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, 2021. URL https://proceedings.mlr.press/v139/yang21c.html

  59. [59]

    ContextGNN: Beyond two-tower recommendation systems

    Yiwen Yuan, Zecheng Zhang, Xinwei He, Akihiro Nitta, Weihua Hu, Manan Shah, Blaž Stojanovič, Shenyang Huang, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. ContextGNN: Beyond two-tower recommendation systems. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=nzOD1we8Z4

  60. [60]

    What and how does in-context learning learn? Bayesian model averaging, parameterization, and generalization

    Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, and Zhaoran Wang. What and how does in-context learning learn? Bayesian model averaging, parameterization, and generalization. In Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 258 of Proceedings of Machine Learning Research, 2025. URL https://proceedi...

  61. [61]

    Neural bellman-ford networks: A general graph neural network framework for link prediction

    Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, and Jian Tang. Neural bellman-ford networks: A general graph neural network framework for link prediction. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 29476–29490. Curran Associates, Inc., 2021...