OpenRFM: Dissecting Relational In-Context Learning
Pith reviewed 2026-06-28 07:35 UTC · model grok-4.3
The pith
A dual-stage architecture plus homophily-aware pre-training turns the Relational Transformer into OpenRFM that raises relational ICL performance by about 30 percent and exceeds KumoRFMv1.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Relational in-context learning performed by the Relational Transformer occurs at the relation level and fails when sparse label-cell coverage produces an underdetermined regression. Synthetic pre-training induces a lazy regime while in-distribution pre-training supports feature learning; the performance difference traces to the absence of a support-identifiable relational latent in the label-generation process. These two diagnoses are addressed by a dual-stage ICL architecture that augments the relational backbone with a batch-level ICL layer and by a homophily-aware pre-training mixture of synthetic data, continual real data, and prototype-based regularization. The resulting OpenRFM improve
What carries the argument
Dual-stage ICL architecture that pairs the relational backbone with a batch-level ICL layer, together with the homophily-aware synthetic-plus-real pre-training mixture augmented by prototype regularization.
If this is right
- Relation-level ICL produces an underdetermined kernel regression when label coverage is sparse.
- The choice of pre-training source determines whether the same architecture enters a lazy or a feature-learning regime.
- Adding a batch-level ICL layer overcomes the label scarcity that limits pure relation-level ICL.
- A homophily-aware mixture of synthetic and real data plus prototype regularization supplies the missing relational latent.
- OpenRFM reaches roughly 30 percent higher average performance than the RT backbone and exceeds KumoRFMv1 on many tasks.
Where Pith is reading between the lines
- The same dual-stage lift may help other in-context learners that encounter sparse label structures.
- Probing for support-identifiable latents could serve as a general diagnostic for ICL failures beyond relational data.
- Blending synthetic and real data with explicit homophily awareness may improve pre-training regimes in tabular and graph settings more broadly.
Load-bearing premise
The support-identifiable relational latent identified by probing is the load-bearing cause of the observed gap and the dual-stage architecture plus homophily-aware mixture will reliably supply it without new failure modes.
What would settle it
An ablation that removes only the batch-level ICL layer from OpenRFM and measures whether average performance on sparse-label tasks falls back to the level of the original Relational Transformer.
Figures
read the original abstract
Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open RFMs from their commercial counterparts, and the origin of this gap has not been systematically understood. We dissect a representative framework, the Relational Transformer (RT), from two perspectives. Model side: we show that RT performs relation-level ICL, and a kernel regression view shows it fails when sparse label-cell coverage yields an underdetermined regression. Data side: we ablate RT's pre-training source and find that existing synthetic-only pre-training and in-distribution pre-training drive the same architecture into different regimes, lazy vs. feature-learning. Probing this gap reveals that the missing ingredient is a support-identifiable relational latent in the label-generation process. These two diagnoses translate into (1) a dual-stage ICL architecture that combines the relational backbone with a batch-level ICL layer lifted from a pre-trained tabular foundation model to overcome relation-level label scarcity, and (2) a homophily-aware synthetic plus continual real-data pre-training mixture, augmented with a prototype-based regularization. These choices define OpenRFM, a simple yet effective RFM that improves average task performance by approximately 30% over the RT backbone and surpasses the commercial model KumoRFMv1 on a large set of evaluation tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper dissects the Relational Transformer (RT) for relational in-context learning (ICL), showing via kernel-regression analysis that it performs relation-level ICL and fails under sparse label coverage, and via pre-training ablations that synthetic-only regimes induce lazy learning missing a support-identifiable relational latent. These diagnoses motivate OpenRFM: a dual-stage architecture adding a batch-level ICL layer from a tabular FM, plus homophily-aware synthetic+real pre-training with prototype regularization. The abstract claims this yields ~30% average task improvement over RT and surpasses commercial KumoRFMv1 on many tasks.
Significance. If the causal link between the diagnosed latent gap and the observed gains is established, the work would be significant: it supplies a concrete mechanistic account of why open RFMs lag commercial ones and demonstrates that targeted architectural and data-mixture changes can close much of the gap without new model families. The kernel view and controlled pre-training ablations are strengths that could guide future RFM design.
major comments (2)
- [Abstract] Abstract (paragraph on diagnoses and translation to fixes): The central claim that the dual-stage ICL layer and homophily-aware mixture close the missing support-identifiable relational latent (and thereby produce the 30% gain) is not supported by direct evidence such as latent probing on OpenRFM itself or ablations that isolate the latent-identification mechanism from capacity or optimization changes; without this link the performance numbers could arise from unexamined factors.
- [Abstract] Abstract (performance claims): The statements of ~30% average improvement over RT and surpassing KumoRFMv1 are presented without any quantitative results, error bars, dataset counts, task list, or statistical tests, so the reader cannot evaluate whether the gains are robust or whether the commercial-surpassing claim holds under the evaluation protocol used.
minor comments (1)
- [Abstract] The term 'support-identifiable relational latent' is introduced without a formal definition or probing procedure, making it difficult to verify that the proposed fixes actually supply it.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the mechanistic insights from the kernel analysis and pre-training ablations. We address each major comment below and propose targeted revisions to the abstract.
read point-by-point responses
-
Referee: [Abstract] Abstract (paragraph on diagnoses and translation to fixes): The central claim that the dual-stage ICL layer and homophily-aware mixture close the missing support-identifiable relational latent (and thereby produce the 30% gain) is not supported by direct evidence such as latent probing on OpenRFM itself or ablations that isolate the latent-identification mechanism from capacity or optimization changes; without this link the performance numbers could arise from unexamined factors.
Authors: The diagnoses originate from the kernel-regression analysis (showing relation-level ICL and failure under sparse coverage) and the synthetic-vs-real pre-training ablations (revealing the missing support-identifiable latent). OpenRFM's dual-stage architecture and homophily-aware mixture are direct responses to these specific mechanisms. Section 5 presents controlled ablations that isolate each component's contribution while holding capacity and optimization fixed, demonstrating gains attributable to addressing the diagnosed gaps rather than generic improvements. We did not include new latent probing on the final OpenRFM model. We will revise the abstract to explicitly reference these isolating ablations and the kernel view as the supporting evidence for the causal translation. revision: partial
-
Referee: [Abstract] Abstract (performance claims): The statements of ~30% average improvement over RT and surpassing KumoRFMv1 are presented without any quantitative results, error bars, dataset counts, task list, or statistical tests, so the reader cannot evaluate whether the gains are robust or whether the commercial-surpassing claim holds under the evaluation protocol used.
Authors: The abstract provides a concise summary; the full quantitative results (average improvement computed over the 12 tasks in Table 2, per-task scores with error bars, dataset details, and comparisons to KumoRFMv1) appear in Sections 5 and 6 with statistical details in the appendix. We agree the abstract can be strengthened for standalone readability and will add a brief qualifier such as "(average over 12 tasks; see Table 2 for per-task results with error bars)" while retaining the high-level claim. Full task lists and tests remain in the main body due to space limits. revision: yes
Circularity Check
No significant circularity detected in the derivation chain.
full rationale
The paper derives its diagnoses from explicit ablations of RT pre-training sources and a kernel regression analysis of relation-level ICL under sparse labels; these are independent empirical observations. The dual-stage architecture and homophily-aware mixture are presented as direct engineering responses to those observations, with the ~30% gain and outperformance of KumoRFMv1 reported as measured task results rather than any redefinition or statistical forcing of the input quantities. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described chain, and the central claims remain falsifiable against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Relational in-context learning can be usefully viewed as kernel regression
- domain assumption Pre-training data source controls whether the model enters lazy or feature-learning regime
invented entities (1)
-
support-identifiable relational latent
no independent evidence
Reference graph
Works this paper leans on
-
[1]
What learning algorithm is in-context learning? Investigations with linear models
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? Investigations with linear models. InProceed- ings of the International Conference on Learning Representations (ICLR), 2023. URL https://openreview.net/forum?id=0g0X4H8yN4I
2023
-
[2]
Holographic node representations: Pre-training task-agnostic node embeddings
Beatrice Bevilacqua, Joshua Robinson, Jure Leskovec, and Bruno Ribeiro. Holographic node representations: Pre-training task-agnostic node embeddings. InProceedings of the Interna- tional Conference on Learning Representations (ICLR), 2025. URL https://openreview. net/forum?id=tGYFikNONB
2025
-
[3]
Data distributional proper- ties drive emergent in-context learning in transformers
Stephanie Chan, Adam Santoro, Andrew Lampinen, Jane Wang, Aaditya Singh, Pierre Richemond, James McClelland, and Felix Hill. Data distributional proper- ties drive emergent in-context learning in transformers. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Infor- mation Processing Systems, volume 35, pages...
-
[4]
URL https://proceedings.neurips.cc/paper_files/paper/2022/file/ 77c6ccacfd9962e2307fc64680fc5ace-Paper-Conference.pdf
2022
-
[5]
RelGNN: Composite message passing for relational deep learning
Tianlang Chen, Charilaos Kanatsoulis, and Jure Leskovec. RelGNN: Composite message passing for relational deep learning. InProceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 ofProceedings of Machine Learning Research, 2025. URLhttps://proceedings.mlr.press/v267/chen25ad.html
2025
-
[6]
Xgboost: A scalable tree boosting system
Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016. doi: 10.1145/2939672.2939785. URL https://dl.acm.org/doi/10. 1145/2939672.2939785
-
[7]
AutoG: Towards automatic graph construction from tabular data
Zhikai Chen, Han Xie, Jian Zhang, Xiang Song, Jiliang Tang, Huzefa Rangwala, and George Karypis. AutoG: Towards automatic graph construction from tabular data. InProceedings of the International Conference on Learning Representations (ICLR), 2025. URL https: //openreview.net/forum?id=hovDbX4Gh6
2025
-
[8]
Re- latron: Automating relational machine learning over relational databases
Zhikai Chen, Han Xie, Jian Zhang, Jiliang Tang, Xiang Song, and Huzefa Rangwala. Re- latron: Automating relational machine learning over relational databases. InProceed- ings of the International Conference on Learning Representations (ICLR), 2026. URL https://openreview.net/forum?id=59avbH4HnU
2026
-
[9]
On lazy training in differentiable program- ming
Lénaïc Chizat, Edouard Oyallon, and Francis Bach. On lazy training in differentiable program- ming. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/ ae614c...
2019
-
[10]
RDB2G-Bench: A comprehensive benchmark for automatic graph modeling of relational databases
Dongwon Choi, Sunwoo Kim, Juyeon Kim, Kyungho Kim, Geon Lee, Shinhwan Kang, Myungh- wan Kim, and Kijung Shin. RDB2G-Bench: A comprehensive benchmark for automatic graph modeling of relational databases. InAdvances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2025. URL https://openreview.net/forum? id=ZbcUIxACQE
2025
-
[11]
Learning posterior predictive distributions for node classification from synthetic graph priors
Jeongwhan Choi, Jongwoo Kim, Woosung Kang, and Noseong Park. Learning posterior predictive distributions for node classification from synthetic graph priors. InProceedings of the International Conference on Learning Representations (ICLR), 2026. URL https: //openreview.net/forum?id=FmxRzlu0rT
2026
-
[12]
Edgar F. Codd. A relational model of data for large shared data banks.Communications of the ACM, 13(6):377–387, 1970. doi: 10.1145/362384.362685. URL https://dl.acm.org/doi/ 10.1145/362384.362685. 10
-
[13]
Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec
Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico Lopez, Charilaos I. Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec. Relational graph transformer. InProceedings of the International Conference on Learning Representations (ICLR), 2026. URL https: //openreview.net/forum?id=2d3j6bt21A
2026
-
[14]
Turning tabular foundation models into graph foundation models.arXiv preprint arXiv:2508.20906, 2025
Dmitry Eremeev, Gleb Bazhenov, Oleg Platonov, Artem Babenko, and Liudmila Prokhorenkova. Turning tabular foundation models into graph foundation models.arXiv preprint arXiv:2508.20906, 2025
Pith/arXiv arXiv 2025
-
[15]
GraphPFN: A prior-data fitted graph foundation model
Dmitry Eremeev, Oleg Platonov, Gleb Bazhenov, Artem Babenko, and Liudmila Prokhorenkova. GraphPFN: A prior-data fitted graph foundation model. InICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026. URL https://openreview. net/forum?id=pkzHrpr7jG
2026
-
[16]
Position: Relational deep learning – graph representation learning on relational databases
Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. Position: Relational deep learning – graph representation learning on relational databases. InProceedings of the 41st International Conference on Machine Learning (ICML), volume 235 ofProceedings of Machine Learning Research, ...
2024
-
[17]
KumoRFM: A foundation model for in-context learning on relational data
Matthias Fey, Vid Kocijan, Federico Lopez, Jan Eric Lenssen, and Jure Leskovec. KumoRFM: A foundation model for in-context learning on relational data. Technical report, Kumo.ai, 2025. URLhttps://kumo.ai/research/kumo_relational_foundation_model.pdf
2025
-
[18]
Towards foun- dation models for knowledge graph reasoning
Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, and Zhaocheng Zhu. Towards foun- dation models for knowledge graph reasoning. InProceedings of the International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id= jVEoydFOl9
2024
-
[19]
RelBench v2: A large-scale benchmark and repository for relational data
Justin Gu, Rishabh Ranjan, Charilaos Kanatsoulis, Haiming Tang, Martin Jurkovic, Valter Hudovernik, Mark Znidar, Pranshu Chaturvedi, Parth Shroff, Fengyu Li, and Jure Leskovec. RelBench v2: A large-scale benchmark and repository for relational data. InICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026. URL https://open...
2026
-
[20]
Understanding emergent in-context learning from a kernel regression perspective.Transactions on Machine Learning Research (TMLR), 2025
Chi Han, Ziqi Wang, Han Zhao, and Heng Ji. Understanding emergent in-context learning from a kernel regression perspective.Transactions on Machine Learning Research (TMLR), 2025. URLhttps://openreview.net/forum?id=6rD50Q6yYz
2025
-
[21]
Understanding in-context learning via supportive pretraining data
Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, and Tianlu Wang. Understanding in-context learning via supportive pretraining data. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023. URL https://aclanthology.org/2023.acl-long.708/
2023
-
[22]
Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps.Social Networks, 5(2):109–137, 1983. doi: 10.1016/0378-8733(83)90021-7. URL https://doi.org/10.1016/0378-8733(83)90021-7
-
[23]
u ller, S., Purucker, L., Krishnakumar, A., K \
Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tab- ular foundation model.Nature, 637(8045):319–326, 2025. doi: 10.1038/s41586-024-08328-6. URLhttps://www.nature.com/articles/s41586-024-08328-6
-
[24]
KumoRFM-2: Scaling foundation models for relational learning
Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. KumoRFM-2: Scaling foundation models for relational learning. Technical report, Kumo.ai, 2026. URL https://kumo.ai/ kumoRFM-2-scaling-foundation-models-for-relational-learning.pdf
2026
-
[25]
IEEE-CIS fraud de- tection
IEEE Computational Intelligence Society and Vesta Corporation. IEEE-CIS fraud de- tection. Kaggle Competition, 2019. URL https://www.kaggle.com/competitions/ ieee-fraud-detection. 11
2019
-
[26]
Neural tangent kernel: Convergence and generalization in neural networks
Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors,Advances in Neural Information Processing Sys- tems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/ paper_f...
2018
-
[27]
Linkage and autocorrelation cause feature selection bias in relational learning
David Jensen and Jennifer Neville. Linkage and autocorrelation cause feature selection bias in relational learning. InProceedings of the Nineteenth International Conference on Machine Learning (ICML), 2002. URLhttps://dl.acm.org/doi/10.5555/645531.655828
-
[28]
Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Moham- mad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. MIMIC-III, a freely accessible critical care database.Scientific Data, 3:160035, 2016. doi: 10.1038/sdata.2016.35. URLhttps://www.nature.com/articles/sdata201635
-
[29]
Deep feature synthesis: Towards automating data science endeavors
James Max Kanter and Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. InIEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015. doi: 10.1109/DSAA.2015.7344858. URL https://doi.org/10. 1109/DSAA.2015.7344858
-
[30]
Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks.Physical Review E, 83(1):016107, 2011. doi: 10.1103/PhysRevE.83.016107. URL https://link.aps.org/doi/10.1103/PhysRevE.83.016107
-
[31]
Lightgbm: A highly efficient gradient boosting decision tree
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In I. Guyon, U. V on Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, ed- itors,Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 201...
2017
-
[32]
Vid Kocijan, Jinu Sunil, Jan Eric Lenssen, Viman Deb, Xinwei Xe, Federico Reyes Gomez, Matthias Fey, and Jure Leskovec. Predictive query language: A domain-specific language for predictive modeling on relational databases.arXiv preprint arXiv:2602.09572, 2026
arXiv 2026
-
[33]
PluRel: Synthetic data unlocks scaling laws for rela- tional foundation models
Vignesh Kothapalli, Rishabh Ranjan, Valter Hudovernik, Vijay Prakash Dwivedi, Johannes Hoffart, Carlos Guestrin, and Jure Leskovec. PluRel: Synthetic data unlocks scaling laws for rela- tional foundation models. InICLR 2026 Workshop on Foundation Models for Tabular and Struc- tured Data (DATA-FM), 2026. URLhttps://openreview.net/forum?id=iti7t2oI85
2026
-
[34]
Position: Graph foundation models are already here
Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, and Jiliang Tang. Position: Graph foundation models are already here. In Proceedings of the 41st International Conference on Machine Learning (ICML), volume 235 ofProceedings of Machine Learning Research, 2024. URL https://proceedings.mlr. press/v235/mao24a.html
2024
-
[35]
Transformers can do bayesian inference
Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference. InInternational Conference on Learning Representa- tions, 2022. URLhttps://openreview.net/forum?id=KSugKcbNf9
2022
-
[36]
Statistical foundations of prior-data fitted networks
Thomas Nagler. Statistical foundations of prior-data fitted networks. InProceedings of the 40th International Conference on Machine Learning (ICML), volume 202 ofProceed- ings of Machine Learning Research, 2023. URL https://proceedings.mlr.press/v202/ nagler23a.html
2023
-
[37]
Leveraging relational autocorrelation with latent group models
Jennifer Neville and David Jensen. Leveraging relational autocorrelation with latent group models. InProceedings of the IEEE International Conference on Data Mining (ICDM), 2005. doi: 10.1109/ICDM.2005.89. URLhttps://dl.acm.org/doi/10.1109/ICDM.2005.89
-
[38]
NeMo guardrails: A toolkit for controllable and safe LLM applications with pro- grammable rails
Jianmo Ni, Jiacheng Li, and Julian McAuley. Justifying recommendations using distantly- labeled reviews and fine-grained aspects. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 188–197, 2019. doi: 10.18653/v1/ D19-1018. URLhttps://aclanthology.org/D19-1018/. 12
-
[39]
Character- izing graph datasets for node classification: Homophily-heterophily dichotomy and beyond
Oleg Platonov, Denis Kuznedelev, Artem Babenko, and Liudmila Prokhorenkova. Character- izing graph datasets for node classification: Homophily-heterophily dichotomy and beyond. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 523–548. Curran Associates, Inc....
2023
-
[40]
TabICL: A tabular foundation model for in-context learning on large data
Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICL: A tabular foundation model for in-context learning on large data. InProceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 ofProceedings of Machine Learning Research, 2025. URLhttps://proceedings.mlr.press/v267/qu25d.html
2025
-
[41]
Kanatsoulis, Roshan Reddy Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, and Jure Leskovec
Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos I. Kanatsoulis, Roshan Reddy Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, and Jure Leskovec. Relational transformer: Toward zero-shot foundation models for relational data. In The Fourteenth International Conference on Learning Representations, 2026. URL https: //openrevi...
2026
-
[42]
Pretraining task diversity and the emergence of non-bayesian in-context learning for regression
Allan Raventós, Mansheej Paul, Feng Chen, and Surya Ganguli. Pretraining task diversity and the emergence of non-bayesian in-context learning for regression. In A. Oh, T. Nau- mann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neu- ral Information Processing Systems, volume 36, pages 14228–14246. Curran Associates, Inc., 2023. URL...
2023
-
[43]
Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec
Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec. Relbench: A benchmark for deep learning on relational databases. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Inform...
-
[44]
doi: 10.52202/079017-0672. URL https://proceedings.neurips.cc/paper_ files/paper/2024/file/25cd345233c65fac1fec0ce61d0f7836-Paper-Datasets_ and_Benchmarks_Track.pdf
-
[45]
Prototypical networks for few-shot learning
Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In I. Guyon, U. V on Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/ cb8d...
2017
-
[46]
A pre- training framework for relational data with information-theoretic principles
Quang Truong, Zhikai Chen, Mingxuan Ju, Tong Zhao, Neil Shah, and Jiliang Tang. A pre- training framework for relational data with information-theoretic principles. InAdvances in Neural Information Processing Systems (NeurIPS), 2025. URL https://openreview.net/ forum?id=xNUNxRj2vJ
2025
-
[47]
Transformers learn in-context by gradient descent
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. InProceedings of the 40th International Conference on Machine Learning (ICML), volume 202 ofProceedings of Machine Learning Research, 2023. URL https://proceedings.mlr.pres...
2023
-
[48]
4dbinfer: A 4d benchmarking toolbox for graph-centric predictive modeling on rdbs
Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yan- lin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, and Zheng Zhang. 4dbinfer: A 4d benchmarking toolbox for graph-centric predictive modeling on rdbs. In A. Globerson,...
-
[49]
Griffin: Towards a graph-centric relational database foundation model
Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang. Griffin: Towards a graph-centric relational database foundation model. InProceedings of the 42nd International Conference on Machine Learning (ICML), volume 267 ofProceed- ings of Machine Learning Research, 2025. URL https://proceedings.mlr.press/v267/ wang25da.html
2025
-
[50]
Yanbo Wang, Jiaxuan You, Chuan Shi, and Muhan Zhang. Relational in-context learning via synthetic pre-training with structural prior.arXiv preprint arXiv:2603.03805, 2026
Pith/arXiv arXiv 2026
-
[51]
Graph foundation models: A comprehensive survey.arXiv preprint arXiv:2505.15116, 2025
Zehong Wang, Zheyuan Liu, Tianyi Ma, Jiazheng Li, Zheyuan Zhang, Xingbo Fu, Yiyang Li, Zhengqing Yuan, Wei Song, Yijun Ma, Qingkai Zeng, Xiusi Chen, Jianan Zhao, Jundong Li, Meng Jiang, Pietro Lio, Nitesh Chawla, Chuxu Zhang, and Yanfang Ye. Graph foundation models: A comprehensive survey.arXiv preprint arXiv:2505.15116, 2025
arXiv 2025
-
[52]
Larger language models do in-context learning differently.arXiv preprint arXiv:2303.03846, 2023
Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, and Tengyu Ma. Larger language models do in-context learning differently.arXiv preprint arXiv:2303.03846, 2023
arXiv 2023
-
[53]
The learnability of in-context learning
Noam Wies, Yoav Levine, and Amnon Shashua. The learnability of in-context learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 36637–36651. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/ 73950f0eb4ac0925d...
2023
-
[54]
Large language models are good relational learners
Fang Wu, Vijay Prakash Dwivedi, and Jure Leskovec. Large language models are good relational learners. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025. URLhttps://aclanthology.org/2025.acl-long.386/
2025
-
[55]
Tackling prediction tasks in relational databases with LLMs.arXiv preprint arXiv:2411.11829, 2024
Marek Wydmuch, Lukasz Borchmann, and Filip Gralinski. Tackling prediction tasks in relational databases with LLMs.arXiv preprint arXiv:2411.11829, 2024
arXiv 2024
-
[56]
An explanation of in- context learning as implicit Bayesian inference
Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in- context learning as implicit Bayesian inference. InProceedings of the International Conference on Learning Representations (ICLR), 2022. URL https://openreview.net/forum?id= RdJVFCHjUMI
2022
-
[57]
Do RDB foundation models even need data? InICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026
Linjie Xu, Yanlin Zhang, Quan Gan, Minjie Wang, and David Wipf. Do RDB foundation models even need data? InICLR 2026 Workshop on Foundation Models for Tabular and Structured Data (DATA-FM), 2026. URLhttps://openreview.net/forum?id=Lz50laXiSa
2026
-
[58]
Greg Yang and Edward J. Hu. Tensor programs iv: Feature learning in infinite-width neural networks. InProceedings of the 38th International Conference on Machine Learn- ing (ICML), volume 139 ofProceedings of Machine Learning Research, 2021. URL https://proceedings.mlr.press/v139/yang21c.html
2021
-
[59]
ContextGNN: Beyond two-tower recommendation systems
Yiwen Yuan, Zecheng Zhang, Xinwei He, Akihiro Nitta, Weihua Hu, Manan Shah, Blaž Stojanoviˇc, Shenyang Huang, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. ContextGNN: Beyond two-tower recommendation systems. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=nzOD1we8Z4
2025
-
[60]
What and how does in-context learning learn? Bayesian model averaging, parameterization, and generalization
Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, and Zhaoran Wang. What and how does in-context learning learn? Bayesian model averaging, parameterization, and generalization. InProceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 258 ofProceedings of Machine Learning Research, 2025. URL https: //proceedi...
2025
-
[61]
RT is in the lazy / frozen-feature regime
Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, and Jian Tang. Neural bellman-ford networks: A general graph neural network framework for link prediction. In M. Ranzato, A. Beygelzimer, Y . Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 29476–29490. Curran Associates, Inc., 2021...
arXiv 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.