TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

Alex Xue; Chao Zhang; Hao Xu; Hehan Li; Jinyang Li; M. Tamer Özsu; Reynold Cheng; Tianshu Yu; Wei Pang; Xiangru Jian

arxiv: 2606.09323 · v1 · pith:ESHKEKQUnew · submitted 2026-06-08 · 💻 cs.AI · cs.DB


Wei Pang, Xiangru Jian, Hehan Li, Zhixuan Yu, Alex Xue, Jinyang Li, Zhengyuan Dong, Xinjian Zhao, Hao Xu, Chao Zhang, Reynold Cheng, M. Tamer Özsu, Tianshu Yu


Pith reviewed 2026-06-27 16:46 UTC · model grok-4.3


keywords: tabular representation learning · encoder evaluation · benchmark · cross-paradigm comparison · column embeddings · row embeddings · data lake enrichment · representation-level evaluation


The pith

Standardizing downstream conditions reveals that tabular encoder quality is capability-specific rather than ranked by any single leaderboard.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TRL-Bench to evaluate tabular encoders from different training paradigms directly at the representation level. Encoders export row, column, or table embeddings through wrappers, and the same lightweight heads are applied across standardized tasks in three suites covering column/table prediction, row linkage, and compositional data-lake enrichment. Results across 20 models show performance tracks how well each encoder's pretraining objective matches the task's signal demands, with text encoders leading on surface-text tasks and specialists winning where objectives align. This setup matters because it replaces end-to-end pipeline comparisons with a common protocol for measuring reusable signal in exported representations.

Core claim

Once downstream conditions are standardized, encoder quality is capability-specific rather than captured by a single leaderboard. In column and table tasks, generic text encoders often lead where surface-text signal is strong, while tabular specialists win where their pretraining aligns with the task. Within-table prediction and cross-table linkage favor different training regimes, and atomic linkage performance correlates with the row-matching stage of enrichment pipelines. The strongest pipelines combine capability-matched specialists rather than reuse one encoder, and end-to-end quality depends on non-additive compositional fit.

What carries the argument

TRL-Bench, a multi-granular benchmark that standardizes export of row-, column-, or table embeddings and probes them with shared lightweight heads across TRL-CTbench, TRL-Rbench, and TRL-DLTE suites.
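The protocol described here (encoder-specific wrappers that export embeddings, followed by one shared lightweight head) can be sketched roughly as follows. This is an illustrative toy, not TRL-Bench's code: the encoder names, the toy rows, and the nearest-centroid head are all assumptions made for the example, standing in for whatever lightweight heads the benchmark actually uses.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[str, float]                  # hypothetical toy row: (text field, numeric field)
Encoder = Callable[[Row], List[float]]   # a "wrapper" that exports one embedding per row


def nearest_centroid_head(train_x: Sequence[List[float]],
                          train_y: Sequence[Hashable],
                          test_x: Sequence[List[float]]) -> List[Hashable]:
    """Stand-in shared head: predict the label of the closest class centroid."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def sqdist(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda y: sqdist(x, centroids[y])) for x in test_x]


def evaluate_encoders(encoders: Dict[str, Encoder],
                      train_rows: Sequence[Row], train_y: Sequence[Hashable],
                      test_rows: Sequence[Row], test_y: Sequence[Hashable]) -> Dict[str, float]:
    """Apply the SAME head to every encoder's exported embeddings; return accuracy."""
    scores = {}
    for name, encode in encoders.items():
        preds = nearest_centroid_head([encode(r) for r in train_rows], train_y,
                                      [encode(r) for r in test_rows])
        scores[name] = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
    return scores


# Hypothetical encoders: one only sees the numeric field, one only the text length.
encoders = {
    "numeric": lambda row: [row[1]],
    "text_len": lambda row: [float(len(row[0]))],
}
train_rows = [("a", 1.0), ("bb", 2.0), ("a", 9.0), ("bb", 10.0)]
train_y = [0, 0, 1, 1]                   # the label depends on the numeric field only
test_rows = [("a", 1.5), ("bb", 9.5)]
test_y = [0, 1]

scores = evaluate_encoders(encoders, train_rows, train_y, test_rows, test_y)
```

Because the head and the data splits are held fixed, any difference between the two scores comes from the exported embeddings alone, which is the comparison the benchmark is designed to isolate.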

If this is right

Generic text encoders lead on tasks with strong surface-text signal while tabular specialists win on aligned objectives.
Within-table prediction and cross-table linkage favor different training regimes.
Atomic linkage performance correlates strongly with the row-matching stage of DLTE pipelines.
Top end-to-end quality depends on non-additive compositional fit rather than per-stage marginal rank.
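The last point can be made concrete with the additive main-effect score quoted in the Figure 11 caption, m_T(t) + m_C(c) + m_R(r) − 2ȳ, which is what a purely additive model predicts for pipeline (t, c, r): the grand mean ȳ plus each stage's deviation from it. The sketch below, with made-up scores for a 2 × 2 × 2 pipeline grid, shows a case where the additive pick (the per-stage rank-1 assembly) is not the actual best pipeline. The numbers and function name are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Cell = Tuple[int, int, int]


def additive_vs_actual(y: List[List[List[float]]]) -> Tuple[Cell, Cell]:
    """Return (pipeline maximizing the additive main-effect score, actual best pipeline).

    y[t][c][r] is the end-to-end score of table model t, column model c, row model r.
    """
    T, C, R = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(T), range(C), range(R)))
    grand = sum(y[t][c][r] for t, c, r in cells) / len(cells)

    # Marginal means: average over the two stages not being indexed.
    m_t = [sum(y[t][c][r] for c in range(C) for r in range(R)) / (C * R) for t in range(T)]
    m_c = [sum(y[t][c][r] for t in range(T) for r in range(R)) / (T * R) for c in range(C)]
    m_r = [sum(y[t][c][r] for t in range(T) for c in range(C)) / (T * C) for r in range(R)]

    def additive(cell: Cell) -> float:
        t, c, r = cell
        return m_t[t] + m_c[c] + m_r[r] - 2 * grand

    additive_pick = max(cells, key=additive)
    actual_best = max(cells, key=lambda cell: y[cell[0]][cell[1]][cell[2]])
    return additive_pick, actual_best


# Hypothetical scores with a strong interaction at (0, 0, 0).
y = [
    [[0.9, 0.0], [0.0, 0.1]],   # table model 0
    [[0.2, 0.4], [0.7, 0.8]],   # table model 1
]
additive_pick, actual_best = additive_vs_actual(y)
```

In this toy grid the per-stage marginals favor pipeline (1, 1, 0), while the highest actual score sits at (0, 0, 0). A gap of this kind is what "non-additive compositional fit" refers to.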

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners could select encoders by matching capability to task type instead of overall rank.
Extending the benchmark with additional head architectures would test whether the capability-specific pattern persists.
Similar standardization might reveal capability splits in other modalities such as graph or time-series encoders.

Load-bearing premise

The chosen lightweight heads, task reformulations, and wrapper interfaces do not systematically favor or disfavor particular training paradigms.

What would settle it

A single encoder ranking first across all three suites under the fixed protocol would falsify the claim that quality is capability-specific.
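As a rough illustration of this falsification test, the check below asks whether one encoder tops every suite. The suite keys mirror the paper's three suites, but the encoder names and scores are invented, and the sketch assumes that higher is better and that every encoder is scored on every suite (the paper's wrappers only support some granularities per model).

```python
from typing import Dict, Optional


def top_encoder_per_suite(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each suite to its highest-scoring encoder (higher score = better)."""
    return {suite: max(by_encoder, key=by_encoder.get) for suite, by_encoder in scores.items()}


def single_leader(scores: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Return the encoder that wins every suite, or None if the winners differ."""
    winners = set(top_encoder_per_suite(scores).values())
    return winners.pop() if len(winners) == 1 else None


# Hypothetical aggregate scores; not results from the paper.
capability_split = {
    "TRL-CTbench": {"enc_a": 0.81, "enc_b": 0.74},
    "TRL-Rbench": {"enc_a": 0.62, "enc_b": 0.70},
    "TRL-DLTE": {"enc_a": 0.55, "enc_b": 0.58},
}
one_winner = {
    "TRL-CTbench": {"enc_a": 0.81, "enc_b": 0.74},
    "TRL-Rbench": {"enc_a": 0.72, "enc_b": 0.70},
    "TRL-DLTE": {"enc_a": 0.59, "enc_b": 0.58},
}

split_result = single_leader(capability_split)
winner_result = single_leader(one_winner)
```

A non-`None` result under the fixed protocol is the outcome that would count against the capability-specific claim.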

Figures

Figures reproduced from arXiv: 2606.09323 by Alex Xue, Chao Zhang, Hao Xu, Hehan Li, Jinyang Li, M. Tamer Özsu, Reynold Cheng, Tianshu Yu, Wei Pang, Xiangru Jian, Xinjian Zhao, Zhengyuan Dong, Zhixuan Yu.

**Figure 1.** TRL-Bench at a glance. Each model is processed once through its supported wrapper to export row-, column-, or table embeddings, and shared lightweight modules then evaluate those embeddings across TRL-CTBENCH (schema, joinability, unionability, grounding), TRL-RBENCH (row prediction, record linkage), and TRL-DLTE (multi-stage data-lake enrichment). embeddings indexed and reused across tasks and large multi…

**Figure 2.** Curation of TRL-RBENCH row-prediction tables and assembly of the TRL-DLTE lake. (a) Row prediction curation: 158 candidate tables filtered through rule screening, degeneracy audit, and human review with label repair into 50 tables with 123 targets. (b) DLTE lake assembly: 1,379 TabFact/WTQ parents fragmented into seed queries and union/join targets at four noise tiers. 11,032 targets are embedded alongside…

**Figure 3.** DLTE pipeline landscape (test split). Axes sort by per-stage marginal UJ-H. Color encodes end-to-end UJ-H. Halo boxes mark marginal and dev-selected compositions. A shared identity-resolution capability across RBench and DLTE. DLTE Stage 3 (row matching) is the compositional counterpart of RBench’s Record Linkage task: both test whether exported row embeddings can resolve cross-table row identity under n…

**Figure 4.** Row-Prediction dataset inventory. Real statistics computed from the 50 source datasets and their per-target metadata. (a) Subject-domain distribution of the 50 OpenML tables, hand-curated from each dataset’s public OpenML description: 12 Finance & Economics, 8 Business & Marketing, 6 Healthcare & Medicine, 6 Natural Sciences, 6 Engineering & Industrial, 6 Software & Security, 2 Education, and 4 Games & Oth…

**Figure 5.** Joint rows-columns footprint of TRL-Bench table inputs. Each panel plots a 2D density of (n_row(T), n_col(T)) over the counted loadable table inputs of one suite, on log–log axes. Bin intensities are normalized within each suite to “% of suite”, so a smaller suite is not visually dominated by a larger one. Gray dashed diagonals mark constant-footprint contours F_cell ∈ {10², 10³, 10⁴, 10⁵, 10⁶}. The black…

**Figure 6.** Per-suite cell-footprint distributions of TRL-Bench table inputs. Each panel histograms the footprint values F_cell(T) = n_row(T) · n_col(T) for the counted loadable table inputs in one suite, using 50 common logarithmically spaced bins over 10⁰–10⁹ cells. The black dashed line marks the median footprint within that suite, and the gray dotted line marks the corresponding 95th percentile. Each panel’s y-axis rep…

**Figure 7.** Granularity-dependent transfer profiles. Radar plots summarize family-level performance for representative models on the two atomic suites. Panel (a) compares column/table encoders across the four TRL-CTBENCH capability families: Schema, Join, Union, and Grounding. Panel (b) compares row encoders across Classification, Regression, Clean Linkage, and Robust Linkage in TRL-RBENCH. BERT, GTE, and TABBIE appe…

**Figure 8.** Per-target pairwise comparison of TABICL against the strongest comparator. (a) Classification AUROC: TABICL vs. BERT across 77 targets. (b) Regression nRMSE: TABICL vs. DAE across 46 targets. Points above the diagonal in (a) and below it in (b) indicate TABICL wins. Each color represents a different source dataset. two models, so the Kruskal–Wallis χ² approximation is borderline at the α = 0.05 level for…

**Figure 9.** Rank–rank density heatmaps for row tasks. For every task, models are ranked by the best-correlated intrinsic-geometry diagnostic (x-axis, 1 = highest value) and by direction-corrected downstream performance (y-axis, 1 = best). Cell numbers count (task, model) pairs per rank bin. Diagonal concentration indicates positive rank agreement. Anti-diagonal concentration indicates negative agreement…

**Figure 10.** Voxel visualisation of the DLTE Stage-3 pipeline space over UJ-H. Axes: Stage 1 (table model, 10) × Stage 2 (column model, 8) × Stage 3 (row model, 14). Colour encodes UJ-H (light blue → deep purple). Axes are reordered by marginal-mean UJ-H so the best-performing corner is contiguous. See the full per-pipeline breakdown in…

**Figure 11.** DLTE category-level UJ-H heatmap (5-round average, test set). Panels correspond to Stage 3 (row model) families. Rows = Stage 1 (table model), columns = Stage 2 (column model). The column-driven gradient confirms Stage 2’s dominant effect. Near-identical panels show Stage 3 differences are largely masked end to end. main-effect score m_T(t) + m_C(c) + m_R(r) − 2ȳ is maximized by the per-stage rank-1 assembl…

**Figure 12.** Embedding generation cost vs. table size (log-log scale). Training-based models scale super-linearly with rows. on the widest anchor table (1 775 columns). These limits are important for practitioners selecting models for large-scale deployment. Methodology. TABICL and TABPFN are timed only on EFF-REAL anchor tables. Their context-fit step is most representative of production usage when run on real labele…

**Figure 13.** CTBench critical-difference diagram (Demšar style). The horizontal axis shows mean…
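The cell footprint used in Figures 5 and 6, F_cell(T) = n_row(T) · n_col(T), and its log-spaced histogram can be sketched as below. The sketch reads the bin range in the Figure 6 caption as 10⁰ to 10⁹ cells split into 50 equal-width bins in log10 space. The table shapes are invented, and the function names are illustrative rather than taken from the released code.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

N_BINS = 50                  # number of bins stated in the Figure 6 caption
LOG_MIN, LOG_MAX = 0.0, 9.0  # assumed range: 10^0 .. 10^9 cells


def cell_footprint(n_rows: int, n_cols: int) -> int:
    """F_cell(T) = n_row(T) * n_col(T)."""
    return n_rows * n_cols


def log_bin_index(footprint: int) -> int:
    """Index of the log-spaced bin containing `footprint` (clamped to the valid range)."""
    frac = (math.log10(footprint) - LOG_MIN) / (LOG_MAX - LOG_MIN)
    return min(N_BINS - 1, max(0, int(frac * N_BINS)))


def footprint_histogram(shapes: Sequence[Tuple[int, int]]) -> List[int]:
    """Count tables per log-spaced footprint bin."""
    counts = [0] * N_BINS
    for n_rows, n_cols in shapes:
        counts[log_bin_index(cell_footprint(n_rows, n_cols))] += 1
    return counts


# Hypothetical (rows, columns) shapes for one suite.
shapes = [(100, 10), (1000, 1000), (20, 5)]
counts = footprint_histogram(shapes)
median_footprint = median(cell_footprint(r, c) for r, c in shapes)
```

Using one common set of bin edges for every suite, as the caption describes, keeps the per-suite histograms directly comparable.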

Original abstract

Tabular encoders are usually evaluated inside task-specific end-to-end pipelines, so models from different training paradigms are difficult to compare directly even when they operate on similar tabular signals. We introduce TRL-Bench, a multi-granular tabular representation learning (TRL) benchmark that standardizes cross-paradigm representation-level evaluation: each encoder exports row-, column-, or table embeddings through its supported wrapper, and shared lightweight heads probe them across three suites: TRL-CTbench (column/table), TRL-Rbench (row), and TRL-DLTE (compositional Data-Lake Table Enrichment spanning all three granularities). To support this standardized setting, we release curated benchmark assets and task reformulations, including 50 OpenML tables with 123 verified targets, 16 row-pair linkage rewrites, and a 47,772-table DLTE lake derived from 1,379 parent tables. Across 20 models and 16 tasks, TRL-Bench shows that once downstream conditions are standardized, encoder quality is capability-specific rather than captured by a single leaderboard. In TRL-CTbench, generic text encoders often lead on tasks with strong surface-text signal, while tabular specialists win where their pretraining objective aligns with the task. In TRL-Rbench, within-table prediction and cross-table linkage favor different training regimes, with atomic linkage performance correlating strongly with the row-matching stage of DLTE pipelines. In TRL-DLTE, the strongest pipelines combine capability-matched specialists rather than reuse a single encoder, and top end-to-end quality depends on non-additive compositional fit rather than per-stage marginal rank alone. TRL-Bench provides a common protocol for measuring reusable signal in exported tabular representations under shared downstream conditions. Code and data: https://github.com/LOGO-CUHKSZ/TRL-Bench

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TRL-Bench ships concrete dataset releases and a cross-paradigm protocol that could make tabular encoder comparisons more consistent, but the capability-specific claim rests on untested neutrality of the wrappers and heads.


This paper introduces TRL-Bench, a benchmark that standardizes representation-level evaluation of tabular encoders by exporting embeddings through wrappers and probing them with shared lightweight heads across three suites: column/table tasks, row tasks, and compositional data-lake enrichment.

It releases usable assets: 50 OpenML tables with 123 verified targets, 16 row-pair linkage rewrites, and a 47k-table lake from 1,379 parents. Experiments on 20 models and 16 tasks indicate that once downstream conditions are fixed, no single encoder leads across the board. Text encoders handle surface-text signals, specialists align with their pretraining on matching tasks, and DLTE pipelines require non-additive combinations rather than top-ranked stages alone.

The protocol and releases address a practical gap where end-to-end pipelines prevent direct comparison. The observation that atomic linkage correlates with DLTE row-matching stages is a clear takeaway.
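One common way to quantify a correlation of this kind is a rank correlation between per-model atomic linkage scores and the same models' contribution at the DLTE row-matching stage. The sketch below computes Spearman's ρ with the no-ties formula. It is an assumption that the paper uses a rank correlation at all, and the scores are invented for illustration.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (largest) to n (smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); no tie correction."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical per-model scores (same model order in both lists).
atomic_linkage = [0.91, 0.75, 0.60, 0.42]
dlte_row_matching = [0.80, 0.70, 0.72, 0.30]

rho = spearman_rho(atomic_linkage, dlte_row_matching)
```

With real results, tied scores would call for a tie-aware implementation rather than this shortcut formula.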

The soft spot is the lack of ablation on the evaluation choices themselves. The claim that quality is capability-specific assumes the wrappers, reformulations, and heads do not embed paradigm biases, yet no swaps of MLP versus linear heads or further standardization of interfaces are reported. Without those checks, the pattern could shift under different but still standardized conditions. The abstract also omits statistical details and exclusion rules, so the evidence strength is difficult to judge fully.

This is for tabular representation learning researchers who want reusable assets and a common protocol. The released data and setup give it enough grounding to deserve peer review, even if the central finding would need more validation on neutrality.

Referee Report

1 major / 0 minor

Summary. The paper introduces TRL-Bench, a multi-granular benchmark for standardizing cross-paradigm representation-level evaluation of tabular encoders. Encoders export row/column/table embeddings via supported wrappers; shared lightweight heads then probe them on three suites (TRL-CTbench for column/table tasks, TRL-Rbench for row tasks, TRL-DLTE for compositional Data-Lake Table Enrichment). The benchmark releases 50 OpenML tables with 123 verified targets, 16 row-pair linkage rewrites, and a 47,772-table DLTE lake. Experiments across 20 models and 16 tasks support the claim that, once downstream conditions are standardized, encoder quality is capability-specific rather than captured by any single leaderboard (text encoders lead on surface-text signals, specialists win on aligned objectives, and DLTE pipelines require non-additive compositional fit).

Significance. If the central claim holds, the work is significant for supplying a reusable protocol and assets that enable direct comparison of tabular encoders from different paradigms at the representation level. It supplies concrete evidence against single-leaderboard rankings and illustrates that top end-to-end quality arises from capability-matched combinations rather than marginal per-stage ranks. The public release of curated tables, reformulations, and code is a clear strength supporting reproducibility and follow-on work.

major comments (1)

[Abstract and evaluation protocol description] The central claim (abstract) that encoder quality is capability-specific once downstream conditions are standardized rests on the assumption that the shared evaluation protocol itself is neutral across paradigms. The described setup uses encoder-specific wrappers, 123 verified targets, 16 row-pair linkage rewrites, and lightweight heads for the three suites, yet no ablation is reported that swaps head architectures (e.g., MLP vs. linear), alters reformulation templates, or further standardizes wrapper interfaces. Without such checks it remains possible that the observed patterns (text encoders on surface-text tasks, non-additive DLTE pipelines) are partly artifacts of the particular protocol choices rather than intrinsic capability differences.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the evaluation protocol. We address the major comment below and describe the revisions we will make to strengthen the manuscript.


Referee: [Abstract and evaluation protocol description] The central claim (abstract) that encoder quality is capability-specific once downstream conditions are standardized rests on the assumption that the shared evaluation protocol itself is neutral across paradigms. The described setup uses encoder-specific wrappers, 123 verified targets, 16 row-pair linkage rewrites, and lightweight heads for the three suites, yet no ablation is reported that swaps head architectures (e.g., MLP vs. linear), alters reformulation templates, or further standardizes wrapper interfaces. Without such checks it remains possible that the observed patterns (text encoders on surface-text tasks, non-additive DLTE pipelines) are partly artifacts of the particular protocol choices rather than intrinsic capability differences.

Authors: We agree that additional ablations would provide stronger evidence for the neutrality of the protocol. The wrappers must be encoder-specific to handle the heterogeneous output formats and embedding spaces of models from different paradigms, but the probing heads are deliberately shared and lightweight (primarily linear or small MLP) to minimize downstream bias. The 123 targets and 16 rewrites were manually verified for consistency. In the revised manuscript we will add (i) a head-architecture ablation (linear vs. two-layer MLP) on a representative subset of tasks from each suite and (ii) a sensitivity analysis to the reformulation templates used in TRL-Rbench and TRL-DLTE. We will also expand the methods section to document the exact interface standardization steps already taken. These additions will directly address the concern that the reported patterns could be protocol artifacts. revision: yes
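A head-architecture ablation of the kind promised here could be summarized by checking which encoder pairs change order when the head is swapped. The following is a minimal sketch with hypothetical encoder names and scores; it is not the authors' analysis, only one plausible way to report ranking stability.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def discordant_pairs(scores_head_a: Dict[str, float],
                     scores_head_b: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return encoder pairs whose relative order flips between two probing heads."""
    flips = []
    for m1, m2 in combinations(sorted(scores_head_a), 2):
        diff_a = scores_head_a[m1] - scores_head_a[m2]
        diff_b = scores_head_b[m1] - scores_head_b[m2]
        if diff_a * diff_b < 0:          # opposite signs: the ordering reversed
            flips.append((m1, m2))
    return flips


# Hypothetical task scores under a linear head and a two-layer MLP head.
linear_head = {"enc_a": 0.80, "enc_b": 0.70, "enc_c": 0.60}
mlp_head = {"enc_a": 0.82, "enc_b": 0.65, "enc_c": 0.68}

flips = discordant_pairs(linear_head, mlp_head)
```

Few or no flips across tasks would support the neutrality premise, while many flips would suggest the reported patterns depend on the choice of head.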

Circularity Check

0 steps flagged

No circularity: empirical benchmark comparisons on released assets


The paper introduces TRL-Bench and reports direct empirical results across 20 models and 16 tasks using standardized wrappers, heads, and reformulations on curated assets (50 OpenML tables, 123 targets, 16 rewrites, 47k-table lake). The central claim—that encoder quality is capability-specific once conditions are standardized—is an observation from these runs, not a quantity derived from equations, fitted parameters renamed as predictions, or self-citation chains. No load-bearing step reduces to its own inputs by construction; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard machine-learning evaluation practices (lightweight probe heads, curated public datasets) and the domain assumption that representation quality can be isolated from full pipeline effects.

axioms (1)

Domain assumption: Lightweight heads applied to exported embeddings provide a fair probe of encoder quality across paradigms.
This assumption underpins the standardization claim in the abstract.


Reference graph

Works this paper leans on

93 extracted references · 2 canonical work pages

[1]

α-ReQ: Assessing representation quality in self-supervised learning by measuring eigenspectrum decay

Kumar Krishna Agrawal, Arnab Kumar Mondal, Arna Ghosh, and Blake Richards. α-ReQ: Assessing representation quality in self-supervised learning by measuring eigenspectrum decay. InNeurIPS, 2022

2022
[2]

Understanding intermediate layers using linear classifier probes.arXiv preprint arXiv:1610.01644, 2016

Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes.arXiv preprint arXiv:1610.01644, 2016

Pith/arXiv arXiv 2016
[3]

TabularS3L: A PyTorch Lightning-based library for self- and semi-supervised learning on tabular data.https://github.com/Alcoholrithm/TabularS3L, 2024

Alcoholrithm. TabularS3L: A PyTorch Lightning-based library for self- and semi-supervised learning on tabular data.https://github.com/Alcoholrithm/TabularS3L, 2024

2024
[4]

Macke, and Davide Zoccolan

Alessio Ansuini, Alessandro Laio, Jakob H. Macke, and Davide Zoccolan. Intrinsic dimension of data representations in deep neural networks. InNeurIPS, 2019

2019
[5]

Transformers for tabular data representa- tion: A survey of models and applications.Transactions of the Association for Computational Linguistics, 11:227–249, 2023

Gilbert Badaro, Mohammed Saeed, and Paolo Papotti. Transformers for tabular data representa- tion: A survey of models and applications.Transactions of the Association for Computational Linguistics, 11:227–249, 2023

2023
[6]

SCARF: Self-supervised contrastive learning using random feature corruption

Dara Bahri, Heinrich Jiang, Yi Tay, and Donald Metzler. SCARF: Self-supervised contrastive learning using random feature corruption. InICLR, 2022

2022
[7]

Representation learning: A review and new perspectives.IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8): 1798–1828, 2013

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives.IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8): 1798–1828, 2013

2013
[8]

van Rijn, and Joaquin Vanschoren

Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Pieter Gijsbers, Frank Hutter, Michel Lang, Rafael Gomes Mantovani, Jan N. van Rijn, and Joaquin Vanschoren. OpenML bench- marking suites. InNeurIPS Datasets and Benchmarks Track, 2021

2021
[9]

Alex Bogatu, Alvaro A. A. Fernandes, Norman W. Paton, and Nikolaos Konstantinou. Dataset discovery in data lakes. InICDE, pages 709–720, 2020

2020
[10]

Language models are realistic tabular data generators

Vadim Borisov, Kathrin Seßler, Tobias Leemann, Martin Pawelczyk, and Gjergji Kasneci. Language models are realistic tabular data generators. InICLR, 2023

2023
[11]

Deep neural networks and tabular data: A survey.IEEE Transactions on Neural Networks and Learning Systems, 35(6):7499–7519, 2024

Vadim Borisov, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin Pawelczyk, and Gjergji Kasneci. Deep neural networks and tabular data: A survey.IEEE Transactions on Neural Networks and Learning Systems, 35(6):7499–7519, 2024

2024
[12]

ExcelFormer: A neural network surpassing GBDTs on tabular data.arXiv preprint arXiv:2301.02819, 2023

Jintai Chen, Jiahuan Yan, Qiyuan Chen, Danny Ziyi Chen, Jian Wu, and Jimeng Sun. ExcelFormer: A neural network surpassing GBDTs on tabular data.arXiv preprint arXiv:2301.02819, 2023

arXiv 2023
[13]

TabFact: A large-scale dataset for table-based fact verification

Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, and William Yang Wang. TabFact: A large-scale dataset for table-based fact verification. InICLR, 2020

2020
[14]

Double/debiased machine learning for treatment and structural parameters.The Econometrics Journal, 2018

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters.The Econometrics Journal, 2018

2018
[15]

Tianji Cong, Madelon Hulsebos, Zhenjie Sun, Paul Groth, and H. V . Jagadish. Observatory: Characterizing embeddings of relational tables.Proceedings of the VLDB Endowment, 17(4): 849–862, 2023. 10

2023
[16]

Statistical comparisons of classifiers over multiple data sets.Journal of Machine Learning Research, 7:1–30, 2006

Janez Demšar. Statistical comparisons of classifiers over multiple data sets.Journal of Machine Learning Research, 7:1–30, 2006

2006
[17]

TURL: Table understanding through representation learning.Proceedings of the VLDB Endowment, 14(3):307–319, 2020

Xiang Deng, Huan Sun, Alyssa Lees, You Wu, and Cong Yu. TURL: Table understanding through representation learning.Proceedings of the VLDB Endowment, 14(3):307–319, 2020

2020
[18]

LakeBench: A benchmark for discovering joinable and unionable tables in data lakes.Proceedings of the VLDB Endowment, 17(8):1925–1938, 2024

Yuhao Deng, Chengliang Chai, Lei Cao, Qin Yuan, Siyuan Chen, Yanrui Yu, Zhaoze Sun, Junyi Wang, Jiajun Li, Ziqi Cao, Kaisen Jin, Chi Zhang, Yuqing Jiang, Yuanfang Zhang, Yuping Wang, Ye Yuan, Guoren Wang, and Nan Tang. LakeBench: A benchmark for discovering joinable and unionable tables in data lakes.Proceedings of the VLDB Endowment, 17(8):1925–1938, 202...

work page doi:10.14778/3659437.3659448 1925
[19]

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. InNAACL, 2019

2019
[20]

AutoGluon-Tabular: Robust and accurate AutoML for structured data.arXiv preprint arXiv:2003.06505, 2020

Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. AutoGluon-Tabular: Robust and accurate AutoML for structured data.arXiv preprint arXiv:2003.06505, 2020

Pith/arXiv arXiv 2003
[21]

TabArena: A living benchmark for machine learning on tabular data

Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, and Frank Hutter. TabArena: A living benchmark for machine learning on tabular data. InNeurIPS Datasets and Benchmarks Track, 2025

2025
[22]

Estimating the intrinsic dimension of datasets by a minimal neighborhood information.Scientific Reports, 7(1):12140, 2017

Elena Facco, Maria d’Errico, Alex Rodriguez, and Alessandro Laio. Estimating the intrinsic dimension of datasets by a minimal neighborhood information.Scientific Reports, 7(1):12140, 2017

2017
[23]

Grace Fan, Jin Wang, Yuliang Li, and Renée J. Miller. Table discovery in data lakes: State-of- the-art and future directions. InSIGMOD Companion, 2023

2023
[24]

Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, and Renée J. Miller. Semantics-aware dataset dis- covery from data lakes with contextualized column-based representation learning.Proceedings of the VLDB Endowment, 16(7):1726–1739, 2023

2023
[25]

OpenML-CTR23: A curated tabular regression benchmarking suite

Sebastian Fischer, Liana Harutyunyan, Matthias Feurer, and Bernd Bischl. OpenML-CTR23: A curated tabular regression benchmarking suite. InAutoML Conference, 2023

2023
[26]

RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank

Quentin Garrido, Randall Balestriero, Laurent Najman, and Yann LeCun. RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank. In ICML, 2023

2023
[27]

Revisiting deep learning models for tabular data

Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, and Artem Babenko. Revisiting deep learning models for tabular data. InNeurIPS, 2021

2021
[28]

TabR: Tabular deep learning meets nearest neighbors

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, and Artem Babenko. TabR: Tabular deep learning meets nearest neighbors. InICLR, 2024

2024
[29]

TaPas: Weakly supervised table parsing via pre-training

Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, and Julian Mar- tin Eisenschlos. TaPas: Weakly supervised table parsing via pre-training. InACL, 2020

2020
[30]

Open domain question answering over tables via dense retrieval

Jonathan Herzig, Thomas Müller, Syrine Krichene, and Julian Eisenschlos. Open domain question answering over tables via dense retrieval. InNAACL, 2021

2021
[31]

TabPFN: A transformer That solves small tabular classification problems in a second

Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer That solves small tabular classification problems in a second. InICLR, 2023

2023
[32]

Accurate predictions on small data with a tabular foundation model.Nature, 2025

Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 2025

2025
[33]

TabTransformer: Tabular data modeling using contextual embeddings.arXiv preprint arXiv:2012.06678, 2020

Xin Huang, Ashish Khetan, Milan Cvitkovic, and Zohar Karnin. TabTransformer: Tabular data modeling using contextual embeddings.arXiv preprint arXiv:2012.06678, 2020

Pith/arXiv arXiv 2012
[34]

TABBIE: Pretrained representa- tions of tabular data

Hiroshi Iida, Dung Thai, Varun Manjunatha, and Mohit Iyyer. TABBIE: Pretrained representa- tions of tabular data. InNAACL, 2021. 11

2021
[35]

OmniTab: Pretraining with natural and synthetic data for few-shot table-based question answering

Zhengbao Jiang, Yi Mao, Pengcheng He, Graham Neubig, and Weizhu Chen. OmniTab: Pretraining with natural and synthetic data for few-shot table-based question answering. In NAACL, 2022

2022
[36]

SemTab 2019: Resources to benchmark tabular data to knowledge graph matching systems

Ernesto Jiménez-Ruiz, Oktie Hassanzadeh, Vasilis Efthymiou, Jiaoyan Chen, and Kavitha Srinivas. SemTab 2019: Resources to benchmark tabular data to knowledge graph matching systems. InESWC, pages 514–530, 2020. doi: 10.1007/978-3-030-49461-2_30

work page doi:10.1007/978-3-030-49461-2_30 2019
[37]

Billion-scale similarity search with GPUs

Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2021

2021
[38]

PATE-GAN: Generating synthetic data with differential privacy guarantees

James Jordon, Jinsung Yoon, and Mihaela van der Schaar. PATE-GAN: Generating synthetic data with differential privacy guarantees. InICLR, 2019

2019
[39]

Miller, and Mirek Riedewald

Aamod Khatiwada, Grace Fan, Roee Shraga, Zixuan Chen, Wolfgang Gatterbauer, Renée J. Miller, and Mirek Riedewald. SANTOS: Relationship-based semantic table union search. In SIGMOD, 2023

2023
[40]

TabSketchFM: Sketch-based tabular representation learning for data discovery over data lakes

Aamod Khatiwada, Harsha Kokel, Ibrahim Abdelaziz, Subhajit Chaudhury, Julian Dolby, Oktie Hassanzadeh, Zhenhan Huang, Tejaswini Pedapati, Horst Samulowitz, and Kavitha Srinivas. TabSketchFM: Sketch-based tabular representation learning for data discovery over data lakes. InICDE, 2025

2025
[41]

CARTE: Pretraining and transfer for tabular learning

Myung Jun Kim, Léo Grinsztajn, and Gaël Varoquaux. CARTE: Pretraining and transfer for tabular learning. InICML, 2024

2024
[42]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InICLR, 2015

2015
[43]

SOTAB: The WDC Schema.org table annotation benchmark

Keti Korini, Ralph Peeters, and Christian Bizer. SOTAB: The WDC Schema.org table annotation benchmark. InSemTab @ ISWC, 2022

2022
[44]

TabDDPM: Mod- elling tabular data with diffusion models

Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. TabDDPM: Mod- elling tabular data with diffusion models. InICML, 2023

2023
[45]

Valentine: Evaluating matching techniques for dataset discovery

Christos Koutras, George Siachamis, Andra Ionescu, Kyriakos Psarakis, Jerry Brons, Marios Fragkoulis, Christoph Lofi, Angela Bonifati, and Asterios Katsifodimos. Valentine: Evaluating matching techniques for dataset discovery. InICDE, 2021

2021
[46] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2):83–97, 1955.
[47] Guillaume Lample, Alexis Conneau, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. Word translation without parallel data. In ICLR, 2018.
[48] Kyungeun Lee, Ye Seul Sim, Hye-Seung Cho, Moonjung Eo, Suhee Yoon, Sanghyu Yoon, and Woohyung Lim. Binning as a pretext task: Improving self-supervised learning in tabular domains. In ICML, 2024.
[49] Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, and Wang-Chiew Tan. Deep entity matching with pre-trained language models. Proceedings of the VLDB Endowment, 14(1):50–60, 2020.
[50] Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meishan Zhang. Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281, 2023.
[51] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In ICDM, 2008.
[52] Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, and Jian-Guang Lou. TAPEX: Table pre-training via learning a neural SQL executor. In ICLR, 2022.
[53] Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, and Vijay Raghavendra. Deep learning for entity matching: A design space exploration. In SIGMOD, 2018.
[54] Fatemeh Nargesian, Erkang Zhu, Ken Q. Pu, and Renée J. Miller. Table union search on open data. Proceedings of the VLDB Endowment, 11(7):813–825, 2018.
[55] Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, P... Text and code embeddings by contrastive pre-training. arXiv preprint, 2022.
[56] Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, and Yinfei Yang. Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models. In Findings of ACL, 2022.
[57] OpenAI. New embedding models and API updates. https://openai.com/index/new-embedding-models-and-api-updates/, 2024. Released January 25, 2024.
[58] Koyena Pal, Aamod Khatiwada, Roee Shraga, and Renée J. Miller. Generative benchmark creation for table union search. arXiv preprint arXiv:2308.03883, 2023.
[59] Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, and Xi He. ClavaDDPM: Multi-relational data synthesis with cluster-guided diffusion models. In NeurIPS, 2024.
[60] Panupong Pasupat and Percy Liang. Compositional semantic parsing on semi-structured tables. In ACL, 2015.
[61] Neha Patki, Roy Wedge, and Kalyan Veeramachaneni. The synthetic data vault. In DSAA, 2016.
[62] Ralph Peeters, Anna Primpeli, Benedikt Wichtlhuber, and Christian Bizer. Using schema.org annotations for training and maintaining product matchers. In WIMS, 2020.
[63] Anna Primpeli, Ralph Peeters, and Christian Bizer. The WDC training dataset and gold standard for large-scale product matching. In Companion of The 2019 World Wide Web Conference (WWW ’19 Companion), ECNLP Workshop, 2019.
[64] Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICL: A tabular foundation model for in-context learning on large data. In ICML, 2025.
[65] Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81:84–90, 2022.
[66] Gowthami Somepalli, Micah Goldblum, Avi Schwarzschild, C. Bayan Bruss, and Tom Goldstein. SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training. arXiv preprint arXiv:2106.01342, 2021.
[67] Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. MPNet: Masked and permuted pre-training for language understanding. In NeurIPS, 2020.
[68] Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, and Horst Samulowitz. LakeBench: Benchmarks for data discovery over data lakes. arXiv preprint arXiv:2307.04217, 2023.
[69] Aofeng Su, Aowen Wang, Chao Ye, et al. TableGPT2: A large multimodal model with tabular data integration. arXiv preprint arXiv:2411.02059, 2024.
[70] Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, and Wang-Chiew Tan. Annotating columns with pre-trained language models. In SIGMOD, 2022.
[71] Anton Tsitsulin, Marina Munkhoeva, and Bryan Perozzi. Unsupervised embedding quality evaluation. arXiv preprint arXiv:2305.16562, 2023.
[72] Talip Ucar, Ehsan Hajiramezanali, and Lindsay Edwards. SubTab: Subsetting features of tabular data for self-supervised representation learning. In NeurIPS, 2021.
[73] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: Networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2):49–60, 2014.
[74] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In ICML, 2008.
[75] Zhiruo Wang, Haoyu Dong, Ran Jia, Jia Li, Zhiyi Fu, Shi Han, and Dongmei Zhang. TUTA: Tree-based transformers for generally structured table pre-training. In KDD, 2021.
[76] Zifeng Wang and Jimeng Sun. TransTab: Learning transferable tabular transformers across tables. In NeurIPS, 2022.
[77] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Modeling tabular data using conditional GAN. In NeurIPS, 2019.
[78] Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. TaBERT: Pretraining for joint understanding of textual and tabular data. In ACL, 2020.
[79] Jinsung Yoon, James Jordon, and Mihaela van der Schaar. GAIN: Missing data imputation using generative adversarial nets. In ICML, 2018.
[80] Jinsung Yoon, Yao Zhang, James Jordon, and Mihaela van der Schaar. VIME: Extending the success of self- and semi-supervised learning to tabular domain. In NeurIPS, 2020.


