FlexTab: A Flexible Encoder-Decoder Architecture for In-Context Learning Across Diverse Tabular Tasks

Johannes Höhne; Marco Spinaci; Marek Polewczyk; Maximilian Schambach; Sam Thelin

arxiv: 2606.30336 · v2 · pith:XTBRBLAFnew · submitted 2026-06-29 · 💻 cs.LG

Pith reviewed 2026-06-30 07:19 UTC · model grok-4.3

classification 💻 cs.LG

keywords tabular data · in-context learning · encoder-decoder · target-agnostic embeddings · classification · regression · anomaly detection · entity matching

The pith

A single task-agnostic encoder paired with task-specific decoders serves as an effective general-purpose backbone for diverse tabular prediction problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FlexTab to address a limitation of existing tabular in-context learners: they entangle feature representations with a specific prediction target. FlexTab instead uses a shared encoder to produce target-agnostic row embeddings, which are demonstrated on six tasks: classification, regression, anomaly detection, clustering, entity matching, and entity classification in relational databases. Both the encoder and the decoders are trained on a large corpus of real-world, unlabeled tables. If the approach holds, a single backbone could handle many tabular problems through in-context learning.

Core claim

FlexTab pairs a single task-agnostic encoder with task-specific decoders to produce target-agnostic row embeddings. The authors report state-of-the-art performance on classification, regression, anomaly detection, and entity matching, and competitive performance on entity classification, and take this as evidence that the encoder-decoder design works as a general-purpose backbone for tabular tasks.

What carries the argument

The shared task-agnostic encoder that generates target-agnostic row embeddings, combined with a suite of task-specific decoders.
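The split between a shared encoder and task-specific decoders can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder here is a fixed random projection standing in for a pretrained network, the two decoders are a k-nearest-neighbour vote and a ridge regression, and every name (`SharedEncoder`, `classify`, `regress`) and the toy data are assumptions made for illustration. The only point it shows is that the row embeddings are computed once, without access to any target, and then reused by different heads.

```python
import numpy as np

rng = np.random.default_rng(0)


class SharedEncoder:
    """Hypothetical stand-in for a pretrained, frozen encoder.

    It maps rows to embeddings and never sees a target column,
    so its output is target-agnostic by construction.
    """

    def __init__(self, n_features, dim=16):
        self.w = rng.normal(size=(n_features, dim)) / np.sqrt(n_features)

    def __call__(self, rows):
        return np.tanh(rows @ self.w)


def classify(ctx_emb, ctx_labels, query_emb, k=5):
    """Classification head: majority vote among the k nearest context rows."""
    dists = np.linalg.norm(query_emb[:, None, :] - ctx_emb[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = ctx_labels[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])


def regress(ctx_emb, ctx_targets, query_emb, lam=1e-2):
    """Regression head: ridge regression fitted on the context embeddings."""
    d = ctx_emb.shape[1]
    gram = ctx_emb.T @ ctx_emb + lam * np.eye(d)
    coef = np.linalg.solve(gram, ctx_emb.T @ ctx_targets)
    return query_emb @ coef


# Toy table with two different targets defined over the same rows.
X = rng.normal(size=(200, 8))
y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)
y_reg = X @ rng.normal(size=8)

encoder = SharedEncoder(n_features=8)
emb = encoder(X)  # computed once, shared by both heads

ctx, qry = slice(0, 150), slice(150, 200)  # context rows vs. query rows
cls_pred = classify(emb[ctx], y_cls[ctx], emb[qry])
reg_pred = regress(emb[ctx], y_reg[ctx], emb[qry])
```

In this sketch the labelled context rows play the role of the in-context examples: switching tasks changes only which head consumes `emb`, not the embeddings themselves.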

If this is right

The encoder can be reused across different tabular tasks without retraining from scratch.
Pretraining occurs once on unlabeled tables to support multiple prediction problems.
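State-of-the-art results are reported on four tasks (classification, regression, anomaly detection, entity matching) and competitive results on entity classification; the abstract makes no performance claim for clustering.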
This design avoids the need for task-specific feature engineering in the encoder.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Such a flexible architecture might scale to additional tabular tasks beyond the six tested.
It could lower the barrier for applying in-context learning in domains with limited labeled data per task.
Future work might test if the embeddings transfer to new table structures not seen in pretraining.

Load-bearing premise

The target-agnostic row embeddings produced by the encoder remain sufficiently informative and transferable across the six listed tasks without requiring task-specific feature engineering or additional supervision during pretraining.

What would settle it

A direct comparison showing that task-specific encoders outperform the shared encoder on multiple tasks would falsify the claim that the shared design is effective as a general-purpose backbone.
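One way such a comparison could be scored is a paired sign test over per-dataset results. The sketch below assumes one score per dataset for each variant, with higher being better; the function name `sign_test` and the numbers are made up for illustration and are not results from the paper.

```python
from math import comb


def sign_test(shared, specific):
    """Two-sided exact sign test on paired per-dataset scores.

    Higher scores are assumed to be better. Ties are dropped,
    as is conventional for the sign test.
    """
    wins = sum(a > b for a, b in zip(shared, specific))
    losses = sum(a < b for a, b in zip(shared, specific))
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


# Made-up scores for illustration only, NOT taken from the paper.
shared_scores = [0.81, 0.77, 0.90, 0.65, 0.72, 0.88]
specific_scores = [0.79, 0.78, 0.90, 0.61, 0.70, 0.85]

wins, losses, p_value = sign_test(shared_scores, specific_scores)
print(wins, losses, p_value)  # 4 1 0.375
```

A consistent pattern of losses for the shared encoder across several tasks, with a small p-value, would count against the general-purpose claim; a handful of datasets, as in this toy example, is too few to decide either way.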

Figures

Figures reproduced from arXiv: 2606.30336 by Johannes Höhne, Marco Spinaci, Marek Polewczyk, Maximilian Schambach, Sam Thelin.

**Figure 2.** Detailed schematic depiction of the encoder architecture. Note that cross-row attention

**Figure 3.** Details of the used heads for all different investigated decoders, including the regression

**Figure 4.** Column and row count distribution, as well as per-column data type rate of our pretraining

**Figure 5.** Table overlap and false positive rates based on Armadillo embeddings and false-positive

**Figure 6.** Relation between number of training dataset rows (top) and columns (bottom) and model

**Figure 7.** Win ratio confusion matrix, Elo scores, and CD diagram of the main investigated models

**Figure 8.** Win ratio confusion matrix, Elo scores, and CD diagram of the main investigated models

**Figure 9.** Win ratio confusion matrix, Elo score, and CD diagram of the main investigated models

**Figure 10.** Results for unsupervised anomaly detection (top) and one-class + semi-supervised novelty

**Figure 11.** Results for semi-supervised anomaly detection. Left: Average AUROC scores per level of

**Figure 12.** Results for clustering. Left: box and whisker plots, sorted by average score. Right: critical

**Figure 13.** Encoder context size: As shown in the top-left plot of

**Figure 13.** FlexTab-Multi ablation across 12 RelBench tasks showing how scaling input data affects

**Figure 14.** Runtime comparison of different models across varying numbers of training rows. (Left)

Original abstract

We introduce FlexTab, a flexible encoder-decoder architecture for in-context learning on tabular data that pairs a single, task-agnostic encoder with a suite of task-specific decoders. Unlike existing tabular in-context learners, which entangle feature representations with a specific prediction target, our design produces target-agnostic row embeddings that can be leveraged across a wide range of downstream tasks within a table-native in-context learning setup. We demonstrate this flexibility on six distinct problems: classification, regression, anomaly detection, clustering, entity matching, and entity classification in relational databases. Both the encoder and the task-specific decoders are trained on a large corpus of real-world, unlabeled tables. FlexTab achieves state-of-the-art performance on classification, regression, anomaly detection and entity matching, while remaining competitive with specialized models on entity classification in a relational setting. These results demonstrate that a single shared encoder, paired with task-specific decoders, can serve as an effective general-purpose backbone for diverse tabular prediction problems. The inference code and checkpoints will be made publicly available at https://github.com/SAP-samples/flextab.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FlexTab splits a shared encoder from task decoders to support six tabular tasks with target-agnostic embeddings, and the results look plausible but rest on claims that need the full experimental details to evaluate.

FlexTab's core contribution is the use of a single task-agnostic encoder that generates row embeddings independent of the prediction target, paired with separate decoders for each of six tabular tasks. This setup is trained on a large set of unlabeled tables and then applied in an in-context manner.

What stands out is the explicit design for flexibility across tasks like classification and anomaly detection without retraining the encoder. The results indicate it reaches state-of-the-art on classification, regression, anomaly detection, and entity matching, while staying competitive on entity classification in relational databases.

The approach avoids entangling features with specific targets, which could reduce the need for task-specific models. Making the code and checkpoints available is a good step for others to build on.

On the downside, the provided abstract lacks specifics on the experimental setup, such as the exact datasets used, the baselines compared against, or any ablation studies on the encoder's role. This makes it difficult to fully evaluate how well the target-agnostic embeddings perform in practice or if they truly generalize without additional supervision.

The weakest point seems to be whether those embeddings remain informative enough across diverse tasks without task-specific adjustments during pretraining. If the paper includes strong evidence for that, the claims hold; otherwise, it might overstate the generality.

This work would interest researchers focused on tabular data and in-context learning who are looking for more unified architectures. It could be useful for practitioners wanting a single backbone instead of multiple specialized models.

Overall, the paper shows clear thinking on the architecture and reports empirical support for its claims, so it should go to peer review for a closer examination of the methods and results.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces FlexTab, a flexible encoder-decoder architecture for in-context learning on tabular data. It pairs a single task-agnostic encoder, trained unsupervised on unlabeled tables to produce target-agnostic row embeddings, with a suite of task-specific decoders. The approach is evaluated on six tasks (classification, regression, anomaly detection, clustering, entity matching, and entity classification in relational databases) and claims state-of-the-art results on classification, regression, anomaly detection, and entity matching while remaining competitive on the relational entity classification task. The central claim is that this shared-encoder design serves as an effective general-purpose backbone for diverse tabular prediction problems. Inference code and checkpoints are promised to be released publicly.

Significance. If the empirical claims hold under detailed scrutiny, the work would be significant for tabular machine learning: it would demonstrate that target-agnostic embeddings from a shared encoder can transfer across heterogeneous tasks without a task-specific encoder or feature engineering. The explicit commitment to public release of code and checkpoints is a clear strength that supports reproducibility.

major comments (1)

Abstract: the manuscript reports state-of-the-art and competitive results across six tasks but supplies no experimental details, baselines, metrics, dataset descriptions, ablation studies, or evaluation protocol. This absence prevents verification of the central claim that the shared encoder produces sufficiently informative and transferable embeddings across tasks.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and the opportunity to clarify our work. We address the single major comment below.

Referee: Abstract: the manuscript reports state-of-the-art and competitive results across six tasks but supplies no experimental details, baselines, metrics, dataset descriptions, ablation studies, or evaluation protocol. This absence prevents verification of the central claim that the shared encoder produces sufficiently informative and transferable embeddings across tasks.

Authors: Abstracts are intentionally concise high-level summaries and, per standard practice in machine learning venues, do not contain the full experimental details, baselines, metrics, dataset descriptions, ablation studies, or evaluation protocols. These elements are provided in the main body of the manuscript (Sections 3–5 and the appendix), including dataset statistics, baseline implementations, metrics (accuracy, RMSE, AUC, etc.), evaluation protocols (in-context learning setup, train/test splits), and ablation studies on encoder design and decoder variants. The central claim regarding transferable target-agnostic embeddings is supported by the reported results and ablations in those sections, which enable verification.

revision: no

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces an empirical architecture (shared encoder + task-specific decoders) trained unsupervised on unlabeled tables and evaluated on downstream tasks. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim rests on reported performance metrics rather than any self-referential reduction or ansatz smuggled via citation. This is a standard empirical contribution with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.1-grok · 5747 in / 1018 out tokens · 26979 ms · 2026-06-30T07:19:48.860314+00:00 · methodology
