Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification

Lalitha Sankar; Nathan Stromberg; Samuel McDowell

arxiv: 2605.21742 · v1 · pith:KIGIAY5Znew · submitted 2026-05-20 · 💻 cs.LG · cs.IT· math.IT

Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification

Samuel McDowell , Nathan Stromberg , Lalitha Sankar This is my paper

Pith reviewed 2026-05-22 09:48 UTC · model grok-4.3

classification 💻 cs.LG cs.ITmath.IT

keywords class imbalanceprior-data fitted networkstabular classificationin-context learningthresholdingdownsamplingcalibration

0 comments

The pith

Thresholding and downsampling correct class imbalance in prior-data fitted networks for tabular classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Prior-data fitted networks achieve strong results on tabular classification but suffer when classes are imbalanced, just like other models. Their in-context learning setup rules out standard loss-based fixes, so the authors adapt and evaluate classical alternatives such as thresholding and downsampling. Thresholding works particularly well because of the networks' calibration properties, while downsampling delivers comparable accuracy with the practical advantage of lower computation at inference time. This matters for deploying these models on real tabular datasets, where rare classes are common and performance on them is often critical.

Core claim

We have adapted several classical techniques addressing class imbalance and analyzed their performance on PFN classification. We observe that thresholding performs exceptionally well because of the calibration characteristics of PFNs, and downsampling performs comparably because of PFNs exceptional limited-data performance, with the additional benefit of reduced computation cost for inference.

What carries the argument

Adaptation of non-loss-based class imbalance techniques to the in-context learning dynamic of prior-data fitted networks.

Load-bearing premise

That classical class imbalance techniques can be successfully adapted to the in-context learning dynamic of PFNs even though loss-based strategies are impossible.

What would settle it

A test on held-out imbalanced tabular datasets where neither optimized thresholding nor downsampling raises minority-class F1 score above the unadjusted PFN baseline.

Figures

Figures reproduced from arXiv: 2605.21742 by Lalitha Sankar, Nathan Stromberg, Samuel McDowell.

**Figure 2.** Figure 2: Calibration Curve Averaged Over Datasets and [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 4.** Figure 4: shows an example of one such experiment on the kr-vs-kp dataset with a context size of N = 500 samples and an imbalance of π1 ∈ {0.05, 0.1, 0.2, 0.5}. We see that even as the imbalance changes and the crossover point moves, the maximum balanced accuracy point tracks τ = π1. 0.0 0.2 0.4 0.6 0.8 1.0 Decision threshold 0.0 0.2 0.4 0.6 0.8 1.0 Accuracy Minority accuracy Majority accuracy Balanced accuracy Max … view at source ↗

**Figure 3.** Figure 3: ROC Curve for kr-vs-kp with N = 25 C. Thresholding Based on the observations on the ROC curve, it is natural to ask how sensitive the per-class accuracies are to the choice of threshold for a given dataset. To evaluate this effect, we evaluate TabPFN performance with a variety of decision thresholds, τ . When we do this, we observe that the maximum balanced accuracy appears approximately at the value τ = … view at source ↗

**Figure 5.** Figure 5: TabPFN2.5 Downsample Crossover Table II shows the results of each correction method for each dataset tested. For each correction method, we report the balanced, and worst-class accuracy. We see that for most datasets, thresholding achieves both strong balanced and worst-class accuracy, with downsampling close behind. Meanwhile, both OS and TabPFGen perform worse than the base model alone. For OS, this can … view at source ↗

read the original abstract

Prior-data fitted networks (PFNs) have achieved exceptional performance on tabular classification tasks. However, like other classifiers, their performance can suffer under the effect of class imbalance, resulting in poor performance for rare classes. Several techniques exist which attempt to mitigate the deleterious effect of class imbalance on classification performance, but the in-context learning (ICL) dynamic of PFNs means that loss-based strategies are impossible, and other techniques are unproven. We have adapted several classical techniques addressing class imbalance and analyzed their performance on PFN classification. We observe that thresholding performs exceptionally well because of the calibration characteristics of PFNs, and downsampling performs comparably because of PFNs exceptional limited-data performance, with the additional benefit of reduced computation cost for inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper adapts classical imbalance fixes to PFNs and reports solid practical results for thresholding and downsampling, but the causal attributions to calibration and limited-data traits rest on correlations.

read the letter

This paper looks at fixing class imbalance for prior-data fitted networks on tabular classification tasks. The central finding is that adapting classical methods works, particularly thresholding which the authors link to good calibration in PFNs, and downsampling which benefits from the model's ability to perform well with limited data while also speeding up inference. They handle the problem setup effectively by acknowledging that the in-context learning nature of PFNs blocks any approaches that depend on changing the training loss. The focus then shifts to post-hoc or data-level adjustments, and the results indicate these can mitigate poor performance on rare classes. Credit goes to the clear empirical focus on a common real-world issue. Tabular data often has imbalance, and PFNs are gaining traction, so targeted advice on this combination is helpful. The main soft spot is the strength of the causal claims. The stress-test note highlights that performance attributions rest on correlations unless the paper includes ablations like a miscalibrated PFN control or experiments varying context length independently. If those are missing, the explanations for why certain techniques excel remain unproven rather than demonstrated. This kind of work is for the community building or applying PFNs to practical tabular problems. Readers interested in deployment details will get value from the comparisons, while theory-focused readers may find less to engage with. I would send this to peer review. The observations are relevant and the paper is focused, so referees can help clarify the experimental support for the interpretations.

Referee Report

2 major / 2 minor

Summary. The paper adapts classical class-imbalance techniques to Prior-Data Fitted Networks (PFNs) for tabular classification. Because PFNs rely on in-context learning, loss-based reweighting is unavailable; the authors therefore evaluate thresholding, downsampling, and related heuristics on standard tabular benchmarks. They report that thresholding yields strong performance, attributed to PFNs’ calibration properties, while downsampling matches this performance and additionally reduces inference cost, attributed to PFNs’ strong limited-data behavior.

Significance. If the empirical results and the mechanistic attributions are substantiated, the work would supply immediately usable guidance for deploying PFNs on imbalanced tabular data and would clarify which inductive biases of PFNs interact favorably with classical imbalance remedies.

major comments (2)

[Experimental results / Discussion] The central claims rest on the statements that thresholding succeeds “because of the calibration characteristics of PFNs” and that downsampling succeeds “because of PFNs exceptional limited-data performance.” The manuscript reports only end-to-end accuracy; it contains neither (a) a miscalibrated PFN control nor (b) an explicit ablation that varies context length while holding imbalance ratio and dataset fixed. Without these isolations the causal language is unsupported.
[Experimental results] The abstract and results sections state performance observations but supply no table or appendix listing the exact datasets, imbalance ratios, number of runs, or statistical significance tests. Consequently the reader cannot assess whether the reported superiority of thresholding and downsampling generalizes beyond the chosen benchmarks.

minor comments (2)

[Methods] Clarify in the methods section precisely which classical imbalance techniques were implemented and how each was adapted to the ICL prompt format.
[Introduction] Add a short paragraph contrasting PFN behavior with that of standard gradient-based classifiers under the same imbalance corrections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important ways to strengthen the causal interpretations and experimental reporting. We address each point below and have revised the manuscript accordingly.

read point-by-point responses

Referee: [Experimental results / Discussion] The central claims rest on the statements that thresholding succeeds “because of the calibration characteristics of PFNs” and that downsampling succeeds “because of PFNs exceptional limited-data performance.” The manuscript reports only end-to-end accuracy; it contains neither (a) a miscalibrated PFN control nor (b) an explicit ablation that varies context length while holding imbalance ratio and dataset fixed. Without these isolations the causal language is unsupported.

Authors: We agree that the current end-to-end results alone do not isolate the proposed mechanisms. While the attributions draw on established PFN properties from prior work, the manuscript would be stronger with direct controls. In the revision we will add (a) a miscalibrated PFN variant (via temperature scaling on the output probabilities) as a control and (b) an ablation that fixes the dataset and imbalance ratio while varying context length from 10 to 100 examples. These experiments will either support the mechanistic claims or prompt us to replace the causal phrasing with correlational language. revision: yes
Referee: [Experimental results] The abstract and results sections state performance observations but supply no table or appendix listing the exact datasets, imbalance ratios, number of runs, or statistical significance tests. Consequently the reader cannot assess whether the reported superiority of thresholding and downsampling generalizes beyond the chosen benchmarks.

Authors: We acknowledge the need for transparent reporting of experimental details. The full list of datasets (OpenML tabular classification tasks), imbalance ratios (ranging from 1:5 to 1:100), number of runs (five independent seeds per configuration), and statistical tests (paired Wilcoxon signed-rank tests with p < 0.05) is already present in the appendix. To improve accessibility we will insert a concise summary table in the main results section and add explicit cross-references from the abstract and results text. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical adaptation with no derivations or self-referential predictions

full rationale

The manuscript adapts existing class-imbalance techniques to the ICL setting of PFNs and reports observed performance differences on tabular benchmarks. No equations, fitted parameters, uniqueness theorems, or ansatzes appear. Attributions such as 'because of the calibration characteristics' are post-hoc interpretations of empirical results rather than load-bearing steps that reduce to the paper's own inputs by construction. Self-citations, if any, are not required to justify the central observations.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract relies on unverified domain assumptions about PFN calibration and limited-data performance to explain why specific techniques succeed.

axioms (2)

domain assumption PFNs possess calibration characteristics that make thresholding an effective imbalance correction method.
Invoked directly in the abstract to account for thresholding success.
domain assumption PFNs exhibit exceptional performance in limited-data regimes.
Used in the abstract to explain why downsampling works comparably well.

pith-pipeline@v0.9.0 · 5657 in / 1170 out tokens · 37176 ms · 2026-05-22T09:48:01.019267+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We observe that thresholding performs exceptionally well because of the calibration characteristics of PFNs, and downsampling performs comparably because of PFNs exceptional limited-data performance
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the in-context learning (ICL) dynamic of PFNs means that loss-based strategies are impossible

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages · 2 internal anchors

[1]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Tabpfn-2.5: Advancing the state of the art in tabular foundation models , author=. arXiv preprint arXiv:2511.08667 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[2]

arXiv preprint arXiv:2406.05216 , year=

TabPFGen--Tabular Data Generation with TabPFN , author=. arXiv preprint arXiv:2406.05216 , year=

work page arXiv
[3]

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

Tabpfn: A transformer that solves small tabular classification problems in a second , author=. arXiv preprint arXiv:2207.01848 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[4]

arXiv e-prints , pages=

TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data , author=. arXiv e-prints , pages=

work page
[5]

Müller, N

Transformers can do bayesian inference , author=. arXiv preprint arXiv:2112.10510 , year=

work page arXiv
[6]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

work page
[7]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Images speak in images: A generalist painter for in-context visual learning , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[8]

1987 , publisher=

Structured Induction in Expert Systems , author=. 1987 , publisher=

work page 1987
[9]

2014 , publisher=

Vanschoren, Joaquin and van Rijn, Jan N and Bischl, Bernd and Torgo, Luis , journal=. 2014 , publisher=

work page 2014
[10]

Predicting good probabilities with supervised learning , isbn =

Niculescu-Mizil, Alexandru and Caruana, Rich , month = aug, year =. Predicting good probabilities with supervised learning , isbn =. Proceedings of the 22nd international conference on. doi:10.1145/1102351.1102430 , abstract =

work page doi:10.1145/1102351.1102430

[1] [1]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Tabpfn-2.5: Advancing the state of the art in tabular foundation models , author=. arXiv preprint arXiv:2511.08667 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

arXiv preprint arXiv:2406.05216 , year=

TabPFGen--Tabular Data Generation with TabPFN , author=. arXiv preprint arXiv:2406.05216 , year=

work page arXiv

[3] [3]

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

Tabpfn: A transformer that solves small tabular classification problems in a second , author=. arXiv preprint arXiv:2207.01848 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

arXiv e-prints , pages=

TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data , author=. arXiv e-prints , pages=

work page

[5] [5]

Müller, N

Transformers can do bayesian inference , author=. arXiv preprint arXiv:2112.10510 , year=

work page arXiv

[6] [6]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

work page

[7] [7]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Images speak in images: A generalist painter for in-context visual learning , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page

[8] [8]

1987 , publisher=

Structured Induction in Expert Systems , author=. 1987 , publisher=

work page 1987

[9] [9]

2014 , publisher=

Vanschoren, Joaquin and van Rijn, Jan N and Bischl, Bernd and Torgo, Luis , journal=. 2014 , publisher=

work page 2014

[10] [10]

Predicting good probabilities with supervised learning , isbn =

Niculescu-Mizil, Alexandru and Caruana, Rich , month = aug, year =. Predicting good probabilities with supervised learning , isbn =. Proceedings of the 22nd international conference on. doi:10.1145/1102351.1102430 , abstract =

work page doi:10.1145/1102351.1102430