arxiv: 2605.21544 · v1 · pith:FFT5KMYXnew · submitted 2026-05-20 · 💻 cs.LG · eess.SP

Tabular foundation models for robust calibration of near-infrared chemical sensing data

Pith reviewed 2026-05-22 00:30 UTC · model grok-4.3

classification 💻 cs.LG eess.SP
keywords near-infrared spectroscopy · tabular foundation models · TabPFN · calibration · chemometrics · regression · classification · spectral data

The pith

Preprocessing-optimized TabPFN outperforms PLS and CNN-1D in NIR regression calibration while remaining statistically comparable to Ridge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether tabular foundation models can deliver reliable calibration for near-infrared spectroscopy used in food, pharma, and environmental analysis. NIR spectra are high-dimensional and collinear, with practical issues including limited samples, outliers, and the need to extrapolate beyond training data. Using a strict framework that tunes preprocessing and models only on calibration data before external testing, the study benchmarks TabPFN against PLS, Ridge, CatBoost, and CNN-1D across 66 datasets. In regression, the optimized TabPFN secures the top average rank and beats most baselines; in classification, raw TabPFN performs best. The work positions these models as complements to classical chemometrics in smaller calibration settings.
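
The calibration-only tuning described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the closed-form ridge model, the 5-fold split, the `alphas` grid, and the synthetic stand-in for spectra are all assumptions made for the example. The point it shows is that the external test set is touched exactly once, after every choice has been fixed on calibration data.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on mean-centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def select_alpha(X_cal, y_cal, alphas, n_folds=5, seed=0):
    """Pick the ridge penalty by k-fold CV using calibration samples only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_cal)), n_folds)
    cv_error = []
    for alpha in alphas:
        errs = []
        for k in range(n_folds):
            val = folds[k]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            w, b = ridge_fit(X_cal[trn], y_cal[trn], alpha)
            errs.append(rmse(y_cal[val], X_cal[val] @ w + b))
        cv_error.append(np.mean(errs))
    return alphas[int(np.argmin(cv_error))]

# Synthetic stand-in for spectra (hypothetical data, not from the paper)
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 30))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=120)
X_cal, y_cal, X_test, y_test = X[:90], y[:90], X[90:], y[90:]

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]          # hypothetical search grid
best_alpha = select_alpha(X_cal, y_cal, alphas)  # uses calibration data only
w, b = ridge_fit(X_cal, y_cal, best_alpha)       # refit on the full calibration set
rmsep = rmse(y_test, X_test @ w + b)             # single external evaluation
print(best_alpha, rmsep)
```

Any preprocessing step that learns parameters from data would belong inside the cross-validation loop in the same way, so that validation folds never influence the fitted transform.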

Core claim

Preprocessing-optimized TabPFN achieves the best overall average rank in regression across 54 NIR tasks and significantly outperforms PLS, CatBoost, TabPFN on raw spectra, and CNN-1D, while remaining statistically comparable to Ridge. In the 12 classification tasks, TabPFN applied directly to raw spectra yields the best average rank with performance close to its optimized variant. Robustness checks confirm strong average predictive performance, yet the advantage narrows on spectral outliers and extrapolated samples where classical chemometric models stay competitive.
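
For readers unfamiliar with rank-based benchmark summaries, here is a minimal sketch of how an average rank across datasets can be computed. The error matrix is made up for illustration, and the Friedman statistic is included only as an assumption about the kind of omnibus test such comparisons typically use; this summary does not state the exact test applied in the paper.

```python
import numpy as np

def average_ranks(errors):
    """Mean rank of each model across datasets (rank 1 = lowest error).

    errors: array of shape (n_datasets, n_models); tied models share the average rank.
    """
    errors = np.asarray(errors, dtype=float)
    x_i = errors[:, :, None]            # the model being ranked
    x_j = errors[:, None, :]            # every model it is compared with
    n_better = (x_j < x_i).sum(axis=2)  # models with strictly lower error
    n_tied = (x_j == x_i).sum(axis=2)   # includes the model itself
    ranks = 1.0 + n_better + 0.5 * (n_tied - 1)
    return ranks.mean(axis=0)

def friedman_statistic(errors):
    """Friedman chi-square statistic computed from average ranks (no tie correction)."""
    n, k = np.asarray(errors).shape
    r = average_ranks(errors)
    return 12.0 * n / (k * (k + 1)) * (np.sum(r ** 2) - k * (k + 1) ** 2 / 4.0)

# Hypothetical errors: 3 datasets (rows) x 3 models (columns)
errors = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 2.0],
    [2.0, 2.0, 1.0],
])
mean_ranks = average_ranks(errors)
stat = friedman_statistic(errors)
print(mean_ranks, stat)
```

Ranking within each dataset before averaging keeps datasets with large error scales from dominating the comparison, which is why average rank is a common headline metric for multi-dataset benchmarks.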

What carries the argument

TabPFN, a tabular foundation model that performs direct inference on spectral data (raw or preprocessed) via prior-data fitting to manage high-dimensional collinear inputs without task-specific retraining.

If this is right

  • Tabular foundation models can serve as a practical addition to chemometric pipelines for small- to medium-sized NIR calibration sets.
  • Preprocessing selection remains valuable for TabPFN in regression even though the model is foundation-based.
  • Classical methods like Ridge retain an edge when spectral outliers or extrapolation dominate the test conditions.
  • Direct raw-spectrum use of TabPFN may simplify workflows in classification tasks for NIR data.
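
To make the preprocessing point concrete, the sketch below shows two transforms commonly applied to NIR spectra: standard normal variate (SNV) scaling and a finite-difference first derivative. This summary does not list the preprocessing candidates the paper searches over, so these two choices and the synthetic spectra are assumptions for illustration only.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def first_derivative(spectra):
    """Finite-difference derivative along the wavelength axis."""
    return np.gradient(np.asarray(spectra, dtype=float), axis=1)

# Synthetic spectra: one absorption band with per-sample gain and baseline offset
rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 200)
band = np.exp(-((wavelengths - 0.5) ** 2) / 0.01)
gain = rng.uniform(0.5, 2.0, size=(10, 1))
offset = rng.uniform(-0.2, 0.2, size=(10, 1))
raw = gain * band + offset

corrected = snv(raw)               # removes the per-sample gain and offset
derived = first_derivative(raw)    # removes the constant offset only
print(corrected.shape, derived.shape)
```

Both transforms act on each spectrum independently, so they can be applied to test samples without borrowing any statistics from the calibration set.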

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Adding spectroscopy-specific priors to foundation models could narrow the performance gap on outliers and extrapolated samples.
  • Uncertainty estimates from TabPFN could guide safer decisions in real-time chemical sensing deployments.
  • Broader testing across more varied NIR instruments and sample types would show whether the observed ranking generalizes.

Load-bearing premise

The 66 NIR datasets and the external-test validation framework represent typical practical deployment conditions that include spectral outliers and extrapolation beyond the calibration domain.

What would settle it

New NIR datasets containing more spectral outliers and clear extrapolation cases where TabPFN no longer maintains its performance edge over Ridge or PLS.

Figures

Figures reproduced from arXiv: 2605.21544 by Denis Cornet, Fabien Michel, Gregory Beurier, Lauriane Rouan, Robin Reiter.

Figure 1: Scatter plot representing the diversity of datasets in sample sizes and number of variables.
Figure 2: Critical difference diagram and beeswarm plot of iRMSEP values relative to PLS across ...
Figure 3: Cumulative iRMSEP versus PLS across regression datasets ordered by dataset sample size.
Figure 4: Dataset-wise comparison between TabPFN-Raw and TabPFN-opt in terms of iRMSEP ...
Figure 5: Scatter plot representing the database-wise average rank of each model on all test samples ...
Figure 6: Dataset-wise iRMSEP versus PLS on extrapolated target values (outside the training range).
Figure 7: Critical difference diagram and beeswarm plot of relative balanced-accuracy gains versus ...
Figure 8: Cumulative relative balanced-accuracy gain versus PLS-DA across classification datasets ...
Figure 9: Dataset-wise relative improvement versus PLS for regression tasks, expressed as iRMSEP.
Figure 10: Dataset-level heatmap of iRMSEP relative to PLS for regression tasks. Positive values ...
Figure 11: Regression rank heatmap after aggregation at the database level. Lower ranks indicate ...
Figure 12: Cumulative iRMSEP values against PLS using 3 different x axis: dataset number of samples, ...
Figure 13: Weighted preprocessing frequencies for linear models (PLS and Ridge) across the regression ...
Figure 14: Weighted preprocessing frequencies for nonlinear models (CatBoost, TabPFN, and CNN) ...
Figure 15: Dataset-wise relative balanced-accuracy gain versus PLS-DA for classification tasks. Each ...
Figure 16: Classification rank heatmap after aggregation at the database level. Lower ranks indicate ...
Figure 17: Weighted preprocessing frequencies by search subspace for CatBoost, TabPFN, and CNN ...

Original abstract

Near-infrared spectroscopy is increasingly used as a rapid, non-destructive chemical sensing technology for the analysis of food, pharmaceutical, biological, and environmental samples. However, the practical deployment of NIR sensors still depends on calibration models able to handle high-dimensional, collinear spectra, limited sample sizes, preprocessing dependence, spectral outliers, and extrapolation beyond the calibration domain. Here, we evaluate whether tabular foundation models can provide a new calibration strategy for NIR chemical sensing. We benchmark TabPFN on 66 NIR datasets covering 54 regression and 12 classification tasks, and compare direct inference on raw spectra with preprocessing-optimized inference against PLS/PLS-DA, Ridge, Catboost, and one-dimensional convolutional neural networks. The study uses a unified validation framework in which preprocessing and model selection are performed exclusively on calibration data before external test evaluation. In regression, preprocessing-optimized TabPFN achieves the best overall average rank and significantly outperforms PLS, CatBoost, TabPFN on raw spectra, and CNN-1D, while remaining statistically comparable to Ridge. In classification, TabPFN applied directly to raw spectra provides the best average rank, with performance close to the optimized variant. Robustness analyses show that TabPFN provides strong average predictive performance but that its advantage decreases on spectral outliers and extrapolated samples, where classical chemometric models remain competitive. These results suggest that tabular foundation models can complement established chemometric workflows for NIR chemical sensing, especially in small- to medium-sized calibration settings, while highlighting the need for spectroscopy-specific priors and uncertainty-aware deployment strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript evaluates TabPFN, a tabular foundation model, as a calibration approach for near-infrared (NIR) chemical sensing. It benchmarks TabPFN, both preprocessing-optimized and applied directly to raw spectra, against PLS/PLS-DA, Ridge, CatBoost, and CNN-1D across 66 NIR datasets (54 regression and 12 classification tasks). A unified validation framework keeps all preprocessing and model selection inside the calibration set prior to external test evaluation. In regression, preprocessing-optimized TabPFN obtains the best average rank and significantly outperforms PLS, CatBoost, raw TabPFN, and CNN-1D while remaining statistically comparable to Ridge. In classification, raw TabPFN yields the best average rank. Robustness checks indicate that TabPFN's advantage shrinks on spectral outliers and extrapolated samples, where classical models stay competitive. The work concludes that tabular foundation models can complement chemometric workflows, especially for small- to medium-sized calibration sets.

Significance. If the benchmark collection adequately represents deployment conditions, the results indicate that TabPFN can serve as a practical addition to established NIR calibration pipelines, particularly when sample sizes are limited. The strict internal validation protocol and explicit robustness analyses on outliers/extrapolation constitute clear methodological strengths that increase the reliability of the reported rankings. The study supplies a large-scale, reproducible-style empirical comparison that could guide future spectroscopy-specific adaptations of foundation models.

major comments (1)
  1. [Robustness analyses] Robustness analyses section: the paper correctly notes that TabPFN advantages decrease on spectral outliers and extrapolated samples, yet provides no quantitative breakdown of how many external-test samples (or what fraction per dataset) fall into these categories across the 66 NIR collections. Without this information it remains unclear whether the headline average-rank result is driven mainly by easier in-domain cases, which directly affects the strength of the practical-robustness claim made in the abstract.
minor comments (2)
  1. [Abstract] Abstract and results sections: the phrase 'statistically comparable to Ridge' should be accompanied by the exact test and p-value threshold used for all significance statements.
  2. [Methods] Dataset description: the selection criteria and source diversity of the 66 NIR datasets should be stated more explicitly to allow readers to judge representativeness.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for highlighting the methodological strengths of our unified validation framework and robustness checks. We address the single major comment below.

  1. Referee: Robustness analyses section: the paper correctly notes that TabPFN advantages decrease on spectral outliers and extrapolated samples, yet provides no quantitative breakdown of how many external-test samples (or what fraction per dataset) fall into these categories across the 66 NIR collections. Without this information it remains unclear whether the headline average-rank result is driven mainly by easier in-domain cases, which directly affects the strength of the practical-robustness claim made in the abstract.

    Authors: We agree that the current manuscript lacks an explicit quantitative summary of the number and fraction of external-test samples falling into the spectral-outlier and extrapolation categories. In the revised version we will add a supplementary table that, for each of the 66 datasets, reports (i) the absolute count and (ii) the percentage of test samples classified as outliers or as extrapolation cases according to the definitions already used in the robustness section. This addition will allow readers to gauge the relative weight of in-domain versus challenging samples in the reported average ranks and will thereby strengthen the practical-robustness interpretation without changing the main conclusions.

    Revision: yes
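
The per-dataset count and percentage discussed in this exchange could be computed along the following lines. This is a hedged sketch: it uses the definition of extrapolation given in the Figure 6 caption (target values outside the training range), while the function name and the toy values are hypothetical. Spectral outliers are not covered because this summary does not state how the paper defines them.

```python
import numpy as np

def extrapolation_summary(y_cal, y_test):
    """Count test samples whose target lies outside the calibration target range."""
    y_cal = np.asarray(y_cal, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    outside = (y_test < y_cal.min()) | (y_test > y_cal.max())
    return {
        "count": int(outside.sum()),
        "percent": 100.0 * float(outside.mean()),
        "mask": outside,  # boolean flags, usable to score the two subsets separately
    }

# Toy values (hypothetical): calibration targets span 1.0 to 4.0
summary = extrapolation_summary([1.0, 2.0, 3.0, 4.0], [0.5, 2.0, 3.5, 5.0])
print(summary["count"], summary["percent"])
```

Reporting these two numbers per dataset next to the average ranks would show how much of the headline result comes from in-range test samples.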

Circularity Check

0 steps flagged

No circularity: pure empirical benchmarking study

The manuscript is an empirical benchmarking study that evaluates TabPFN against PLS, Ridge, CatBoost, and CNN-1D on 66 NIR datasets using a fixed external-test validation protocol. All reported results (average ranks, statistical comparisons, robustness on outliers/extrapolation) are direct numerical outcomes of model training and inference on held-out data; no derivations, first-principles equations, or predictions are claimed. Consequently there are no load-bearing steps that reduce by construction to quantities fitted on the target data, no self-citation chains supporting uniqueness theorems, and no ansatzes or renamings of known results. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the representativeness of the chosen NIR datasets and the assumption that standard tabular modeling without spectroscopy-specific priors is sufficient for competitive performance.

free parameters (1)
  • preprocessing parameters
    Chosen via optimization on calibration data only; affects the reported performance of the optimized TabPFN variant.
axioms (1)
  • domain assumption: The 66 NIR datasets are representative of real-world chemical sensing tasks including outliers and extrapolation.
    Invoked to support generalizability of the average-rank results and robustness conclusions.
