pith. sign in

arxiv: 2605.21534 · v1 · pith:P3TS5XRJnew · submitted 2026-05-20 · 📊 stat.ML · cs.LG

Adaptive RBF-KAN: A Comparative Evaluation of Dynamic Shape Parameters in Kolmogorov-Arnold Networks

Pith reviewed 2026-05-22 01:56 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords Kolmogorov-Arnold NetworksRadial Basis FunctionsAdaptive Shape ParametersLeave-One-Out Cross-ValidationFunction ApproximationKernel SelectionBenchmark Functions
0
0 comments X

The pith

LOOCV initialization plus adaptive kernels improves RBF-based KAN models on 2D benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends RBF-based Kolmogorov-Arnold Networks by replacing fixed Gaussian kernels with a broader family that includes Matern and Wendland kernels. It initializes the kernel shape parameter with a leave-one-out cross-validation estimate and then refines the value during network training. Tests on several two-dimensional benchmark functions show that kernel choice and adaptation matter, with different kernels performing better on smooth, discontinuous, or oscillatory targets. A sympathetic reader would care because the method keeps the efficiency gains of RBF-KAN while adding data-driven flexibility that fixed-parameter versions lack.

Core claim

The central claim is that combining LOOCV-based kernel scale estimation with adaptive refinement during training, together with the introduction of Matern and Wendland kernels, supplies a practical strategy for improving RBF-based KAN models, as evidenced by comparative results on two-dimensional benchmark functions where kernel selection yields advantages for different function classes.

What carries the argument

LOOCV-based initialization of the radial kernel shape parameter that is subsequently refined during deep KAN training.

If this is right

  • Kernel selection can be guided by the smoothness, presence of discontinuities, or oscillatory character of the target function.
  • Adaptive shape parameters deliver measurable gains over the fixed-parameter approach used in FastKAN.
  • The overall framework retains the computational efficiency of RBF replacements while increasing representational flexibility compared with spline-based KANs.
  • LOOCV supplies a reproducible, data-driven starting value that removes manual tuning of the initial scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same initialization-plus-adaptation pattern could be tested on higher-dimensional or noisy data sets where kernel scale choice is even more critical.
  • Hybrid models might combine the new RBF options with spline or other basis functions inside a single KAN layer.
  • Similar cross-validation warm-start techniques could be applied to tune other learnable parameters in related neural architectures for function approximation.

Load-bearing premise

That the LOOCV estimate supplies a useful starting scale for the kernel which training can improve and that performance patterns observed on 2D functions indicate advantages for broader classes of targets.

What would settle it

A head-to-head test on the same 2D benchmark functions in which the adaptive RBF-KAN versions produce no lower approximation error or no faster convergence than the original fixed-kernel FastKAN.

Figures

Figures reproduced from arXiv: 2605.21534 by Adeeba Haider, Alessandra De Rossi, Amir Noorizadegan, Roberto Cavoretto.

Figure 1
Figure 1. Figure 1: KAF layer architecture. where Cj (x) = cos(j arccos(x)), x ∈ [−1, 1]. Here, Cj (x) represents a Chebyshev polynomial of the first kind that helps in minimization of maximum error in the L∞ norm, resulting in an exponential convergence near-optimal polynomial approximation for smooth functions [21, 11]. The Chebyshev KAN model with three input features, the Chebyshev polynomial of degree 1, and output shape… view at source ↗
Figure 2
Figure 2. Figure 2: Cheby-KAN architecture. 6 [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Training curves (Relative L2 Error vs. Epoch) for adaptive FastKAN with different RBF kernels across four benchmark functions [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Reconstruction of f1 with GA kernel. GA works best here because the function is very smooth (L2 error: 4.07e-3). 14 [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Reconstruction of f2. M2 captures the sharp jump much better than GA. (a) Baseline: GA (L2: 4.56e-1) (b) Best: W2 (L2: 6.24e-2) [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Reconstruction of f3. GA completely misses the waves, but W2 rebuilds the pattern well. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Reconstruction of f4. W6 handles the sharp peak better, giving 52% less error than GA. cientKAN, Chebyshev KAN, KAF, standard KAN, and MLP. From the results, it is clear that using a fixed shape parameter of value 0.5714, as suggested in [20], does not perform well in terms of accuracy. On the other hand, a significant error drop in term of relative L2 norm could be observed in the case of our proposed mod… view at source ↗
read the original abstract

Kolmogorov-Arnold Networks (KANs) approximate multivariate functions using learnable univariate edge functions, typically parameterized by B-spline bases. Although effective, spline-based implementations can be computationally expensive. A modified version of KANs, called FastKAN, improves efficiency by replacing splines with Gaussian radial basis functions (RBFs), but it relies on a fixed kernel and shape parameter. In this work, we extend the RBF-based KAN framework by introducing a broader family of radial basis kernels and by initializing the kernel shape parameter using leave-one-out cross-validation (LOOCV). To the best of our knowledge, this is the first study that integrates LOOCV-based kernel scale estimation with deep KAN training. We also introduce Mat\'ern and Wendland kernels into the KAN framework for the first time, enabling more flexible basis representations beyond the Gaussian kernel used in FastKAN. The LOOCV estimate provides a data-driven initialization of the kernel scale, which is subsequently refined during network training. The proposed adaptive RBF-KAN is evaluated on several two-dimensional benchmark functions. The results highlight the importance of kernel selection and adaptive shape parameters, with different kernels showing advantages for smooth functions, discontinuities, and oscillatory patterns. Overall, combining LOOCV-based initialization with adaptive kernel learning provides a practical strategy for improving RBF-based KAN models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper extends the FastKAN framework by replacing fixed Gaussian RBFs with a broader family of radial basis kernels (including Matérn and Wendland) and by initializing the kernel shape parameter via leave-one-out cross-validation (LOOCV) before refining it during training. The adaptive RBF-KAN is evaluated on several two-dimensional benchmark functions, with the central claim that kernel selection and adaptive shape parameters yield practical improvements over fixed-shape baselines, with different kernels suiting smooth, discontinuous, and oscillatory targets.

Significance. If the reported gains hold under scaling, the work supplies a concrete, data-driven strategy for kernel choice and initialization in RBF-KANs. Notable strengths include the first integration of LOOCV-based scale estimation with deep KAN training and the introduction of Matérn and Wendland kernels into the KAN setting; these are reproducible, falsifiable extensions that could be directly adopted by practitioners.

major comments (2)
  1. [Experiments] Experiments section: All reported benchmarks are confined to two-dimensional functions. Because the number of adaptive shape parameters grows linearly with the number of edges and layers, the stability and benefit of per-edge or per-layer adaptation remain untested once input dimension or network width increases; this directly affects whether the claimed practical strategy generalizes beyond the toy regime.
  2. [Abstract] Abstract and Method description: The central claim of practical improvement rests on benchmark outcomes, yet the abstract supplies no quantitative metrics, error bars, training hyperparameters, or statistical significance tests. Without these details it is impossible to judge the magnitude or reliability of the reported kernel advantages.
minor comments (2)
  1. [Method] Notation for the shape parameter is introduced without an explicit equation linking the LOOCV objective to the subsequent gradient-based refinement; a short derivation or pseudocode would clarify the two-stage procedure.
  2. [Experiments] The paper asserts that different kernels are advantageous for different function classes, but does not report per-function quantitative comparisons (e.g., MSE tables with standard deviations across multiple runs).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recognition of the novelty in integrating LOOCV-based initialization and additional kernels into the RBF-KAN framework. We address each major comment below and describe the corresponding revisions.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: All reported benchmarks are confined to two-dimensional functions. Because the number of adaptive shape parameters grows linearly with the number of edges and layers, the stability and benefit of per-edge or per-layer adaptation remain untested once input dimension or network width increases; this directly affects whether the claimed practical strategy generalizes beyond the toy regime.

    Authors: We agree that confining the benchmarks to two-dimensional functions leaves the scalability of per-edge adaptive shape parameters untested for higher input dimensions or wider networks. The current study deliberately focuses on standard 2D benchmarks to isolate and compare kernel-specific behaviors on smooth, discontinuous, and oscillatory targets. In the revised manuscript we have added a new paragraph in the Discussion section that explicitly analyzes the linear growth in adaptive parameters, discusses potential stability implications, and proposes per-layer sharing of shape parameters as a practical mitigation. We also outline higher-dimensional scaling experiments as a clear direction for future work. No new empirical results on higher-dimensional cases are added in this revision. revision: partial

  2. Referee: [Abstract] Abstract and Method description: The central claim of practical improvement rests on benchmark outcomes, yet the abstract supplies no quantitative metrics, error bars, training hyperparameters, or statistical significance tests. Without these details it is impossible to judge the magnitude or reliability of the reported kernel advantages.

    Authors: We accept that the original abstract was insufficiently quantitative. We have revised the abstract to include specific performance metrics (e.g., relative MSE reductions for each kernel on the benchmark suite), a brief reference to the LOOCV initialization procedure, and a note that results are reported with error bars from repeated runs. Full tables with hyperparameters and statistical comparisons remain in the Experiments section; the abstract now points readers to these details so that the magnitude and reliability of the kernel advantages can be assessed directly. revision: yes

Circularity Check

0 steps flagged

Empirical extension with no circular derivation chain

full rationale

The paper introduces LOOCV-based initialization for kernel shape parameters in RBF-KANs, adds Matérn and Wendland kernels, and evaluates the resulting adaptive model on 2D benchmark functions. No mathematical derivation is claimed that reduces a prediction or result to its own fitted inputs by construction. The central practical strategy is supported by reported benchmark comparisons rather than self-referential fitting, self-citation load-bearing premises, or imported uniqueness theorems. The work is self-contained as an empirical study whose claims rest on external performance data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on standard assumptions from the FastKAN framework and general RBF approximation theory; no new free parameters are introduced beyond the adaptive shape values learned during training, and no invented entities are postulated.

axioms (1)
  • domain assumption Radial basis functions with adjustable shape parameters can serve as effective univariate edge functions in KAN architectures
    Inherited from the FastKAN baseline referenced in the abstract

pith-pipeline@v0.9.0 · 5787 in / 1261 out tokens · 51597 ms · 2026-05-22T01:56:05.824795+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 3 internal anchors

  1. [1]

    Layer Normalization

    Ba, J. L., Kiros, J. R., Hinton, G. E. Layer normalization. arXiv pre print arXiv:1607.06450 (2016). https://arxiv.org/abs/1607.06450

  2. [2]

    Kernel-based methods and function app roximation

    Baudat, G., Anouar, F. Kernel-based methods and function app roximation. IJCNN’01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH372 22), Washington, DC, USA, vol. 2, pp. 1244– 1249, 2001. https://doi.org/10.1109/IJCNN.2001.939539

  3. [3]

    Blealtan, A. D. An efficient implementation of Kolmogorov-Arnold ne twork. GitHub Repository. Available at: https://github.com/Blealtan/efficient-kan (2024)

  4. [4]

    Buhmann, M. D. Radial basis functions. Acta Numerica 9 (2000), 1–38. https://doi.org/10.1017/S0962492900000015

  5. [5]

    S., Sergeyev, Y

    Cavoretto, R., De Rossi, A., Mukhametzhanov, M. S., Sergeyev, Y. D. On the search of the shape parameter in radial basis functions using univariate global optimization methods. Journal of Global Optimization 79 (2021), 305–327. https://doi.org/10.1007/s10898-019-00853-3

  6. [6]

    Bayesian approach fo r radial kernel parameter tuning

    Cavoretto, R., De Rossi, A., Lancellotti, S. Bayesian approach fo r radial kernel parameter tuning. Journal of Computational and Applied Mathematics 441 (2024) 115716. https://doi.org/10.1016/j.cam.2023.115716 18

  7. [7]

    The effects of training algorithms in MLP netw ork on image classification

    Coskun, N., Yildirim, T. The effects of training algorithms in MLP netw ork on image classification. Proceedings of the International Joint Conference on Neural Ne tworks, vol. 2, pp. 1223–1226, 2003. https://doi.org/10.1109/IJCNN.2003.1223867

  8. [8]

    Neural network approximatio n

    DeVore, R., Hanin, B., Petrova, G. Neural network approximatio n. Acta Numerica 30 (2021), 327–444. https://doi.org/10.1017/S0962492921000052

  9. [9]

    Fasshauer, G. E. Meshfree Approximation Methods with MATLAB . Interdisciplinary Mathematical Sciences, Volume 6, World Scientific, Singapore, 2007. https://doi.org/10.1142/6437

  10. [10]

    Kernel-based Approximation Meth ods using MATLAB

    Fasshauer, G., McCourt, M. Kernel-based Approximation Meth ods using MATLAB. Interdisciplinary Mathe- matical Sciences, Volume 19, World Scientific, Singapore, 2015. https://doi.org/10.1142/9335

  11. [11]

    Gao, Y., Tan, V. Y. F. On the Convergence of (Stochastic) Gra dient Descent for Kolmogorov-Arnold Networks. arXiv preprint arXiv:2410.08041 (2024). https://arxiv.org/abs/2410.08041

  12. [12]

    Representation properties of networks: Kolmogorov’s theorem is irrelevant

    Girosi, F., Poggio, T. Representation properties of networks: Kolmogorov’s theorem is irrelevant. Neural Computation, 1(4), 465–469 (1989). https://doi.org/10.1162/neco.1989.1.4.465

  13. [13]

    Kolmogorov’s mapping neural network existe nce theorem

    Hecht-Nielsen, R. Kolmogorov’s mapping neural network existe nce theorem. In Proceedings of the IEEE First International Conference on Neural Networks, San D iego, CA, Vol. 3, pp. 11–14, 1987. https://cs.uwaterloo.ca/~y328yu/classics/Hecht-Nielsen.pdf

  14. [14]

    Gaussian Error Linear Units (GELUs)

    Hendrycks, D., Gimpel, K. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016). https://arxiv.org/abs/1606.08415

  15. [15]

    Multilayer feedforward ne tworks are universal approximators

    Hornik, K., Stinchcombe, M., White, H. Multilayer feedforward ne tworks are universal approximators. Neural Networks, 2 (1989), 359–366. https://doi.org/10.1016/0893-6080(89)90020-8

  16. [16]

    Kolmogorov, A. N. On the representation of continuous funct ions of several variables by superpo- sitions of continuous functions of fewer variables. Doklady Akademii Nauk SSSR , 108 (1956), 179–

  17. [17]

    https://cs.uwaterloo.ca/~y328yu/classics/Kolmogorov56.pdf

    [English translation in American Mathematical Society Translations (2) 17 (1961), 369–373]. https://cs.uwaterloo.ca/~y328yu/classics/Kolmogorov56.pdf

  18. [18]

    Learning multiple layers of features f rom tiny images

    Krizhevsky, A., Hinton, G. Learning multiple layers of features f rom tiny images. Technical Report, University of Toronto (2009). https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

  19. [19]

    Kolmogorov’s theorem is relevant

    Kurkova, V. Kolmogorov’s theorem is relevant. Neural Comput ation 3 (1991), 617–622. https://direct.mit.edu/neco/article-abstract/3/4/61 7/5612/Kolmogorov-s-Theorem-Is-Relevant?redirectedF r

  20. [20]

    Gradient-based le arning applied to document recognition

    Lecun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based le arning applied to document recognition. Proceedings of the IEEE, Vol. 86(11), 1998, pp. 2278–2324. https://ieeexplore.ieee.org/document/726791

  21. [21]

    Kolmogorov-Arnold networks are radial basis function ne tworks

    Li, Z. Kolmogorov-Arnold networks are radial basis function ne tworks. arXiv preprint arXiv:2405.06721 (2024). https://arxiv.org/abs/2405.06721

  22. [22]

    Granger Causality Detect ion with Kolmogorov-Arnold Networks

    Lin, H., Ren, M., Barucca, P., Aste, T. Granger Causality Detect ion with Kolmogorov-Arnold Networks. arXiv preprint arXiv:2412.15373 (2024). https://arxiv.org/abs/2412.15373

  23. [23]

    KAN: Kolmogorov-Arnold Networks

    Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljaˇ ci´ c , M., Hou, T.Y., Tegmark, M. KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756 (20 24). https://arxiv.org/abs/2404.19756

  24. [24]

    Approximation of function and its de rivatives using ra- dial basis function networks

    Mai-Duy, N., Tran-Cong, T. Approximation of function and its de rivatives using ra- dial basis function networks. Applied Mathematical Modelling, 27(3) , 197–220 (2003). https://doi.org/10.1016/S0307-904X(02)00101-4

  25. [25]

    Error bounds for deep ReLU networks u sing the Kolmogorov-Arnold superposition theorem

    Montanelli, H., Yang, H. Error bounds for deep ReLU networks u sing the Kolmogorov-Arnold superposition theorem. arXiv preprint arXiv:1906.11945 (2020). https://arxiv.org/abs/1906.11945

  26. [26]

    Efficie nt truncated randomized SVD for mesh-free kernel methods

    Noorizadegan, A., Chen, C.-S., Cavoretto, R., De Rossi, A. Efficie nt truncated randomized SVD for mesh-free kernel methods. Computers & Mathematics with Applications 164 (2 024), 12–20. 19

  27. [27]

    L., Chen, C

    Noorizadegan, A., Cavoretto, R., Young, D. L., Chen, C. S. Sta ble weight updating: A key to reliable PDE solutions using deep learning. Engineering Analysis with Boundary Elements 168 (2024), 105933. https://doi.org/10.1016/j.enganabound.2024.105933

  28. [28]

    L., Hon, Y

    Noorizadegan, A., Young, D. L., Hon, Y. C., Chen, C. S. Power-e nhanced residual network for function approximation and physics-informed inverse problems. Applied Math ematics and Computation 480 (2024), 128910. https://doi.org/10.1016/j.amc.2024.128910

  29. [29]

    C., Young, D.-L., Chen, C.-S

    Noorizadegan, A., Hon, Y. C., Young, D.-L., Chen, C.-S. Enhancin g supervised surface reconstruction through implicit weight regularization. Engineering Analysis with Bound ary Elements 180 (2025), 106439. https://doi.org/10.1016/j.enganabound.2025.106439

  30. [30]

    Noorizadegan, A., Wang, S., Ling, L., Dominguez-Morales, J. P. A practitioner’s guide to Kolmogorov–Arnold networks. Computer Science Review , 62, 100991, 2026. https://doi.org/10.1016/j.cosrev.2026.100991

  31. [31]

    Approximation theory of the MLP model in neural net works

    Pinkus, A. Approximation theory of the MLP model in neural net works. Acta Numerica 8, 143–195 (1999). https://doi.org/10.1017/S0962492900002919

  32. [32]

    An algorithm for selecting a good value for the paramet er c in radial basis function interpolation

    Rippa, S. An algorithm for selecting a good value for the paramet er c in radial basis function interpolation. Advances in Computational Mathematics 11 (1999), 193–210. https://doi.org/10.1023/A:1018975909870

  33. [33]

    Learning representations b y back-propagating errors

    Rumelhart, D., Hinton, G., Williams, R. Learning representations b y back-propagating errors. Nature 323 (1986), 533–536. https://doi.org/10.1038/323533a0

  34. [34]

    The Kolmogorov-Arnold representation th eorem revisited

    Schmidt-Hieber, J. The Kolmogorov-Arnold representation th eorem revisited. Neural Networks 137 (2021), 119–126. https://doi.org/10.1016/j.neunet.2021.01.020

  35. [35]

    Seydi, S. T. Exploring the potential of polynomial basis function s in Kolmogorov-Arnold networks: A comparative study of different groups of polynomials (2024). arX iv preprint arXiv:2406.02583. https://arxiv.org/abs/2406.02583

  36. [36]

    S., Keerthana, A

    Sidharth, S. S., Keerthana, A. R., Gokul, R., Anas, K. P. Chebys hev polynomial-based Kolmogorov-Arnold networks: An efficient architecture for nonlinear function approx imation. arXiv preprint arXiv:2405.07200 (2024). https://arxiv.org/abs/2405.07200

  37. [37]

    Scattered Data Approximation

    Wendland, H. Scattered Data Approximation. Cambridge Monog raphs on Applied and Computational Mathematics, Volume 17, Cambridge University Pres s, Cambridge, 2005. https://doi.org/10.1017/CBO9780511617539

  38. [38]

    KAN or MLP: A fairer comparison

    Yu, R., Yu, W., Wang, X. KAN or MLP: A fairer comparison. arXiv pr eprint arXiv:2407.16674 (2024). https://arxiv.org/abs/2407.16674

  39. [39]

    Kolmogorov-Arnold Fourier N etworks

    Zhang, J., Fan, Y., Cai, K., Wang, K. Kolmogorov-Arnold Fourier N etworks. arXiv preprint arXiv:2502.06018 (2025). https://arxiv.org/abs/2502.06018 20