pith. machine review for the scientific record.

arxiv: 2604.22034 · v1 · submitted 2026-04-23 · 💻 cs.LG · cs.CV · cs.NE

Recognition: unknown

LTBs-KAN: Linear-Time B-splines Kolmogorov-Arnold Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 22:21 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · cs.NE
keywords Kolmogorov-Arnold networks · B-splines · linear time complexity · parameter reduction · matrix factorization · neural network efficiency · image classification

The pith

Linear-time B-spline evaluation combined with matrix factorization makes Kolmogorov-Arnold Networks faster and smaller while preserving accuracy on image tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Kolmogorov-Arnold Network variant that replaces recursive B-spline calculations with a direct method whose cost grows only linearly with input size. It further applies a product-of-sums factorization to the forward pass, cutting the number of trainable weights. On MNIST, Fashion-MNIST, and CIFAR-10, the resulting models reach classification accuracy comparable to earlier KAN versions while running faster and using fewer parameters. A sympathetic reader cares because KANs were designed to offer greater interpretability than ordinary neural nets, but their slow spline evaluations have limited real-world use. If the approach holds, KAN layers could become routine components in efficient, explainable models.

Core claim

LTBs-KAN computes B-spline basis functions through a non-recursive linear-time procedure and factors the layer computation as a product of sums. These two changes together produce linear complexity in the spline evaluations and a reduced parameter count. The networks match the performance of prior KAN implementations on standard image-classification benchmarks.
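For orientation, the recursion below is the classical Cox-de Boor (Boor-Mansfield-Cox) evaluation of a single B-spline basis function, the computation the abstract says LTBs-KAN does away with. It is a minimal reference sketch only; the paper's non-recursive, linear-time procedure is not reproduced here.

    import numpy as np

    def cox_de_boor(i, k, x, t):
        """Recursive Cox-de Boor evaluation of the basis function B_{i,k}(x)
        on knot vector t. Each call of degree k spawns two calls of degree
        k-1, which is the recursive cost LTBs-KAN is claimed to avoid."""
        if k == 0:
            return 1.0 if t[i] <= x < t[i + 1] else 0.0
        left = 0.0
        if t[i + k] != t[i]:  # zero-width knot spans contribute nothing
            left = (x - t[i]) / (t[i + k] - t[i]) * cox_de_boor(i, k - 1, x, t)
        right = 0.0
        if t[i + k + 1] != t[i + 1]:
            right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * cox_de_boor(i + 1, k - 1, x, t)
        return left + right

    # Cubic basis functions on a uniform knot grid, the typical KAN configuration.
    t = np.arange(10, dtype=float)
    basis = [cox_de_boor(i, 3, 4.5, t) for i in range(len(t) - 4)]
    assert abs(sum(basis) - 1.0) < 1e-12  # partition of unity inside the valid span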

What carries the argument

The non-recursive linear-time B-spline evaluator paired with product-of-sums matrix factorization applied inside each network layer.
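On the factorization side, the product-of-sums matrix decomposition cited in the reference graph (Wu's ProdSumNet, ref. 26) replaces a dense weight matrix with structured factors. The sketch below uses a single Kronecker factor as the simplest illustrative instance; the exact decomposition applied inside each LTBs-KAN layer is an assumption here, not taken from the paper, and only the parameter-count argument is shown.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative parameter reduction with one Kronecker factor W = A ⊗ B.
    # A real product-of-sums decomposition may combine several such factors.
    n1, n2 = 28, 28      # input reshaped as an n2 x n1 grid (e.g. a MNIST image)
    m1, m2 = 16, 16      # output reshaped as an m2 x m1 grid

    A = rng.standard_normal((m1, n1))   # m1 * n1 parameters
    B = rng.standard_normal((m2, n2))   # m2 * n2 parameters

    dense_params = (m1 * m2) * (n1 * n2)   # 200,704 for a full weight matrix
    factored_params = m1 * n1 + m2 * n2    # 896 for the two factors

    X = rng.standard_normal((n2, n1))
    x = X.flatten(order="F")               # column-major vectorisation

    # (A ⊗ B) x applied without ever materialising the large matrix:
    y_fast = (B @ X @ A.T).flatten(order="F")
    y_dense = np.kron(A, B) @ x
    assert np.allclose(y_fast, y_dense)

The same reshaping identity that avoids building the big matrix is also why such a factorized forward pass is cheap as well as small.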

If this is right

  • Training and inference cost scales linearly rather than with higher-order terms in the spline degree.
  • Total model parameters decrease while classification accuracy remains comparable on the tested datasets.
  • The new layers can be substituted into existing KAN-based architectures for immediate efficiency gains.
  • The speed and size benefits appear consistently across grayscale digit, fashion-item, and color-image tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same linearization and factorization pattern could be applied to other basis-function networks beyond KANs.
  • Faster KAN layers might now support deeper stacks or larger input dimensions that were previously impractical.
  • Hardware kernels optimized for the new direct evaluation order could further amplify the reported speedups.
  • If the factorization generalizes, similar parameter-sharing tricks may reduce memory use in related spline or polynomial models.

Load-bearing premise

The direct linear-time B-spline method and the factorization together keep exactly the same functional expressivity and numerical stability as the original recursive KAN formulation.

What would settle it

A side-by-side run of the original KAN and LTBs-KAN on CIFAR-10 that shows either higher wall-clock training time per epoch or lower test accuracy for the new method.
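A minimal harness for that comparison might look like the sketch below; run_epoch, original_kan, ltbs_kan, and cifar10_loader are hypothetical names standing in for whatever training loop and models one actually has, and the numbers it produces are only meaningful with matched architectures, spline order, and grid size.

    import time
    import statistics

    def time_epochs(train_one_epoch, n_epochs=5):
        """Mean and standard deviation of wall-clock time per epoch for any
        zero-argument callable that runs one full training epoch. If the model
        runs on a GPU, call torch.cuda.synchronize() inside the callable before
        it returns, so the timer sees finished kernels."""
        durations = []
        for _ in range(n_epochs):
            start = time.perf_counter()
            train_one_epoch()
            durations.append(time.perf_counter() - start)
        return statistics.mean(durations), statistics.stdev(durations)

    # Hypothetical usage with identical data, spline order, and grid size:
    # mean_kan,  sd_kan  = time_epochs(lambda: run_epoch(original_kan, cifar10_loader))
    # mean_ltbs, sd_ltbs = time_epochs(lambda: run_epoch(ltbs_kan, cifar10_loader))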

Figures

Figures reproduced from arXiv: 2604.22034 by Andres Mendez-Vazquez, Eduardo Rodriguez-Tello, Eduardo Said Merin-Martinez.

Figure 1: The proposed LTBs-KAN linear layer using the new LTBs Algorithm; see the gridUpdate procedure 2 for details.
Figure 2: Architecture of the Conv2D in LTBs-KAN; generalized convolution replaces the …
Figure 3: LTBs-KAN-ConvNet architecture used in this experiment. The Max Pooling layers are done after every …
Figure 4: Test loss values during a training run of the different models after 15 epochs over the MNIST dataset. …
Figure 5: Test loss values during a training run of the different models over …
Figure 6: Loss during a training run of 20 epochs over the CIFAR-10 dataset for MLP, AlexNet, …
read the original abstract

Kolmogorov-Arnold Networks (KANs) are a recent neural network architecture offering an alternative to Multilayer Perceptrons (MLPs) with improved explainability and expressibility. However, KANs are significantly slower than MLPs due to the recursive nature of B-spline function computations, limiting their application. This work addresses these issues by proposing a novel base-spline Linear-Time B-splines Kolmogorov-Arnold Network (LTBs-KAN) with linear complexity. Unlike previous methods that rely on the Boor-Mansfield-Cox spline algorithm or other computationally intensive mathematical functions, our approach significantly reduces the computational burden. Additionally, we further reduce model's parameter through product-of-sums matrix factorization in the forward pass without sacrificing performance. Experiments on MNIST, Fashion-MNIST and CIFAR-10 demonstrate that LTBs-KAN achieves good time complexity and parameter reduction, when used as building architectural blocks, compared to other KAN implementations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes LTBs-KAN, a variant of Kolmogorov-Arnold Networks that replaces the recursive Boor-Mansfield-Cox B-spline evaluation with a novel non-recursive linear-time base-spline method and applies a product-of-sums matrix factorization during the forward pass to reduce parameter count. The central claims are that these modifications achieve linear complexity, preserve performance, and yield competitive results on MNIST, Fashion-MNIST, and CIFAR-10 when used as architectural blocks.

Significance. If the new B-spline procedure is shown to be exactly equivalent to the standard recursive formulation and the factorization is proven not to restrict the representable function space, the work would meaningfully address the primary practical barrier to KAN adoption by delivering substantial speed and memory gains without loss of expressivity or interpretability advantages over MLPs.
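That equivalence question is mechanically checkable. A minimal sketch of such a check, using SciPy's Cox-de Boor design matrix as the reference and a hypothetical ltbs_basis(x, t, degree) standing in for the paper's evaluator, might look like this; it is not the paper's validation protocol.

    import numpy as np
    from scipy.interpolate import BSpline

    def reference_basis(x, t, k):
        """All degree-k B-spline basis functions on knot vector t, evaluated at
        the points x via SciPy's Cox-de Boor implementation.
        Shape: (len(x), len(t) - k - 1)."""
        return BSpline.design_matrix(x, t, k).toarray()

    def check_equivalence(ltbs_basis, degree=3, n_knots=16, n_points=1000, tol=1e-10):
        """ltbs_basis is hypothetical: any candidate fast evaluator exposing the
        same (x, t, degree) -> matrix interface as reference_basis."""
        rng = np.random.default_rng(0)
        t = np.sort(rng.uniform(0.0, 1.0, n_knots))                    # arbitrary non-uniform knots
        x = np.sort(rng.uniform(t[degree], t[-degree - 1], n_points))  # stay inside the base interval
        err = np.max(np.abs(ltbs_basis(x, t, degree) - reference_basis(x, t, degree)))
        return err < tol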

major comments (3)
  1. [§3.1] §3.1 (Linear-Time B-spline Method): No derivation, identity, or numerical equivalence check is supplied showing that the proposed non-recursive evaluation computes identically the same univariate B-spline basis functions as the Cox-de Boor recursion for arbitrary knot vectors and inputs; without this, the claim that expressivity and learned functions remain unchanged is unsupported and load-bearing for the performance-parity assertion.
  2. [§4.2] §4.2 (Product-of-Sums Factorization): The matrix factorization is introduced to reduce parameters, yet the manuscript provides neither an analysis of the altered span of representable functions nor a bound on the induced approximation error relative to the original KAN layer; this directly affects whether the “without sacrificing performance” guarantee holds.
  3. [Table 2] Table 2 (Timing and Accuracy Results): Reported wall-clock times and accuracies on CIFAR-10 lack direct head-to-head comparison against a standard KAN implementation using identical architecture and the same B-spline order, and no variance estimates or statistical tests are given, preventing verification that performance is truly preserved under the claimed linear-time regime.
minor comments (2)
  1. [Abstract] The abstract states “linear complexity” and “good time complexity” without supplying the formal big-O analysis or pseudocode that appears later in §3; a brief forward reference would improve readability.
  2. [§3] Notation for the new base-spline functions is introduced without an explicit comparison table to the standard B-spline notation used in the original KAN paper, which may confuse readers familiar with the recursive definition.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we will implement to strengthen the paper.

read point-by-point responses
  1. Referee: §3.1 (Linear-Time B-spline Method): No derivation, identity, or numerical equivalence check is supplied showing that the proposed non-recursive evaluation computes identically the same univariate B-spline basis functions as the Cox-de Boor recursion for arbitrary knot vectors and inputs; without this, the claim that expressivity and learned functions remain unchanged is unsupported and load-bearing for the performance-parity assertion.

    Authors: We agree that providing a formal equivalence proof is essential. In the revised version, we will add a detailed derivation demonstrating that our non-recursive base-spline evaluation yields exactly the same results as the Cox-de Boor recursion for any knot vector and input. We will also include numerical experiments verifying this equivalence across a range of configurations to support the claim of preserved expressivity. revision: yes

  2. Referee: §4.2 (Product-of-Sums Factorization): The matrix factorization is introduced to reduce parameters, yet the manuscript provides neither an analysis of the altered span of representable functions nor a bound on the induced approximation error relative to the original KAN layer; this directly affects whether the “without sacrificing performance” guarantee holds.

    Authors: We recognize the need for a theoretical analysis of the factorization's impact. The revised manuscript will include a section analyzing the representable functions under the product-of-sums approach, showing that it does not restrict the function space beyond what is already achievable in KAN layers, along with any necessary error bounds. This will better justify the performance claims. revision: yes

  3. Referee: Table 2 (Timing and Accuracy Results): Reported wall-clock times and accuracies on CIFAR-10 lack direct head-to-head comparison against a standard KAN implementation using identical architecture and the same B-spline order, and no variance estimates or statistical tests are given, preventing verification that performance is truly preserved under the claimed linear-time regime.

    Authors: We will revise Table 2 to incorporate head-to-head comparisons with standard KANs using matching architectures and B-spline orders. Furthermore, we will add variance estimates based on repeated experiments and conduct appropriate statistical tests to confirm that the observed performance is statistically equivalent. revision: yes

Circularity Check

0 steps flagged

No circularity: novel algorithm and empirical validation are independent of inputs

full rationale

The paper introduces a new non-recursive linear-time B-spline evaluation and a product-of-sums factorization, then reports wall-clock and accuracy results on MNIST/Fashion-MNIST/CIFAR-10. No equation is defined in terms of its own output, no fitted parameter is relabeled as a prediction, and no load-bearing premise reduces to a self-citation chain. The equivalence to classical B-splines is asserted by construction of the new method rather than derived from prior results inside the paper; therefore the derivation chain does not collapse to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the existence of a non-recursive linear-time B-spline algorithm that maintains accuracy; this is introduced as an ad-hoc innovation in the abstract.

axioms (1)
  • ad hoc to paper: B-splines admit an efficient non-recursive linear-time evaluation procedure that preserves the properties required by KANs
    This is the core technical assumption enabling the linear complexity claim.

pith-pipeline@v0.9.0 · 5474 in / 1237 out tokens · 56275 ms · 2026-05-09T22:21:18.981961+00:00 · methodology

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SRGAN-CKAN: Expressive Super-Resolution with Nonlinear Functional Operators under Minimal Resources

    cs.CV 2026-05 unverdicted novelty 6.0

    SRGAN-CKAN integrates convolutional Kolmogorov-Arnold networks into an adversarial super-resolution pipeline, replacing linear convolutions with spline-based nonlinear patch operators to improve perceptual quality und...

  2. SRGAN-CKAN: Expressive Super-Resolution with Nonlinear Functional Operators under Minimal Resources

    cs.CV 2026-05 unverdicted novelty 5.0

    SRGAN-CKAN integrates convolutional Kolmogorov-Arnold networks into an adversarial super-resolution pipeline, replacing linear convolutions with nonlinear functional operators to improve perceptual quality under const...

Reference graph

Works this paper leans on

43 extracted references · 23 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljacic, T. Hou, M. Tegmark, KAN: Kolmogorov–Arnold Networks, in: International Conference on Learning Representations (ICLR), 2025. URL https://github.com/KindXiaoming/pykan (accessed 2026-04-17).
  2. [2] C. de Boor, A Practical Guide to Splines, Vol. 27 of Applied Mathematical Sciences, Springer, New York, 1978.
  3. [3] H.-T. Ta, D.-Q. Thai, A. Tran, G. Sidorov, A. Gelbukh, PRKAN: Parameter-Reduced Kolmogorov–Arnold Networks (2025). doi:10.48550/arXiv.2501.07032. URL https://github.com/hoangthangta/BSRBF_KAN (accessed 2026-04-17).
  4. [4] A. Delis, FasterKAN: Efficient Implementation of Kolmogorov–Arnold Networks (2024). URL https://github.com/AthanasiosDelis/faster-kan/ (accessed 2026-04-17).
  5. [5] M. J. Gottlieb, Concerning some polynomials orthogonal on a finite or enumerable set of points, American Journal of Mathematics 60 (2) (1938) 453–458. doi:10.2307/2371307.
  6. [6] Blealtan, An Efficient Implementation of Kolmogorov–Arnold Network (KAN) (2024). URL https://github.com/Blealtan/efficient-kan (accessed 2026-04-16).
  7. [7] K. He, X. Zhang, S. Ren, J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), IEEE, 2015, pp. 1026–1034. doi:10.1109/ICCV.2015.123.
  8. [8] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE 86 (11) (1998). doi:10.1109/5.726791.
  9. [9] Z. Li, FastKAN: A Fast Kolmogorov–Arnold Network (2024). URL https://github.com/ZiyaoLi/fast-kan (accessed 2026-04-17).
  10. [10] Z. Li, Kolmogorov–Arnold Networks are Radial Basis Function Networks (2024). doi:10.48550/arXiv.2405.06721.
  11. [11] S. T. Seydi, Exploring the Potential of Polynomial Basis Functions in Kolmogorov–Arnold Networks: A Comparative Study of Different Groups of Polynomials (2024). doi:10.48550/arXiv.2406.02583. URL https://github.com/seydi1370/Basis_Functions (accessed 2026-04-16).
  12. [12] H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms (2017). doi:10.48550/arXiv.1708.07747.
  13. [13] A. Krizhevsky, V. Nair, G. Hinton, The CIFAR-10 Dataset (2009). URL https://www.cs.toronto.edu/~kriz/cifar.html (accessed 2026-04-16).
  14. [14] Y. LeCun, Y. Bengio, G. Hinton, Deep Learning, Nature 521 (7553) (2015) 436–444. doi:10.1038/nature14539.
  15. [15] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994.
  16. [16] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (4) (1989) 303–314. doi:10.1007/BF02551274.
  17. [17] A. Griewank, A. Walther, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd Edition, SIAM, Philadelphia, PA, 2008. doi:10.1137/1.9780898717761.
  18. [18] A. N. Kolmogorov, On the Representation of Continuous Functions of Several Variables by Superpositions of Continuous Functions of One Variable and Addition, Doklady Akademii Nauk SSSR 114 (1957) 953–956.
  19. [19] J. Schmidt-Hieber, The Kolmogorov–Arnold Representation Theorem Revisited, Neural Networks 137 (2021) 119–126. doi:10.1016/j.neunet.2021.01.020.
  20. [20] D. A. Sprecher, S. Draghici, Space-Filling Curves and Kolmogorov Superposition-Based Neural Networks, Neural Networks 15 (1) (2002) 57–67. doi:10.1016/S0893-6080(01)00119-9.
  21. [21] X. Glorot, Y. Bengio, Understanding the Difficulty of Training Deep Feedforward Neural Networks, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9 of Proceedings of Machine Learning Research, 2010, pp. 249–256.
  22. [22] L. Piegl, W. Tiller, The NURBS Book, 2nd Edition, Springer, Berlin, 1997. doi:10.1007/978-3-642-59223-2.
  23. [23] F. Chudy, P. Woźny, Linear-time algorithm for computing the Bernstein–Bézier coefficients of B-spline basis functions, Computer-Aided Design 154 (2023) 103434. doi:10.1016/j.cad.2022.103434.
  24. [24] C. de Boor, On calculating with B-splines, Journal of Approximation Theory 6 (1972) 50–62. doi:10.1016/0021-9045(72)90080-9.
  25. [25] W. Boehm, A. Müller, On de Casteljau's Algorithm, Computer Aided Geometric Design 16 (1999) 587–605. doi:10.1016/S0167-8396(99)00023-0.
  26. [26] C. W. Wu, ProdSumNet: Reducing Model Parameters in Deep Neural Networks via Product-of-Sums Matrix Decompositions (2019). doi:10.48550/arXiv.1809.02209.
  27. [27] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic Differentiation in Machine Learning: A Survey, Journal of Machine Learning Research 18 (153) (2018) 1–43.
  28. [28] S. A. Cook, C. Dwork, R. Reischuk, Upper and Lower Time Bounds for Parallel Random Access Machines Without Simultaneous Writes, SIAM Journal on Computing 15 (1) (1986) 87–97. doi:10.1137/0215006.
  29. [29] S. Fortune, J. Wyllie, Parallelism in Random Access Machines, in: Proceedings of the 10th Annual ACM Symposium on Theory of Computing (STOC), 1978, pp. 114–118. doi:10.1145/800133.804338.
  30. [30] G. E. Blelloch, Prefix Sums and Their Applications, Tech. Rep. CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA (1990).
  31. [31] PyTorch Contributors, CUDA Semantics (2026). URL https://docs.pytorch.org/docs/stable/notes/cuda.html (accessed 2026-04-16).
  32. [32] PyTorch Contributors, Associative Scan (2026). URL https://docs.pytorch.org/docs/2.11/higher_order_ops/associative_scan.html (accessed 2026-04-16).
  33. [33] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 3rd Edition, MIT Press, Cambridge, MA, USA, 2009.
  34. [34] M. Fey, J. E. Lenssen, SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 869–877. doi:10.1109/CVPR.2018.00097.
  35. [35] V. Dumoulin, F. Visin, A Guide to Convolution Arithmetic for Deep Learning, arXiv (2016). doi:10.48550/arXiv.1603.07285.
  36. [36] Z. Liu, Z. Ma, K. Zhao, K. Wang, S. Lian, KAConvNet: Kolmogorov–Arnold convolutional networks for vision recognition, Image and Vision Computing 170 (2026). doi:10.1016/j.imavis.2026.105983.
  37. [37] S. Lou, Y. Shao, Q. Du, Kolmogorov-Arnold Optimized UNet: An enhanced image segmentation model based on Kolmogorov-Arnold Network and Convolutional Kolmogorov-Arnold Network, Engineering Applications of Artificial Intelligence 173 (2026). doi:10.1016/j.engappai.2026.114405.
  38. [38] Y. Wang, X. Yu, Y. Gao, J. Sha, J. Wang, S. Yan, K. Qin, Y. Zhang, L. Gao, SpectralKAN: Weighted activation distribution Kolmogorov–Arnold network for hyperspectral image change detection, Pattern Recognition 175 (2026) 113042. doi:10.1016/j.patcog.2026.113042.
  39. [39] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in: International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
  40. [40] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer Normalization (2016). doi:10.48550/arXiv.1607.06450.
  41. [41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (56) (2014) 1929–1958.
  42. [42] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization, in: International Conference on Learning Representations (ICLR), 2019.
  43. [43] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, Springer, New York, 2009. doi:10.1007/978-0-387-84858-7.