pith. machine review for the scientific record.

arxiv: 2604.22034 · v1 · submitted 2026-04-23 · 💻 cs.LG · cs.CV · cs.NE

Recognition: unknown

LTBs-KAN: Linear-Time B-splines Kolmogorov-Arnold Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 22:21 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · cs.NE
keywords Kolmogorov-Arnold networks · B-splines · linear time complexity · parameter reduction · matrix factorization · neural network efficiency · image classification

The pith

Linear-time B-spline evaluation combined with matrix factorization makes Kolmogorov-Arnold Networks faster and smaller while preserving accuracy on image tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Kolmogorov-Arnold Network variant that replaces recursive B-spline calculations with a direct method whose cost grows only linearly with input size. It further applies a product-of-sums factorization to the forward pass, cutting the number of trainable weights. On MNIST, Fashion-MNIST, and CIFAR-10, the resulting models reach classification accuracy comparable to earlier KAN versions while running faster and using fewer parameters. A sympathetic reader cares because KANs were designed to offer greater interpretability than ordinary neural nets, but their slow spline evaluations have limited real-world use. If the approach holds, KAN layers could become routine components in efficient, explainable models.

Core claim

LTBs-KAN computes B-spline basis functions through a non-recursive linear-time procedure and factors the layer computation as a product of sums. These two changes together produce linear complexity in the spline evaluations and a reduced parameter count. The networks match the performance of prior KAN implementations on standard image-classification benchmarks.
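For orientation, the recursion below is the classical Cox-de Boor (Boor-Mansfield-Cox) evaluation of a single B-spline basis function, the computation the abstract says LTBs-KAN does away with. It is a minimal reference sketch only; the paper's non-recursive, linear-time procedure is not reproduced here.

    import numpy as np

    def cox_de_boor(i, k, x, t):
        """Recursive Cox-de Boor evaluation of the basis function B_{i,k}(x)
        on knot vector t. Each call of degree k spawns two calls of degree
        k-1, which is the recursive cost LTBs-KAN is claimed to avoid."""
        if k == 0:
            return 1.0 if t[i] <= x < t[i + 1] else 0.0
        left = 0.0
        if t[i + k] != t[i]:  # zero-width knot spans contribute nothing
            left = (x - t[i]) / (t[i + k] - t[i]) * cox_de_boor(i, k - 1, x, t)
        right = 0.0
        if t[i + k + 1] != t[i + 1]:
            right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * cox_de_boor(i + 1, k - 1, x, t)
        return left + right

    # Cubic basis functions on a uniform knot grid, the typical KAN configuration.
    t = np.arange(10, dtype=float)
    basis = [cox_de_boor(i, 3, 4.5, t) for i in range(len(t) - 4)]
    assert abs(sum(basis) - 1.0) < 1e-12  # partition of unity inside the valid span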

What carries the argument

The non-recursive linear-time B-spline evaluator paired with product-of-sums matrix factorization applied inside each network layer.
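On the factorization side, the product-of-sums matrix decomposition cited in the reference graph (Wu's ProdSumNet, ref. 26) replaces a dense weight matrix with structured factors. The sketch below uses a single Kronecker factor as the simplest illustrative instance; the exact decomposition applied inside each LTBs-KAN layer is an assumption here, not taken from the paper, and only the parameter-count argument is shown.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative parameter reduction with one Kronecker factor W = A ⊗ B.
    # A real product-of-sums decomposition may combine several such factors.
    n1, n2 = 28, 28      # input reshaped as an n2 x n1 grid (e.g. a MNIST image)
    m1, m2 = 16, 16      # output reshaped as an m2 x m1 grid

    A = rng.standard_normal((m1, n1))   # m1 * n1 parameters
    B = rng.standard_normal((m2, n2))   # m2 * n2 parameters

    dense_params = (m1 * m2) * (n1 * n2)   # 200,704 for a full weight matrix
    factored_params = m1 * n1 + m2 * n2    # 896 for the two factors

    X = rng.standard_normal((n2, n1))
    x = X.flatten(order="F")               # column-major vectorisation

    # (A ⊗ B) x applied without ever materialising the large matrix:
    y_fast = (B @ X @ A.T).flatten(order="F")
    y_dense = np.kron(A, B) @ x
    assert np.allclose(y_fast, y_dense)

The same reshaping identity that avoids building the big matrix is also why such a factorized forward pass is cheap as well as small.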

If this is right

  • Training and inference cost scales linearly rather than with higher-order terms in the spline degree.
  • Total model parameters decrease while classification accuracy remains comparable on the tested datasets.
  • The new layers can be substituted into existing KAN-based architectures for immediate efficiency gains.
  • The speed and size benefits appear consistently across grayscale digit, fashion-item, and color-image tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same linearization and factorization pattern could be applied to other basis-function networks beyond KANs.
  • Faster KAN layers might now support deeper stacks or larger input dimensions that were previously impractical.
  • Hardware kernels optimized for the new direct evaluation order could further amplify the reported speedups.
  • If the factorization generalizes, similar parameter-sharing tricks may reduce memory use in related spline or polynomial models.

Load-bearing premise

The direct linear-time B-spline method and the factorization together keep exactly the same functional expressivity and numerical stability as the original recursive KAN formulation.

What would settle it

A side-by-side run of the original KAN and LTBs-KAN on CIFAR-10 that shows either higher wall-clock training time per epoch or lower test accuracy for the new method.
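A minimal harness for that comparison might look like the sketch below; run_epoch, original_kan, ltbs_kan, and cifar10_loader are hypothetical names standing in for whatever training loop and models one actually has, and the numbers it produces are only meaningful with matched architectures, spline order, and grid size.

    import time
    import statistics

    def time_epochs(train_one_epoch, n_epochs=5):
        """Mean and standard deviation of wall-clock time per epoch for any
        zero-argument callable that runs one full training epoch. If the model
        runs on a GPU, call torch.cuda.synchronize() inside the callable before
        it returns, so the timer sees finished kernels."""
        durations = []
        for _ in range(n_epochs):
            start = time.perf_counter()
            train_one_epoch()
            durations.append(time.perf_counter() - start)
        return statistics.mean(durations), statistics.stdev(durations)

    # Hypothetical usage with identical data, spline order, and grid size:
    # mean_kan,  sd_kan  = time_epochs(lambda: run_epoch(original_kan, cifar10_loader))
    # mean_ltbs, sd_ltbs = time_epochs(lambda: run_epoch(ltbs_kan, cifar10_loader))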

Figures

Figures reproduced from arXiv: 2604.22034 by Andres Mendez-Vazquez, Eduardo Rodriguez-Tello, Eduardo Said Merin-Martinez.

Figure 1: The proposed LTBs-KAN linear layer using the new LTBs Algorithm; see the gridUpdate procedure 2 for details.
Figure 2: Architecture of the Conv2D in LTBs-KAN; generalized convolution replaces the …
Figure 3: LTBs-KAN-ConvNet architecture used in this experiment. The Max Pooling layers are done after every …
Figure 4: Test loss values during a training run of the different models after 15 epochs over the MNIST dataset. …
Figure 5: Test loss values during a training run of the different models over …
Figure 6: Loss during a training run of 20 epochs over the CIFAR-10 dataset for MLP, AlexNet, …
read the original abstract

Kolmogorov-Arnold Networks (KANs) are a recent neural network architecture offering an alternative to Multilayer Perceptrons (MLPs) with improved explainability and expressibility. However, KANs are significantly slower than MLPs due to the recursive nature of B-spline function computations, limiting their application. This work addresses these issues by proposing a novel base-spline Linear-Time B-splines Kolmogorov-Arnold Network (LTBs-KAN) with linear complexity. Unlike previous methods that rely on the Boor-Mansfield-Cox spline algorithm or other computationally intensive mathematical functions, our approach significantly reduces the computational burden. Additionally, we further reduce model's parameter through product-of-sums matrix factorization in the forward pass without sacrificing performance. Experiments on MNIST, Fashion-MNIST and CIFAR-10 demonstrate that LTBs-KAN achieves good time complexity and parameter reduction, when used as building architectural blocks, compared to other KAN implementations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes LTBs-KAN, a variant of Kolmogorov-Arnold Networks that replaces the recursive Boor-Mansfield-Cox B-spline evaluation with a novel non-recursive linear-time base-spline method and applies a product-of-sums matrix factorization during the forward pass to reduce parameter count. The central claims are that these modifications achieve linear complexity, preserve performance, and yield competitive results on MNIST, Fashion-MNIST, and CIFAR-10 when used as architectural blocks.

Significance. If the new B-spline procedure is shown to be exactly equivalent to the standard recursive formulation and the factorization is proven not to restrict the representable function space, the work would meaningfully address the primary practical barrier to KAN adoption by delivering substantial speed and memory gains without loss of expressivity or interpretability advantages over MLPs.
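That equivalence question is mechanically checkable. A minimal sketch of such a check, using SciPy's Cox-de Boor design matrix as the reference and a hypothetical ltbs_basis(x, t, degree) standing in for the paper's evaluator, might look like this; it is not the paper's validation protocol.

    import numpy as np
    from scipy.interpolate import BSpline

    def reference_basis(x, t, k):
        """All degree-k B-spline basis functions on knot vector t, evaluated at
        the points x via SciPy's Cox-de Boor implementation.
        Shape: (len(x), len(t) - k - 1)."""
        return BSpline.design_matrix(x, t, k).toarray()

    def check_equivalence(ltbs_basis, degree=3, n_knots=16, n_points=1000, tol=1e-10):
        """ltbs_basis is hypothetical: any candidate fast evaluator exposing the
        same (x, t, degree) -> matrix interface as reference_basis."""
        rng = np.random.default_rng(0)
        t = np.sort(rng.uniform(0.0, 1.0, n_knots))                    # arbitrary non-uniform knots
        x = np.sort(rng.uniform(t[degree], t[-degree - 1], n_points))  # stay inside the base interval
        err = np.max(np.abs(ltbs_basis(x, t, degree) - reference_basis(x, t, degree)))
        return err < tol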

major comments (3)
  1. [§3.1] §3.1 (Linear-Time B-spline Method): No derivation, identity, or numerical equivalence check is supplied showing that the proposed non-recursive evaluation computes identically the same univariate B-spline basis functions as the Cox-de Boor recursion for arbitrary knot vectors and inputs; without this, the claim that expressivity and learned functions remain unchanged is unsupported and load-bearing for the performance-parity assertion.
  2. [§4.2] §4.2 (Product-of-Sums Factorization): The matrix factorization is introduced to reduce parameters, yet the manuscript provides neither an analysis of the altered span of representable functions nor a bound on the induced approximation error relative to the original KAN layer; this directly affects whether the “without sacrificing performance” guarantee holds.
  3. [Table 2] Table 2 (Timing and Accuracy Results): Reported wall-clock times and accuracies on CIFAR-10 lack direct head-to-head comparison against a standard KAN implementation using identical architecture and the same B-spline order, and no variance estimates or statistical tests are given, preventing verification that performance is truly preserved under the claimed linear-time regime.
minor comments (2)
  1. [Abstract] The abstract states “linear complexity” and “good time complexity” without supplying the formal big-O analysis or pseudocode that appears later in §3; a brief forward reference would improve readability.
  2. [§3] Notation for the new base-spline functions is introduced without an explicit comparison table to the standard B-spline notation used in the original KAN paper, which may confuse readers familiar with the recursive definition.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we will implement to strengthen the paper.

read point-by-point responses
  1. Referee: §3.1 (Linear-Time B-spline Method): No derivation, identity, or numerical equivalence check is supplied showing that the proposed non-recursive evaluation computes identically the same univariate B-spline basis functions as the Cox-de Boor recursion for arbitrary knot vectors and inputs; without this, the claim that expressivity and learned functions remain unchanged is unsupported and load-bearing for the performance-parity assertion.

    Authors: We agree that providing a formal equivalence proof is essential. In the revised version, we will add a detailed derivation demonstrating that our non-recursive base-spline evaluation yields exactly the same results as the Cox-de Boor recursion for any knot vector and input. We will also include numerical experiments verifying this equivalence across a range of configurations to support the claim of preserved expressivity. revision: yes

  2. Referee: §4.2 (Product-of-Sums Factorization): The matrix factorization is introduced to reduce parameters, yet the manuscript provides neither an analysis of the altered span of representable functions nor a bound on the induced approximation error relative to the original KAN layer; this directly affects whether the “without sacrificing performance” guarantee holds.

    Authors: We recognize the need for a theoretical analysis of the factorization's impact. The revised manuscript will include a section analyzing the representable functions under the product-of-sums approach, showing that it does not restrict the function space beyond what is already achievable in KAN layers, along with any necessary error bounds. This will better justify the performance claims. revision: yes

  3. Referee: Table 2 (Timing and Accuracy Results): Reported wall-clock times and accuracies on CIFAR-10 lack direct head-to-head comparison against a standard KAN implementation using identical architecture and the same B-spline order, and no variance estimates or statistical tests are given, preventing verification that performance is truly preserved under the claimed linear-time regime.

    Authors: We will revise Table 2 to incorporate head-to-head comparisons with standard KANs using matching architectures and B-spline orders. Furthermore, we will add variance estimates based on repeated experiments and conduct appropriate statistical tests to confirm that the observed performance is statistically equivalent. revision: yes

Circularity Check

0 steps flagged

No circularity: novel algorithm and empirical validation are independent of inputs

full rationale

The paper introduces a new non-recursive linear-time B-spline evaluation and a product-of-sums factorization, then reports wall-clock and accuracy results on MNIST/Fashion-MNIST/CIFAR-10. No equation is defined in terms of its own output, no fitted parameter is relabeled as a prediction, and no load-bearing premise reduces to a self-citation chain. The equivalence to classical B-splines is asserted by construction of the new method rather than derived from prior results inside the paper; therefore the derivation chain does not collapse to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the existence of a non-recursive linear-time B-spline algorithm that maintains accuracy; this is introduced as an ad-hoc innovation in the abstract.

axioms (1)
  • ad hoc to paper: B-splines admit an efficient non-recursive linear-time evaluation procedure that preserves the properties required by KANs
    This is the core technical assumption enabling the linear complexity claim.

pith-pipeline@v0.9.0 · 5474 in / 1237 out tokens · 56275 ms · 2026-05-09T22:21:18.981961+00:00 · methodology

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SRGAN-CKAN: Expressive Super-Resolution with Nonlinear Functional Operators under Minimal Resources

    cs.CV 2026-05 unverdicted novelty 6.0

    SRGAN-CKAN integrates convolutional Kolmogorov-Arnold networks into an adversarial super-resolution pipeline, replacing linear convolutions with spline-based nonlinear patch operators to improve perceptual quality und...

  2. SRGAN-CKAN: Expressive Super-Resolution with Nonlinear Functional Operators under Minimal Resources

    cs.CV 2026-05 unverdicted novelty 5.0

    SRGAN-CKAN integrates convolutional Kolmogorov-Arnold networks into an adversarial super-resolution pipeline, replacing linear convolutions with nonlinear functional operators to improve perceptual quality under const...

Reference graph

Works this paper leans on

43 extracted references · 23 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljacic, T. Hou, M. Tegmark, KAN: Kolmogorov–Arnold Networks, in: International Conference on Learning Representations (ICLR), 2025. URL https://github.com/KindXiaoming/pykan (accessed 2026-04-17).
  2. [2] C. de Boor, A Practical Guide to Splines, Vol. 27 of Applied Mathematical Sciences, Springer, New York, 1978.
  3. [3] H.-T. Ta, D.-Q. Thai, A. Tran, G. Sidorov, A. Gelbukh, PRKAN: Parameter-Reduced Kolmogorov–Arnold Networks (2025). doi:10.48550/arXiv.2501.07032. URL https://github.com/hoangthangta/BSRBF_KAN (accessed 2026-04-17).
  4. [4] A. Delis, FasterKAN: Efficient Implementation of Kolmogorov–Arnold Networks (2024). URL https://github.com/AthanasiosDelis/faster-kan/ (accessed 2026-04-17).
  5. [5] M. J. Gottlieb, Concerning some polynomials orthogonal on a finite or enumerable set of points, American Journal of Mathematics 60 (2) (1938) 453–458. doi:10.2307/2371307.
  6. [6] Blealtan, An Efficient Implementation of Kolmogorov–Arnold Network (KAN) (2024). URL https://github.com/Blealtan/efficient-kan (accessed 2026-04-16).
  7. [7] K. He, X. Zhang, S. Ren, J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), IEEE, 2015, pp. 1026–1034. doi:10.1109/ICCV.2015.123.
  8. [8] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE 86 (11) (1998). doi:10.1109/5.726791.
  9. [9] Z. Li, FastKAN: A Fast Kolmogorov–Arnold Network (2024). URL https://github.com/ZiyaoLi/fast-kan (accessed 2026-04-17).
  10. [10] Z. Li, Kolmogorov–Arnold Networks are Radial Basis Function Networks (2024). doi:10.48550/arXiv.2405.06721.
  11. [11] S. T. Seydi, Exploring the Potential of Polynomial Basis Functions in Kolmogorov–Arnold Networks: A Comparative Study of Different Groups of Polynomials (2024). doi:10.48550/arXiv.2406.02583. URL https://github.com/seydi1370/Basis_Functions (accessed 2026-04-16).
  12. [12] H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms (2017). doi:10.48550/arXiv.1708.07747.
  13. [13] A. Krizhevsky, V. Nair, G. Hinton, The CIFAR-10 Dataset (2009). URL https://www.cs.toronto.edu/~kriz/cifar.html (accessed 2026-04-16).
  14. [14] Y. LeCun, Y. Bengio, G. Hinton, Deep Learning, Nature 521 (7553) (2015) 436–444. doi:10.1038/nature14539.
  15. [15] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994.
  16. [16] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (4) (1989) 303–314. doi:10.1007/BF02551274.
  17. [17] A. Griewank, A. Walther, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd Edition, SIAM, Philadelphia, PA, 2008. doi:10.1137/1.9780898717761.
  18. [18] A. N. Kolmogorov, On the Representation of Continuous Functions of Several Variables by Superpositions of Continuous Functions of One Variable and Addition, Doklady Akademii Nauk SSSR 114 (1957) 953–956.
  19. [19] J. Schmidt-Hieber, The Kolmogorov–Arnold Representation Theorem Revisited, Neural Networks 137 (2021) 119–126. doi:10.1016/j.neunet.2021.01.020.
  20. [20] D. A. Sprecher, S. Draghici, Space-Filling Curves and Kolmogorov Superposition-Based Neural Networks, Neural Networks 15 (1) (2002) 57–67. doi:10.1016/S0893-6080(01)00119-9.
  21. [21] X. Glorot, Y. Bengio, Understanding the Difficulty of Training Deep Feedforward Neural Networks, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 9 of Proceedings of Machine Learning Research, 2010, pp. 249–256.
  22. [22] L. Piegl, W. Tiller, The NURBS Book, 2nd Edition, Springer, Berlin, 1997. doi:10.1007/978-3-642-59223-2.
  23. [23] F. Chudy, P. Woźny, Linear-time algorithm for computing the Bernstein–Bézier coefficients of B-spline basis functions, Computer-Aided Design 154 (2023) 103434. doi:10.1016/j.cad.2022.103434.
  24. [24] C. de Boor, On calculating with B-splines, Journal of Approximation Theory 6 (1972) 50–62. doi:10.1016/0021-9045(72)90080-9.
  25. [25] W. Boehm, A. Müller, On de Casteljau's Algorithm, Computer Aided Geometric Design 16 (1999) 587–605. doi:10.1016/S0167-8396(99)00023-0.
  26. [26] C. W. Wu, ProdSumNet: Reducing Model Parameters in Deep Neural Networks via Product-of-Sums Matrix Decompositions (2019). doi:10.48550/arXiv.1809.02209.
  27. [27] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic Differentiation in Machine Learning: A Survey, Journal of Machine Learning Research 18 (153) (2018) 1–43.
  28. [28] S. A. Cook, C. Dwork, R. Reischuk, Upper and Lower Time Bounds for Parallel Random Access Machines Without Simultaneous Writes, SIAM Journal on Computing 15 (1) (1986) 87–97. doi:10.1137/0215006.
  29. [29] S. Fortune, J. Wyllie, Parallelism in Random Access Machines, in: Proceedings of the 10th Annual ACM Symposium on Theory of Computing (STOC), 1978, pp. 114–118. doi:10.1145/800133.804338.
  30. [30] G. E. Blelloch, Prefix Sums and Their Applications, Tech. Rep. CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA (1990).
  31. [31] PyTorch Contributors, CUDA Semantics (2026). URL https://docs.pytorch.org/docs/stable/notes/cuda.html (accessed 2026-04-16).
  32. [32] PyTorch Contributors, Associative Scan (2026). URL https://docs.pytorch.org/docs/2.11/higher_order_ops/associative_scan.html (accessed 2026-04-16).
  33. [33] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 3rd Edition, MIT Press, Cambridge, MA, USA, 2009.
  34. [34] M. Fey, J. E. Lenssen, SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 869–877. doi:10.1109/CVPR.2018.00097.
  35. [35] V. Dumoulin, F. Visin, A Guide to Convolution Arithmetic for Deep Learning, arXiv (2016). doi:10.48550/arXiv.1603.07285.
  36. [36] Z. Liu, Z. Ma, K. Zhao, K. Wang, S. Lian, KAConvNet: Kolmogorov–Arnold convolutional networks for vision recognition, Image and Vision Computing 170 (2026). doi:10.1016/j.imavis.2026.105983.
  37. [37] S. Lou, Y. Shao, Q. Du, Kolmogorov-Arnold Optimized UNet: An enhanced image segmentation model based on Kolmogorov-Arnold Network and Convolutional Kolmogorov-Arnold Network, Engineering Applications of Artificial Intelligence 173 (2026). doi:10.1016/j.engappai.2026.114405.
  38. [38] Y. Wang, X. Yu, Y. Gao, J. Sha, J. Wang, S. Yan, K. Qin, Y. Zhang, L. Gao, SpectralKAN: Weighted activation distribution Kolmogorov–Arnold network for hyperspectral image change detection, Pattern Recognition 175 (2026) 113042. doi:10.1016/j.patcog.2026.113042.
  39. [39] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in: International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
  40. [40] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer Normalization (2016). doi:10.48550/arXiv.1607.06450.
  41. [41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (56) (2014) 1929–1958.
  42. [42] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization, in: International Conference on Learning Representations (ICLR), 2019.
  43. [43] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, Springer, New York, 2009. doi:10.1007/978-0-387-84858-7.