pith. machine review for the scientific record.

arxiv: 2404.19756 · v5 · submitted 2024-04-30 · 💻 cs.LG · cond-mat.dis-nn · cs.AI · stat.ML

Recognition: 2 theorem links · Lean Theorem

KAN: Kolmogorov-Arnold Networks

Fabian Ruehle, James Halverson, Marin Soljačić, Max Tegmark, Sachin Vaidya, Thomas Y. Hou, Yixuan Wang, Ziming Liu

Pith reviewed 2026-05-11 23:35 UTC · model grok-4.3

classification 💻 cs.LG · cond-mat.dis-nn · cs.AI · stat.ML
keywords Kolmogorov-Arnold Networks · KAN · MLPs · spline parametrization · neural scaling laws · interpretability · PDE solving · function approximation

The pith

Kolmogorov-Arnold Networks place learnable spline functions on edges, letting much smaller networks match or exceed MLP accuracy while following faster neural scaling laws.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Kolmogorov-Arnold Networks, or KANs, as an alternative to standard Multi-Layer Perceptrons. In KANs, activation functions are placed on the edges as learnable univariate splines instead of being fixed on the nodes. This design allows much smaller KANs to match or surpass the accuracy of larger MLPs in fitting data and solving partial differential equations. The networks also scale more efficiently and offer improved interpretability, making them useful tools for scientists who want to (re)discover mathematical and physical laws through visualization and interaction.

Core claim

Inspired by the Kolmogorov-Arnold representation theorem, KANs replace the fixed activation functions on nodes in MLPs with learnable univariate functions parametrized as splines on the edges, with no linear weights at all. This change results in KANs that outperform MLPs in accuracy for data fitting and PDE solving, exhibit faster neural scaling laws, and provide intuitive visualizations that help in rediscovering scientific laws.

What carries the argument

Learnable univariate spline functions on edges that replace all linear weight parameters and enable function approximation per the Kolmogorov-Arnold theorem.
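
To make that machinery concrete, a minimal sketch of one KAN layer's forward pass follows. This is not the authors' implementation: it omits the residual base activation and any training loop, fixes a uniform grid, and uses SciPy's B-splines only to show how grid_size + order coefficients per edge take the place of a single linear weight; the defaults (grid_size=5, order=3) mirror the values discussed in the referee report below.

    import numpy as np
    from scipy.interpolate import BSpline

    def make_edge_spline(grid_size=5, order=3, x_range=(-1.0, 1.0), rng=None):
        # One learnable edge function phi(x): grid_size + order B-spline coefficients
        # on a uniform grid stand in for the single scalar weight an MLP edge carries.
        rng = np.random.default_rng() if rng is None else rng
        lo, hi = x_range
        interior = np.linspace(lo, hi, grid_size + 1)
        knots = np.concatenate([np.full(order, lo), interior, np.full(order, hi)])
        coefs = rng.normal(scale=0.1, size=len(knots) - order - 1)
        return BSpline(knots, coefs, order, extrapolate=True)

    def kan_layer_forward(x, edge_splines):
        # x: (n_in,) inputs; edge_splines: n_out rows of n_in splines.
        # Output node q simply sums its incoming edge functions: sum_p phi_{q,p}(x_p).
        return np.array([sum(phi(xp) for phi, xp in zip(row, x)) for row in edge_splines])

    # Toy 2 -> 3 layer applied to a single input vector.
    rng = np.random.default_rng(0)
    splines = [[make_edge_spline(rng=rng) for _ in range(2)] for _ in range(3)]
    print(kan_layer_forward(np.array([0.3, -0.7]), splines))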

Load-bearing premise

That the univariate spline parametrizations can be trained stably at scale without introducing excessive hyperparameters or overfitting that erases the accuracy and scaling advantages.

What would settle it

A benchmark study, run under matched hyperparameter-tuning budgets, showing that KANs require more parameters or training time than MLPs to reach the same accuracy, or that their scaling exponents are no faster, would undercut the claim; the opposite result would confirm it.

original abstract

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Kolmogorov-Arnold Networks (KANs) as alternatives to MLPs, replacing fixed node activations and linear weights with learnable univariate B-spline functions on edges. It claims that substantially smaller KANs achieve comparable or superior accuracy to larger MLPs on data-fitting and PDE-solving tasks, that KANs exhibit faster neural scaling laws, and that their edge-wise functions enable superior interpretability and human-AI collaboration for rediscovering mathematical and physical laws.

Significance. If the reported accuracy and scaling advantages hold under equivalent hyperparameter budgets and at larger scales, KANs could provide a theoretically grounded alternative architecture for scientific machine learning, reducing parameter counts while improving both performance and interpretability. The manuscript supplies concrete support via multiple experimental runs, held-out scaling plots, and two illustrative discovery examples; these empirical elements are strengths that would be diminished only if the spline-specific hyperparameters prove unstable or require disproportionate tuning.

major comments (2)
  1. [§4.1 and Table 1] The central claim that 'much smaller KANs' outperform 'much larger MLPs' on fitting tasks rests on reported test errors for specific spline grid sizes (G=5) and orders (k=3) together with L1 regularization on coefficients; without an ablation or sensitivity analysis over G, k, and regularization strength under a fixed tuning budget, it is unclear whether the accuracy advantage survives when MLPs receive comparable hyperparameter effort.
  2. [§5 and Figure 7] The faster neural scaling laws are demonstrated by plotting test error against layer width or depth; because each KAN edge carries G+k trainable spline coefficients (plus possible grid-update steps), the x-axis must be total trainable parameters rather than architectural width for the 'parameter-efficient' interpretation of the scaling advantage to be load-bearing.
minor comments (2)
  1. [§3.2] The periodic grid-extension procedure and the exact form of the basis functions after extension are described only briefly; a short pseudocode block or explicit knot-vector update rule would improve reproducibility (a hedged sketch of one such update rule follows this report).
  2. [Figure 4, PDE example] The color scale and axis limits on the residual plots are not stated, making quantitative comparison of KAN versus MLP residuals difficult.
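
Regarding minor comment 1, here is a sketch of what such an update rule could look like, assuming the finer grid's coefficients are fitted by least squares so the fine spline reproduces the coarse one. This is one plausible reading of grid extension, not the paper's verified procedure.

    import numpy as np
    from scipy.interpolate import BSpline

    def extend_grid(coarse_knots, coarse_coefs, order, new_grid_size, x_range=(-1.0, 1.0)):
        # Build the finer uniform knot vector, then solve a least-squares problem so the
        # fine spline reproduces the coarse one on a dense sample of the input range.
        lo, hi = x_range
        interior = np.linspace(lo, hi, new_grid_size + 1)
        fine_knots = np.concatenate([np.full(order, lo), interior, np.full(order, hi)])
        n_fine = len(fine_knots) - order - 1
        xs = np.linspace(lo, hi, 10 * n_fine)
        target = BSpline(coarse_knots, coarse_coefs, order)(xs)
        # Design matrix: every fine basis function evaluated at the sample points.
        design = BSpline(fine_knots, np.eye(n_fine), order)(xs)
        fine_coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        return fine_knots, fine_coefs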

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions where the concerns are valid and can be addressed with additional analysis.

point-by-point responses
  1. Referee: [§4.1 and Table 1] The central claim that 'much smaller KANs' outperform 'much larger MLPs' on fitting tasks rests on reported test errors for specific spline grid sizes (G=5) and orders (k=3) together with L1 regularization on coefficients; without an ablation or sensitivity analysis over G, k, and regularization strength under a fixed tuning budget, it is unclear whether the accuracy advantage survives when MLPs receive comparable hyperparameter effort.

    Authors: We acknowledge that the primary reported results use G=5, k=3, and L1 regularization on the spline coefficients. While MLP baselines were tuned across widths, depths, and optimization hyperparameters, we did not conduct an exhaustive joint ablation of KAN spline hyperparameters under an identical total tuning budget. To address this directly, we will add a new sensitivity analysis subsection in the revised manuscript. This will vary G, k, and regularization strength for KANs while reporting MLP performance under matched computational effort for hyperparameter search, thereby demonstrating that the accuracy advantages are robust to reasonable choices of these parameters. revision: yes

  2. Referee: [§5 and Figure 7] The faster neural scaling laws are demonstrated by plotting test error against layer width or depth; because each KAN edge carries G+k trainable spline coefficients (plus possible grid-update steps), the x-axis must be total trainable parameters rather than architectural width for the 'parameter-efficient' interpretation of the scaling advantage to be load-bearing.

    Authors: We agree that a parameter-efficient interpretation of the scaling advantage is best supported by plotting against total trainable parameters rather than architectural width or depth alone. The original Figure 7 and associated text emphasized scaling with respect to network dimensions, following conventions in the neural scaling literature. In the revision we will augment the figure with additional curves that explicitly plot test error versus total parameter count for both KANs and MLPs, and we will update the text to qualify the scaling claims accordingly. This change will make the parameter-efficiency argument load-bearing while preserving the architectural scaling observations. revision: partial
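
To make the parameter-count point concrete, a back-of-envelope comparison under the assumptions above (grid size G, order k, every edge a plain spline with no base-activation or bias terms); these are rough illustrative counts, not the paper's exact bookkeeping.

    def kan_params(widths, G=5, k=3):
        # Each edge between adjacent layers carries roughly G + k spline coefficients.
        return sum(n_in * n_out * (G + k) for n_in, n_out in zip(widths, widths[1:]))

    def mlp_params(widths):
        # Each edge carries one weight; each non-input node carries one bias.
        return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

    print(kan_params([2, 5, 1]))    # (2*5 + 5*1) * 8 = 120 trainable coefficients
    print(mlp_params([2, 100, 1]))  # 2*100 + 100 + 100*1 + 1 = 401 weights and biases

Plotting test error against totals of this kind, rather than against width or depth alone, is exactly the change the referee asks for.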

Circularity Check

0 steps flagged

No significant circularity in KAN architecture or claims

full rationale

The KAN architecture is defined directly from the external Kolmogorov-Arnold representation theorem by placing learnable univariate spline functions on edges instead of fixed activations on nodes. Accuracy comparisons, scaling-law observations, and interpretability demonstrations are obtained from separate empirical evaluations on held-out data and PDE tasks; these measurements do not reduce to the training loss or to any self-referential definition. No load-bearing step relies on self-citation, uniqueness theorems imported from the authors, or renaming of known results. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The work rests on the Kolmogorov-Arnold theorem as the existence proof for representing multivariate functions via univariate ones, plus standard spline approximation theory. No new physical entities are postulated. Hyperparameters such as grid size and spline order are chosen by the user rather than fitted to the target claim.

free parameters (2)
  • spline grid size
    Controls the number of intervals in each univariate spline; chosen per experiment rather than learned from data.
  • spline order
    Polynomial degree of the basis functions inside each spline; set by the user.
axioms (1)
  • Kolmogorov-Arnold representation theorem (standard mathematics)
    Invoked in the introduction to justify that multivariate continuous functions can be expressed as finite sums and compositions of univariate continuous functions.
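
For reference, the theorem as commonly stated: every continuous function f on [0,1]^n admits the representation

    f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),

with continuous univariate functions \phi_{q,p} and \Phi_q. KANs relax the fixed two-layer, width-(2n+1) shape of this formula into arbitrary widths and depths with spline-parametrized univariate functions.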

pith-pipeline@v0.9.0 · 5532 in / 1338 out tokens · 63114 ms · 2026-05-11T23:35:37.628342+00:00 · methodology


Forward citations

Cited by 39 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Embedding Dimension Lower Bounds for Universality of Deep Sets and Janossy Pooling

    cs.LG 2026-05 unverdicted novelty 8.0

    New lower bounds establish that Deep Sets need embedding dimension linear in the number of points (up to constants) for d>1, and give the first non-trivial bounds for higher-order Janossy pooling.

  2. KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks

    cs.LG 2026-05 conditional novelty 7.0

    KAN-CL cuts catastrophic forgetting by 88-93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T by anchoring KAN parameters at per-knot granularity while matching baseline accuracy.

  3. Bridging Spectral Operator Learning and U-Net Hierarchies: SpectraNet for Stable Autoregressive PDE Surrogates

    cs.LG 2026-05 unverdicted novelty 7.0

    SpectraNet delivers stable autoregressive PDE rollouts with lower error and 2.3x fewer parameters than FNO by embedding spectral convolutions in a U-Net and training a residual-target block under semigroup-consistency loss.

  4. Variable decoupling and the Kolmogorov Superposition Theorem for rational functions

    math.NA 2026-05 unverdicted novelty 7.0

    For rational multivariate functions, the Kolmogorov Superposition Theorem allows variable decoupling by inspection with no computation using the Loewner Framework.

  5. TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations

    cs.LG 2026-05 unverdicted novelty 7.0

    TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.

  6. KANs need curvature: penalties for compositional smoothness

    cs.LG 2026-05 unverdicted novelty 7.0

    A curvature penalty for KANs, derived to respect compositional effects and equipped with a proven upper bound on full-model curvature, produces smoother activations while preserving accuracy.

  7. Layer-wise Lipschitz-Product Control for Deep Kolmogorov--Arnold Network Representations of Compositionally Structured Functions

    cs.LG 2026-04 unverdicted novelty 7.0

    Compositionally sparse functions given by finite computation trees admit deep KAN representations with dimension-independent layer-wise Lipschitz product bounds P(KAN) <= max(C*,1)^L_f where L_f scales linearly with t...

  8. Neural Enhancement of Analytical Appearance Models

    cs.GR 2026-04 unverdicted novelty 7.0

    Neural enhancement replaces selected computational nodes in analytical BRDF models with MLPs identified via hypercube search, yielding accurate, compact models that fit measured reflectance data better than pure analy...

  9. Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks

    cs.LG 2026-04 unverdicted novelty 7.0

    Deep KANs with edge functions restricted to affine maps plus one fixed non-affine continuous function σ are dense in C(K) for any compact K if and only if σ is non-affine.

  10. Physics informed operator learning of parameter dependent spectra

    gr-qc 2026-04 unverdicted novelty 7.0

    DeepOPiraKAN learns parameter-to-spectrum mappings via operator learning and achieves relative errors of O(10^{-6}) to O(10^{-4}) for Kerr black hole quasinormal modes up to n=7 when benchmarked against Leaver's method.

  11. KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition

    cs.CV 2026-04 unverdicted novelty 7.0

    KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.

  12. From Zero to Detail: A Progressive Spectral Decoupling Paradigm for UHD Image Restoration with New Benchmark

    cs.CV 2026-04 unverdicted novelty 7.0

    A new framework called ERR decomposes UHD image restoration into three frequency stages with specialized sub-networks and introduces the LSUHDIR benchmark dataset of over 82,000 images.

  13. G-PARC: Graph-Physics Aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics on Unstructured Meshes

    cs.LG 2026-04 unverdicted novelty 7.0

    G-PARC embeds analytically computed differential operators via moving least squares on graphs into recurrent networks, achieving higher accuracy with 2-3x fewer parameters than prior graph PADL methods on nonlinear be...

  14. Interpretable Relational Inference with LLM-Guided Symbolic Dynamics Modeling

    cs.LG 2026-04 unverdicted novelty 7.0

    COSINE jointly discovers latent interaction graphs and compact symbolic dynamical equations by using an LLM to iteratively prune and expand the function library based on optimization feedback.

  15. Non-monotonic causal discovery with Kolmogorov-Arnold Fuzzy Cognitive Maps

    cs.AI 2026-04 unverdicted novelty 7.0

    KA-FCM uses B-spline functions on FCM edges, inspired by the Kolmogorov-Arnold theorem, to enable arbitrary non-monotonic causal modeling and outperforms standard FCM while matching MLPs on non-monotonic inference, sy...

  16. Efficient Convexification of Kolmogorov-Arnold Networks with Polynomial Functional Forms Via a Continuous Graham Scan Approach

    math.OC 2026-04 unverdicted novelty 7.0

    A continuous Graham Scan constructs exact convex envelopes of univariate polynomials for strong convex relaxations of polynomial Kolmogorov-Arnold Networks.

  17. WGFINNs: Weak formulation-based GENERIC formalism informed neural networks

    cs.LG 2026-04 unverdicted novelty 7.0

    WGFINNs use weak-form loss functions with GENERIC structure preservation to recover governing equations more accurately from noisy observations than prior strong-form GFINNs.

  18. Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

    cs.LG 2026-05 unverdicted novelty 6.0

    Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...

  19. FreeMOCA: Memory-Free Continual Learning for Malicious Code Analysis

    cs.CR 2026-05 unverdicted novelty 6.0

    FreeMOCA enables memory-free continual learning for malicious code analysis by adaptive layer-wise parameter interpolation between task updates, outperforming baselines on EMBER and AZ malware benchmarks with up to 42...

  20. Sparse Random-Feature Neural Networks with Krylov-Based SVD for Singularly Perturbed ODE

    math.NA 2026-05 unverdicted novelty 6.0

    Sparse RFNNs with sSVD via Lanczos-Golub-Kahan bidiagonalization maintain accuracy while improving efficiency and robustness for 1D steady convection-diffusion equations with strong advection.

  21. Towards Intelligent Low-Altitude Wireless Network Deployment: Differentiable Channel Knowledge Map Construction and Trajectory Design

    eess.SP 2026-05 unverdicted novelty 6.0

    A neural network-based differentiable CKM construction method enables joint power-bandwidth-trajectory optimization for multi-UAV systems, achieving higher minimum throughput than statistical channel models.

  22. Partition-of-Unity Gaussian Kolmogorov-Arnold Networks

    cs.CE 2026-04 unverdicted novelty 6.0

    PU-GKAN applies Shepard normalization to Gaussian bases in KANs, yielding exact constant reproduction, reduced epsilon sensitivity, and better validation accuracy across tested regimes.

  23. Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications

    cs.IT 2026-04 unverdicted novelty 6.0

    The proposed pretraining framework for safe DRL in CF-MIMO resource management doubles initial energy efficiency, achieves 4.7% higher final EE, maintains 1% delay violation rate, and cuts exploration steps by 50% com...

  24. Scale-Parameter Selection in Gaussian Kolmogorov-Arnold Networks

    cs.CE 2026-04 unverdicted novelty 6.0

    A stable operating interval for the Gaussian scale parameter ε in KANs is ε ∈ [1/(G-1), 2/(G-1)], derived from first-layer feature geometry and validated across multiple approximation and physics-informed problems.

  25. ParamBoost: Gradient Boosted Piecewise Cubic Polynomials

    cs.LG 2026-04 unverdicted novelty 6.0

    ParamBoost improves GAMs by fitting piecewise cubic polynomials via gradient boosting and supports constraints for continuity, monotonicity, convexity, and feature interactions.

  26. Unified scaling laws for turbulent boundary layers across flow regimes

    physics.flu-dyn 2026-04 unverdicted novelty 6.0

    Two local dimensionless groups predict wall shear stress and three predict velocity profiles in turbulent boundary layers across pressure gradient regimes using information-theoretic selection of maximal-predictive co...

  27. Small-scale photonic Kolmogorov-Arnold networks using standard telecom nonlinear modules

    physics.optics 2026-04 unverdicted novelty 6.0

    Small photonic KANs using commodity telecom nonlinear modules reach 98.4% accuracy on nonlinear classification with only four modules and remain robust to hardware impairments.

  28. Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

    cs.CE 2026-04 unverdicted novelty 6.0

    Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.

  29. From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning

    cs.LG 2026-04 unverdicted novelty 6.0

    Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family...

  30. Interpretation of Crystal Energy Landscapes with Kolmogorov-Arnold Networks

    cond-mat.dis-nn 2026-04 unverdicted novelty 6.0

    Element-Weighted KANs achieve state-of-the-art accuracy on formation energy, band gap, and work function while revealing periodic-table-aligned chemical trends through their learnable activation functions.

  31. M⁴-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection

    cs.CV 2026-05 unverdicted novelty 5.0

    M⁴-SAM equips SAM2 with modality-aware MoE-LoRA, gated multi-level fusion, and pseudo-guided initialization to reach state-of-the-art on RGB-D video salient object detection.

  32. PixelFlowCast: Latent-Free Precipitation Nowcasting via Pixel Mean Flows

    cs.CV 2026-05 unverdicted novelty 5.0

    PixelFlowCast delivers high-fidelity precipitation nowcasts from radar sequences using a latent-free Pixel Mean Flows predictor guided by a deterministic coarse stage and KANCondNet features.

  33. KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

    cs.CV 2026-05 unverdicted novelty 5.0

    KANMultiSign generates sign language poses from notation via coarse-to-fine multi-scale supervision and compact KAN-Transformer modules, achieving lower DTW joint error with fewer parameters than baselines on several ...

  34. RoboKA: KAN Informed Multimodal Learning for RoboCall Surveillance System

    cs.MM 2026-04 unverdicted novelty 5.0

    RoboKA is a KAN-based multimodal fusion model that outperforms baselines on a new synthetic dataset for detecting adversarial robocalls via acoustic and linguistic cues.

  35. DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    DepthPilot generates physically consistent and clinically interpretable colonoscopy videos by injecting depth priors into diffusion models through parameter-efficient fine-tuning and replacing linear denoising weights...

  36. Gait Recognition with Temporal Kolmogorov-Arnold Networks

    cs.CV 2026-04 unverdicted novelty 5.0

    A CNN combined with a new Temporal Kolmogorov-Arnold Network using learnable functions and two-level memory achieves strong gait recognition performance on the CASIA-B dataset.

  37. High-Precision Phase-Shift Transferable Neural Networks for High-Frequency Function Approximation and PDE Solution

    math.NA 2026-04 unverdicted novelty 5.0

    Phase-shift transferable neural networks achieve high-precision approximation of high-frequency functions and PDE solutions.

  38. General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equations

    cs.LG 2026-04 unverdicted novelty 5.0

    GEN is a neural network that solves PDEs by constructing explicit function approximations from basis functions based on prior PDE knowledge, yielding more robust and extensible solutions than standard PINNs.

  39. Low Light Image Enhancement Challenge at NTIRE 2026

    cs.CV 2026-04 unverdicted novelty 2.0

    NTIRE 2026 challenge report shows progress in low-light image enhancement via 22 submitted networks evaluated on a new dataset.

Reference graph

Works this paper leans on

119 extracted references · 119 canonical work pages · cited by 39 Pith papers · 9 internal anchors
