pith. machine review for the scientific record.

arxiv: 2404.19756 · v5 · submitted 2024-04-30 · 💻 cs.LG · cond-mat.dis-nn · cs.AI · stat.ML

Recognition: 2 theorem links · Lean Theorem

KAN: Kolmogorov-Arnold Networks

Fabian Ruehle, James Halverson, Marin Soljačić, Max Tegmark, Sachin Vaidya, Thomas Y. Hou, Yixuan Wang, Ziming Liu

Pith reviewed 2026-05-11 23:35 UTC · model grok-4.3

classification 💻 cs.LG · cond-mat.dis-nn · cs.AI · stat.ML
keywords Kolmogorov-Arnold Networks · KAN · MLPs · spline parametrization · neural scaling laws · interpretability · PDE solving · function approximation

The pith

Kolmogorov-Arnold Networks place learnable spline functions on edges, letting much smaller networks match or exceed MLP accuracy while following faster neural scaling laws.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Kolmogorov-Arnold Networks, or KANs, as an alternative to standard Multi-Layer Perceptrons. In KANs, activation functions are placed on the edges as learnable univariate splines instead of being fixed on the nodes. This design allows much smaller KANs to match or surpass the accuracy of larger MLPs in fitting data and solving partial differential equations. The networks also scale more efficiently and offer improved interpretability, making them useful tools for scientists who want to (re)discover mathematical and physical laws through visualization and interaction.

Core claim

Inspired by the Kolmogorov-Arnold representation theorem, KANs replace the fixed activation functions on nodes in MLPs with learnable univariate functions parametrized as splines on the edges, with no linear weights at all. This change results in KANs that outperform MLPs in accuracy for data fitting and PDE solving, exhibit faster neural scaling laws, and provide intuitive visualizations that help in rediscovering scientific laws.

What carries the argument

Learnable univariate spline functions on edges that replace all linear weight parameters and enable function approximation per the Kolmogorov-Arnold theorem.
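
To make that machinery concrete, a minimal sketch of one KAN layer's forward pass follows. This is not the authors' implementation: it omits the residual base activation and any training loop, fixes a uniform grid, and uses SciPy's B-splines only to show how grid_size + order coefficients per edge take the place of a single linear weight; the defaults (grid_size=5, order=3) mirror the values discussed in the referee report below.

    import numpy as np
    from scipy.interpolate import BSpline

    def make_edge_spline(grid_size=5, order=3, x_range=(-1.0, 1.0), rng=None):
        # One learnable edge function phi(x): grid_size + order B-spline coefficients
        # on a uniform grid stand in for the single scalar weight an MLP edge carries.
        rng = np.random.default_rng() if rng is None else rng
        lo, hi = x_range
        interior = np.linspace(lo, hi, grid_size + 1)
        knots = np.concatenate([np.full(order, lo), interior, np.full(order, hi)])
        coefs = rng.normal(scale=0.1, size=len(knots) - order - 1)
        return BSpline(knots, coefs, order, extrapolate=True)

    def kan_layer_forward(x, edge_splines):
        # x: (n_in,) inputs; edge_splines: n_out rows of n_in splines.
        # Output node q simply sums its incoming edge functions: sum_p phi_{q,p}(x_p).
        return np.array([sum(phi(xp) for phi, xp in zip(row, x)) for row in edge_splines])

    # Toy 2 -> 3 layer applied to a single input vector.
    rng = np.random.default_rng(0)
    splines = [[make_edge_spline(rng=rng) for _ in range(2)] for _ in range(3)]
    print(kan_layer_forward(np.array([0.3, -0.7]), splines))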

Load-bearing premise

That the univariate spline parametrizations can be trained stably at scale without introducing excessive hyperparameters or overfitting that erases the accuracy and scaling advantages.

What would settle it

A benchmark study, run under matched hyperparameter-tuning budgets, showing that KANs require more parameters or training time than MLPs to reach the same accuracy, or that their scaling exponents are no faster, would undercut the claim; the opposite result would confirm it.

original abstract

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Kolmogorov-Arnold Networks (KANs) as alternatives to MLPs, replacing fixed node activations and linear weights with learnable univariate B-spline functions on edges. It claims that substantially smaller KANs achieve comparable or superior accuracy to larger MLPs on data-fitting and PDE-solving tasks, that KANs exhibit faster neural scaling laws, and that their edge-wise functions enable superior interpretability and human-AI collaboration for rediscovering mathematical and physical laws.

Significance. If the reported accuracy and scaling advantages hold under equivalent hyperparameter budgets and at larger scales, KANs could provide a theoretically grounded alternative architecture for scientific machine learning, reducing parameter counts while improving both performance and interpretability. The manuscript supplies concrete support via multiple experimental runs, held-out scaling plots, and two illustrative discovery examples; these empirical elements are strengths that would be diminished only if the spline-specific hyperparameters prove unstable or require disproportionate tuning.

major comments (2)
  1. [§4.1 and Table 1] The central claim that 'much smaller KANs' outperform 'much larger MLPs' on fitting tasks rests on reported test errors for specific spline grid sizes (G=5) and orders (k=3) together with L1 regularization on coefficients; without an ablation or sensitivity analysis over G, k, and regularization strength under a fixed tuning budget, it is unclear whether the accuracy advantage survives when MLPs receive comparable hyperparameter effort.
  2. [§5 and Figure 7] The faster neural scaling laws are demonstrated by plotting test error against layer width or depth; because each KAN edge carries G+k trainable spline coefficients (plus possible grid-update steps), the x-axis must be total trainable parameters rather than architectural width for the 'parameter-efficient' interpretation of the scaling advantage to be load-bearing.
minor comments (2)
  1. [§3.2] The periodic grid-extension procedure and the exact form of the basis functions after extension are described only briefly; a short pseudocode block or explicit knot-vector update rule would improve reproducibility (a hedged sketch of one such update rule follows this report).
  2. [Figure 4, PDE example] The color scale and axis limits on the residual plots are not stated, making quantitative comparison of KAN versus MLP residuals difficult.
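
Regarding minor comment 1, here is a sketch of what such an update rule could look like, assuming the finer grid's coefficients are fitted by least squares so the fine spline reproduces the coarse one. This is one plausible reading of grid extension, not the paper's verified procedure.

    import numpy as np
    from scipy.interpolate import BSpline

    def extend_grid(coarse_knots, coarse_coefs, order, new_grid_size, x_range=(-1.0, 1.0)):
        # Build the finer uniform knot vector, then solve a least-squares problem so the
        # fine spline reproduces the coarse one on a dense sample of the input range.
        lo, hi = x_range
        interior = np.linspace(lo, hi, new_grid_size + 1)
        fine_knots = np.concatenate([np.full(order, lo), interior, np.full(order, hi)])
        n_fine = len(fine_knots) - order - 1
        xs = np.linspace(lo, hi, 10 * n_fine)
        target = BSpline(coarse_knots, coarse_coefs, order)(xs)
        # Design matrix: every fine basis function evaluated at the sample points.
        design = BSpline(fine_knots, np.eye(n_fine), order)(xs)
        fine_coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
        return fine_knots, fine_coefs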

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions where the concerns are valid and can be addressed with additional analysis.

point-by-point responses
  1. Referee: [§4.1 and Table 1] The central claim that 'much smaller KANs' outperform 'much larger MLPs' on fitting tasks rests on reported test errors for specific spline grid sizes (G=5) and orders (k=3) together with L1 regularization on coefficients; without an ablation or sensitivity analysis over G, k, and regularization strength under a fixed tuning budget, it is unclear whether the accuracy advantage survives when MLPs receive comparable hyperparameter effort.

    Authors: We acknowledge that the primary reported results use G=5, k=3, and L1 regularization on the spline coefficients. While MLP baselines were tuned across widths, depths, and optimization hyperparameters, we did not conduct an exhaustive joint ablation of KAN spline hyperparameters under an identical total tuning budget. To address this directly, we will add a new sensitivity analysis subsection in the revised manuscript. This will vary G, k, and regularization strength for KANs while reporting MLP performance under matched computational effort for hyperparameter search, thereby demonstrating that the accuracy advantages are robust to reasonable choices of these parameters. revision: yes

  2. Referee: [§5 and Figure 7] The faster neural scaling laws are demonstrated by plotting test error against layer width or depth; because each KAN edge carries G+k trainable spline coefficients (plus possible grid-update steps), the x-axis must be total trainable parameters rather than architectural width for the 'parameter-efficient' interpretation of the scaling advantage to be load-bearing.

    Authors: We agree that a parameter-efficient interpretation of the scaling advantage is best supported by plotting against total trainable parameters rather than architectural width or depth alone. The original Figure 7 and associated text emphasized scaling with respect to network dimensions, following conventions in the neural scaling literature. In the revision we will augment the figure with additional curves that explicitly plot test error versus total parameter count for both KANs and MLPs, and we will update the text to qualify the scaling claims accordingly. This change will make the parameter-efficiency argument load-bearing while preserving the architectural scaling observations. revision: partial
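
To make the parameter-count point concrete, a back-of-envelope comparison under the assumptions above (grid size G, order k, every edge a plain spline with no base-activation or bias terms); these are rough illustrative counts, not the paper's exact bookkeeping.

    def kan_params(widths, G=5, k=3):
        # Each edge between adjacent layers carries roughly G + k spline coefficients.
        return sum(n_in * n_out * (G + k) for n_in, n_out in zip(widths, widths[1:]))

    def mlp_params(widths):
        # Each edge carries one weight; each non-input node carries one bias.
        return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

    print(kan_params([2, 5, 1]))    # (2*5 + 5*1) * 8 = 120 trainable coefficients
    print(mlp_params([2, 100, 1]))  # 2*100 + 100 + 100*1 + 1 = 401 weights and biases

Plotting test error against totals of this kind, rather than against width or depth alone, is exactly the change the referee asks for.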

Circularity Check

0 steps flagged

No significant circularity in KAN architecture or claims

full rationale

The KAN architecture is defined directly from the external Kolmogorov-Arnold representation theorem by placing learnable univariate spline functions on edges instead of fixed activations on nodes. Accuracy comparisons, scaling-law observations, and interpretability demonstrations are obtained from separate empirical evaluations on held-out data and PDE tasks; these measurements do not reduce to the training loss or to any self-referential definition. No load-bearing step relies on self-citation, uniqueness theorems imported from the authors, or renaming of known results. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The work rests on the Kolmogorov-Arnold theorem as the existence proof for representing multivariate functions via univariate ones, plus standard spline approximation theory. No new physical entities are postulated. Hyperparameters such as grid size and spline order are chosen by the user rather than fitted to the target claim.

free parameters (2)
  • spline grid size
    Controls the number of intervals in each univariate spline; chosen per experiment rather than learned from data.
  • spline order
    Polynomial degree of the basis functions inside each spline; set by the user.
axioms (1)
  • Kolmogorov-Arnold representation theorem (standard mathematics)
    Invoked in the introduction to justify that multivariate continuous functions can be expressed as finite sums and compositions of univariate continuous functions.
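
For reference, the theorem as commonly stated: every continuous function f on [0,1]^n admits the representation

    f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),

with continuous univariate functions \phi_{q,p} and \Phi_q. KANs relax the fixed two-layer, width-(2n+1) shape of this formula into arbitrary widths and depths with spline-parametrized univariate functions.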

pith-pipeline@v0.9.0 · 5532 in / 1338 out tokens · 63114 ms · 2026-05-11T23:35:37.628342+00:00 · methodology


Forward citations

Cited by 39 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Embedding Dimension Lower Bounds for Universality of Deep Sets and Janossy Pooling

    cs.LG 2026-05 unverdicted novelty 8.0

    New lower bounds establish that Deep Sets need embedding dimension linear in the number of points (up to constants) for d>1, and give the first non-trivial bounds for higher-order Janossy pooling.

  2. KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks

    cs.LG 2026-05 conditional novelty 7.0

    KAN-CL cuts catastrophic forgetting by 88-93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T by anchoring KAN parameters at per-knot granularity while matching baseline accuracy.

  3. Bridging Spectral Operator Learning and U-Net Hierarchies: SpectraNet for Stable Autoregressive PDE Surrogates

    cs.LG 2026-05 unverdicted novelty 7.0

    SpectraNet delivers stable autoregressive PDE rollouts with lower error and 2.3x fewer parameters than FNO by embedding spectral convolutions in a U-Net and training a residual-target block under semigroup-consistency loss.

  4. Variable decoupling and the Kolmogorov Superposition Theorem for rational functions

    math.NA 2026-05 unverdicted novelty 7.0

    For rational multivariate functions, the Kolmogorov Superposition Theorem allows variable decoupling by inspection with no computation using the Loewner Framework.

  5. TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations

    cs.LG 2026-05 unverdicted novelty 7.0

    TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.

  6. KANs need curvature: penalties for compositional smoothness

    cs.LG 2026-05 unverdicted novelty 7.0

    A curvature penalty for KANs, derived to respect compositional effects and equipped with a proven upper bound on full-model curvature, produces smoother activations while preserving accuracy.

  7. Layer-wise Lipschitz-Product Control for Deep Kolmogorov--Arnold Network Representations of Compositionally Structured Functions

    cs.LG 2026-04 unverdicted novelty 7.0

    Compositionally sparse functions given by finite computation trees admit deep KAN representations with dimension-independent layer-wise Lipschitz product bounds P(KAN) <= max(C*,1)^L_f where L_f scales linearly with t...

  8. Neural Enhancement of Analytical Appearance Models

    cs.GR 2026-04 unverdicted novelty 7.0

    Neural enhancement replaces selected computational nodes in analytical BRDF models with MLPs identified via hypercube search, yielding accurate, compact models that fit measured reflectance data better than pure analy...

  9. Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks

    cs.LG 2026-04 unverdicted novelty 7.0

    Deep KANs with edge functions restricted to affine maps plus one fixed non-affine continuous function σ are dense in C(K) for any compact K if and only if σ is non-affine.

  10. Physics informed operator learning of parameter dependent spectra

    gr-qc 2026-04 unverdicted novelty 7.0

    DeepOPiraKAN learns parameter-to-spectrum mappings via operator learning and achieves relative errors of O(10^{-6}) to O(10^{-4}) for Kerr black hole quasinormal modes up to n=7 when benchmarked against Leaver's method.

  11. KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition

    cs.CV 2026-04 unverdicted novelty 7.0

    KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.

  12. From Zero to Detail: A Progressive Spectral Decoupling Paradigm for UHD Image Restoration with New Benchmark

    cs.CV 2026-04 unverdicted novelty 7.0

    A new framework called ERR decomposes UHD image restoration into three frequency stages with specialized sub-networks and introduces the LSUHDIR benchmark dataset of over 82,000 images.

  13. G-PARC: Graph-Physics Aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics on Unstructured Meshes

    cs.LG 2026-04 unverdicted novelty 7.0

    G-PARC embeds analytically computed differential operators via moving least squares on graphs into recurrent networks, achieving higher accuracy with 2-3x fewer parameters than prior graph PADL methods on nonlinear be...

  14. Interpretable Relational Inference with LLM-Guided Symbolic Dynamics Modeling

    cs.LG 2026-04 unverdicted novelty 7.0

    COSINE jointly discovers latent interaction graphs and compact symbolic dynamical equations by using an LLM to iteratively prune and expand the function library based on optimization feedback.

  15. Non-monotonic causal discovery with Kolmogorov-Arnold Fuzzy Cognitive Maps

    cs.AI 2026-04 unverdicted novelty 7.0

    KA-FCM uses B-spline functions on FCM edges, inspired by the Kolmogorov-Arnold theorem, to enable arbitrary non-monotonic causal modeling and outperforms standard FCM while matching MLPs on non-monotonic inference, sy...

  16. Efficient Convexification of Kolmogorov-Arnold Networks with Polynomial Functional Forms Via a Continuous Graham Scan Approach

    math.OC 2026-04 unverdicted novelty 7.0

    A continuous Graham Scan constructs exact convex envelopes of univariate polynomials for strong convex relaxations of polynomial Kolmogorov-Arnold Networks.

  17. WGFINNs: Weak formulation-based GENERIC formalism informed neural networks

    cs.LG 2026-04 unverdicted novelty 7.0

    WGFINNs use weak-form loss functions with GENERIC structure preservation to recover governing equations more accurately from noisy observations than prior strong-form GFINNs.

  18. Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

    cs.LG 2026-05 unverdicted novelty 6.0

    Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...

  19. FreeMOCA: Memory-Free Continual Learning for Malicious Code Analysis

    cs.CR 2026-05 unverdicted novelty 6.0

    FreeMOCA enables memory-free continual learning for malicious code analysis by adaptive layer-wise parameter interpolation between task updates, outperforming baselines on EMBER and AZ malware benchmarks with up to 42...

  20. Sparse Random-Feature Neural Networks with Krylov-Based SVD for Singularly Perturbed ODE

    math.NA 2026-05 unverdicted novelty 6.0

    Sparse RFNNs with sSVD via Lanczos-Golub-Kahan bidiagonalization maintain accuracy while improving efficiency and robustness for 1D steady convection-diffusion equations with strong advection.

  21. Towards Intelligent Low-Altitude Wireless Network Deployment: Differentiable Channel Knowledge Map Construction and Trajectory Design

    eess.SP 2026-05 unverdicted novelty 6.0

    A neural network-based differentiable CKM construction method enables joint power-bandwidth-trajectory optimization for multi-UAV systems, achieving higher minimum throughput than statistical channel models.

  22. Partition-of-Unity Gaussian Kolmogorov-Arnold Networks

    cs.CE 2026-04 unverdicted novelty 6.0

    PU-GKAN applies Shepard normalization to Gaussian bases in KANs, yielding exact constant reproduction, reduced epsilon sensitivity, and better validation accuracy across tested regimes.

  23. Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications

    cs.IT 2026-04 unverdicted novelty 6.0

    The proposed pretraining framework for safe DRL in CF-MIMO resource management doubles initial energy efficiency, achieves 4.7% higher final EE, maintains 1% delay violation rate, and cuts exploration steps by 50% com...

  24. Scale-Parameter Selection in Gaussian Kolmogorov-Arnold Networks

    cs.CE 2026-04 unverdicted novelty 6.0

    A stable operating interval for the Gaussian scale parameter ε in KANs is ε ∈ [1/(G-1), 2/(G-1)], derived from first-layer feature geometry and validated across multiple approximation and physics-informed problems.

  25. ParamBoost: Gradient Boosted Piecewise Cubic Polynomials

    cs.LG 2026-04 unverdicted novelty 6.0

    ParamBoost improves GAMs by fitting piecewise cubic polynomials via gradient boosting and supports constraints for continuity, monotonicity, convexity, and feature interactions.

  26. Unified scaling laws for turbulent boundary layers across flow regimes

    physics.flu-dyn 2026-04 unverdicted novelty 6.0

    Two local dimensionless groups predict wall shear stress and three predict velocity profiles in turbulent boundary layers across pressure gradient regimes using information-theoretic selection of maximal-predictive co...

  27. Small-scale photonic Kolmogorov-Arnold networks using standard telecom nonlinear modules

    physics.optics 2026-04 unverdicted novelty 6.0

    Small photonic KANs using commodity telecom nonlinear modules reach 98.4% accuracy on nonlinear classification with only four modules and remain robust to hardware impairments.

  28. Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

    cs.CE 2026-04 unverdicted novelty 6.0

    Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.

  29. From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning

    cs.LG 2026-04 unverdicted novelty 6.0

    Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family...

  30. Interpretation of Crystal Energy Landscapes with Kolmogorov-Arnold Networks

    cond-mat.dis-nn 2026-04 unverdicted novelty 6.0

    Element-Weighted KANs achieve state-of-the-art accuracy on formation energy, band gap, and work function while revealing periodic-table-aligned chemical trends through their learnable activation functions.

  31. M⁴-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection

    cs.CV 2026-05 unverdicted novelty 5.0

    M⁴-SAM equips SAM2 with modality-aware MoE-LoRA, gated multi-level fusion, and pseudo-guided initialization to reach state-of-the-art on RGB-D video salient object detection.

  32. PixelFlowCast: Latent-Free Precipitation Nowcasting via Pixel Mean Flows

    cs.CV 2026-05 unverdicted novelty 5.0

    PixelFlowCast delivers high-fidelity precipitation nowcasts from radar sequences using a latent-free Pixel Mean Flows predictor guided by a deterministic coarse stage and KANCondNet features.

  33. KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

    cs.CV 2026-05 unverdicted novelty 5.0

    KANMultiSign generates sign language poses from notation via coarse-to-fine multi-scale supervision and compact KAN-Transformer modules, achieving lower DTW joint error with fewer parameters than baselines on several ...

  34. RoboKA: KAN Informed Multimodal Learning for RoboCall Surveillance System

    cs.MM 2026-04 unverdicted novelty 5.0

    RoboKA is a KAN-based multimodal fusion model that outperforms baselines on a new synthetic dataset for detecting adversarial robocalls via acoustic and linguistic cues.

  35. DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    DepthPilot generates physically consistent and clinically interpretable colonoscopy videos by injecting depth priors into diffusion models through parameter-efficient fine-tuning and replacing linear denoising weights...

  36. Gait Recognition with Temporal Kolmogorov-Arnold Networks

    cs.CV 2026-04 unverdicted novelty 5.0

    A CNN combined with a new Temporal Kolmogorov-Arnold Network using learnable functions and two-level memory achieves strong gait recognition performance on the CASIA-B dataset.

  37. High-Precision Phase-Shift Transferable Neural Networks for High-Frequency Function Approximation and PDE Solution

    math.NA 2026-04 unverdicted novelty 5.0

    Phase-shift transferable neural networks achieve high-precision approximation of high-frequency functions and PDE solutions.

  38. General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equations

    cs.LG 2026-04 unverdicted novelty 5.0

    GEN is a neural network that solves PDEs by constructing explicit function approximations from basis functions based on prior PDE knowledge, yielding more robust and extensible solutions than standard PINNs.

  39. Low Light Image Enhancement Challenge at NTIRE 2026

    cs.CV 2026-04 unverdicted novelty 2.0

    NTIRE 2026 challenge report shows progress in low-light image enhancement via 22 submitted networks evaluated on a new dataset.

Reference graph

Works this paper leans on

119 extracted references · 119 canonical work pages · cited by 39 Pith papers · 9 internal anchors
