Recognition: 2 theorem links · Lean theorem
KAN: Kolmogorov-Arnold Networks
Pith reviewed 2026-05-11 23:35 UTC · model grok-4.3
The pith
Kolmogorov-Arnold Networks put learnable spline functions on edges, letting much smaller networks match or beat MLP accuracy while following faster neural scaling laws.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Inspired by the Kolmogorov-Arnold representation theorem, KANs replace the fixed activation functions on nodes in MLPs with learnable univariate functions parametrized as splines on the edges, with no linear weights at all. This change results in KANs that outperform MLPs in accuracy for data fitting and PDE solving, exhibit faster neural scaling laws, and provide intuitive visualizations that help in rediscovering scientific laws.
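For reference, the representation theorem the architecture leans on, and the layer form it inspires (standard statements; indexing conventions vary across sources):

```latex
% Kolmogorov-Arnold representation: every continuous f on [0,1]^n is a
% finite superposition of continuous univariate functions and addition.
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% KAN layer (the paper's generalization to arbitrary widths and depths):
% every edge (i -> j) carries its own learnable univariate function.
x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i})
```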
What carries the argument
Learnable univariate spline functions on edges that replace all linear weight parameters and enable function approximation per the Kolmogorov-Arnold theorem.
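A minimal NumPy sketch of that machinery, assuming uniform knots and a pure spline branch; the released implementation additionally mixes in a SiLU base term and periodically updates the grid, and the names here (`bspline_basis`, `KANLayer`) are ours, not the authors' API:

```python
import numpy as np

def bspline_basis(x, grid, k):
    """Evaluate all order-k B-spline bases at points x (Cox-de Boor recursion).
    grid is an ascending knot vector; returns an array of shape (len(x), n_bases)."""
    x = x[:, None]
    # order-0 bases: indicator of each knot interval
    B = ((x >= grid[:-1]) & (x < grid[1:])).astype(float)
    for d in range(1, k + 1):
        left = (x - grid[:-(d + 1)]) / (grid[d:-1] - grid[:-(d + 1)])
        right = (grid[d + 1:] - x) / (grid[d + 1:] - grid[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

class KANLayer:
    """One KAN layer: a learnable spline on every (input, output) edge,
    with outputs formed by summing the edge functions over inputs."""
    def __init__(self, n_in, n_out, G=5, k=3, lo=-1.0, hi=1.0, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        h = (hi - lo) / G
        # extend the grid by k knots per side so all G + k bases are supported on [lo, hi]
        self.grid = lo + h * np.arange(-k, G + k + 1)
        self.k = k
        self.coef = 0.1 * rng.standard_normal((n_in, n_out, G + k))

    def __call__(self, x):  # x: (batch, n_in) with entries in [lo, hi)
        out = np.zeros((x.shape[0], self.coef.shape[1]))
        for i in range(self.coef.shape[0]):
            B = bspline_basis(x[:, i], self.grid, self.k)  # (batch, G + k)
            out += B @ self.coef[i].T                      # sum edge splines into outputs
        return out

layer = KANLayer(n_in=2, n_out=3)
print(layer(np.random.default_rng(1).uniform(-1, 1, size=(4, 2))).shape)  # (4, 3)
```

Every edge owns its own G + k coefficients, which is exactly the overhead the parameter-count discussion below turns on.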
Load-bearing premise
That the univariate spline parametrizations can be trained stably at scale without introducing excessive hyperparameters or overfitting that erases the accuracy and scaling advantages.
What would settle it
Demonstrating, on standard benchmarks under matched tuning budgets, that KANs require more parameters or training time than MLPs to reach the same accuracy, or that their error-scaling exponents are no better than MLPs', would refute the claimed advantages; the converse result would confirm them.
read the original abstract
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Kolmogorov-Arnold Networks (KANs) as alternatives to MLPs, replacing fixed node activations and linear weights with learnable univariate B-spline functions on edges. It claims that substantially smaller KANs achieve comparable or superior accuracy to larger MLPs on data-fitting and PDE-solving tasks, that KANs exhibit faster neural scaling laws, and that their edge-wise functions enable superior interpretability and human-AI collaboration for rediscovering mathematical and physical laws.
Significance. If the reported accuracy and scaling advantages hold under equivalent hyperparameter budgets and at larger scales, KANs could provide a theoretically grounded alternative architecture for scientific machine learning, reducing parameter counts while improving both performance and interpretability. The manuscript supplies concrete support via multiple experimental runs, held-out scaling plots, and two illustrative discovery examples; these empirical elements are strengths that would be diminished only if the spline-specific hyperparameters prove unstable or require disproportionate tuning.
major comments (2)
- [§4.1 and Table 1] The central claim that 'much smaller KANs' outperform 'much larger MLPs' on fitting tasks rests on reported test errors for specific spline grid sizes (G=5) and orders (k=3) together with L1 regularization on coefficients; without an ablation or sensitivity analysis over G, k, and regularization strength under a fixed tuning budget, it is unclear whether the accuracy advantage survives when MLPs receive comparable hyperparameter effort.
- [§5 and Figure 7] The faster neural scaling laws are demonstrated by plotting test error against layer width or depth; because each KAN edge carries G+k trainable spline coefficients (plus possible grid-update steps), the x-axis must be total trainable parameters rather than architectural width for the 'parameter-efficient' interpretation of the scaling advantage to be load-bearing (see the parameter-count sketch after these comments).
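To make the referee's accounting concrete: with roughly G + k spline coefficients per edge, a KAN layer of shape n_in × n_out holds about n_in · n_out · (G + k) trainable parameters, against n_in · n_out + n_out for an MLP layer. A back-of-the-envelope sketch with the paper's defaults G=5, k=3 (our arithmetic and hypothetical helper names, not figures from the paper):

```python
def kan_params(widths, G=5, k=3):
    """Approximate KAN parameter count: (G + k) spline coefficients per edge
    (base-branch weights, if any, omitted for simplicity)."""
    return sum(a * b * (G + k) for a, b in zip(widths, widths[1:]))

def mlp_params(widths):
    """MLP parameter count: weight matrix plus bias vector per layer."""
    return sum(a * b + b for a, b in zip(widths, widths[1:]))

# identical architecture [2, 5, 1], very different parameter counts:
print(kan_params([2, 5, 1]))  # 120 = (2*5 + 5*1) * 8
print(mlp_params([2, 5, 1]))  # 21  = (2*5 + 5) + (5*1 + 1)
```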
minor comments (2)
- [§3.2] The periodic grid-extension procedure and the exact form of the basis functions after extension are described only briefly; a short pseudocode block or explicit knot-vector update rule would improve reproducibility (a hedged sketch follows these comments).
- [Figure 4] The PDE example: the color scale and axis limits on the residual plots are not stated, making quantitative comparison of KAN versus MLP residuals difficult.
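On the grid-extension point, one plausible reconstruction of the procedure is to evaluate the coarse spline densely and least-squares-fit a finer basis to it. The sketch below is a hedged guess at that update rule, not the authors' exact algorithm, and reuses `bspline_basis` from the earlier sketch:

```python
import numpy as np

def extend_grid(coef_old, grid_old, G_new, k, lo=-1.0, hi=1.0, n_samples=200):
    """Refit a spline on a finer uniform grid to match the coarse spline
    (hypothetical reconstruction of the paper's grid-extension step)."""
    xs = np.linspace(lo, hi - 1e-9, n_samples)            # dense samples on [lo, hi)
    y = bspline_basis(xs, grid_old, k) @ coef_old         # evaluate the coarse spline
    h = (hi - lo) / G_new
    grid_new = lo + h * np.arange(-k, G_new + k + 1)      # finer extended knot vector
    B_new = bspline_basis(xs, grid_new, k)
    coef_new, *_ = np.linalg.lstsq(B_new, y, rcond=None)  # least-squares refit
    return coef_new, grid_new
```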
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions where the concerns are valid and can be addressed with additional analysis.
read point-by-point responses
- Referee: [§4.1 and Table 1] The central claim that 'much smaller KANs' outperform 'much larger MLPs' on fitting tasks rests on reported test errors for specific spline grid sizes (G=5) and orders (k=3) together with L1 regularization on coefficients; without an ablation or sensitivity analysis over G, k, and regularization strength under a fixed tuning budget, it is unclear whether the accuracy advantage survives when MLPs receive comparable hyperparameter effort.
  Authors: We acknowledge that the primary reported results use G=5, k=3, and L1 regularization on the spline coefficients. While MLP baselines were tuned across widths, depths, and optimization hyperparameters, we did not conduct an exhaustive joint ablation of KAN spline hyperparameters under an identical total tuning budget. To address this directly, we will add a new sensitivity analysis subsection in the revised manuscript. This will vary G, k, and regularization strength for KANs while reporting MLP performance under matched computational effort for hyperparameter search, thereby demonstrating that the accuracy advantages are robust to reasonable choices of these parameters. revision: yes
- Referee: [§5 and Figure 7] The faster neural scaling laws are demonstrated by plotting test error against layer width or depth; because each KAN edge carries G+k trainable spline coefficients (plus possible grid-update steps), the x-axis must be total trainable parameters rather than architectural width for the 'parameter-efficient' interpretation of the scaling advantage to be load-bearing.
  Authors: We agree that a parameter-efficient interpretation of the scaling advantage is best supported by plotting against total trainable parameters rather than architectural width or depth alone. The original Figure 7 and associated text emphasized scaling with respect to network dimensions, following conventions in the neural scaling literature. In the revision we will augment the figure with additional curves that explicitly plot test error versus total parameter count for both KANs and MLPs, and we will update the text to qualify the scaling claims accordingly. This change will make the parameter-efficiency argument load-bearing while preserving the architectural scaling observations. revision: partial
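Once both curves are plotted against total trainable parameters, the promised comparison reduces to comparing slopes in log-log space. A minimal sketch with synthetic placeholder curves (not the paper's measurements):

```python
import numpy as np

def scaling_exponent(n_params, test_err):
    """Fit test_err ~ C * n_params**(-alpha) in log-log space; return alpha."""
    slope, _ = np.polyfit(np.log(n_params), np.log(test_err), 1)
    return -slope

params = np.array([1e2, 1e3, 1e4, 1e5])
kan_err = 3.0 * params ** -1.9   # synthetic: steeper decay
mlp_err = 0.5 * params ** -0.6   # synthetic: shallower decay
print(scaling_exponent(params, kan_err))  # ~1.9
print(scaling_exponent(params, mlp_err))  # ~0.6
```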
Circularity Check
No significant circularity in KAN architecture or claims
full rationale
The KAN architecture is defined directly from the external Kolmogorov-Arnold representation theorem by placing learnable univariate spline functions on edges instead of fixed activations on nodes. Accuracy comparisons, scaling-law observations, and interpretability demonstrations are obtained from separate empirical evaluations on held-out data and PDE tasks; these measurements do not reduce to the training loss or to any self-referential definition. No load-bearing step relies on self-citation, uniqueness theorems imported from the authors, or renaming of known results. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- spline grid size G
- spline order k
axioms (1)
- standard math: Kolmogorov-Arnold representation theorem
Forward citations
Cited by 39 Pith papers
- Embedding Dimension Lower Bounds for Universality of Deep Sets and Janossy Pooling
  New lower bounds establish that Deep Sets need embedding dimension linear in the number of points (up to constants) for d>1, and give the first non-trivial bounds for higher-order Janossy pooling.
- KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks
  KAN-CL cuts catastrophic forgetting by 88-93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T by anchoring KAN parameters at per-knot granularity while matching baseline accuracy.
- Bridging Spectral Operator Learning and U-Net Hierarchies: SpectraNet for Stable Autoregressive PDE Surrogates
  SpectraNet delivers stable autoregressive PDE rollouts with lower error and 2.3x fewer parameters than FNO by embedding spectral convolutions in a U-Net and training a residual-target block under semigroup-consistency loss.
- Variable decoupling and the Kolmogorov Superposition Theorem for rational functions
  For rational multivariate functions, the Kolmogorov Superposition Theorem allows variable decoupling by inspection with no computation using the Loewner Framework.
- TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
  TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.
- KANs need curvature: penalties for compositional smoothness
  A curvature penalty for KANs, derived to respect compositional effects and equipped with a proven upper bound on full-model curvature, produces smoother activations while preserving accuracy.
- Layer-wise Lipschitz-Product Control for Deep Kolmogorov-Arnold Network Representations of Compositionally Structured Functions
  Compositionally sparse functions given by finite computation trees admit deep KAN representations with dimension-independent layer-wise Lipschitz product bounds P(KAN) <= max(C*,1)^L_f where L_f scales linearly with t...
- Neural Enhancement of Analytical Appearance Models
  Neural enhancement replaces selected computational nodes in analytical BRDF models with MLPs identified via hypercube search, yielding accurate, compact models that fit measured reflectance data better than pure analy...
- Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks
  Deep KANs with edge functions restricted to affine maps plus one fixed non-affine continuous function σ are dense in C(K) for any compact K if and only if σ is non-affine.
- Physics informed operator learning of parameter dependent spectra
  DeepOPiraKAN learns parameter-to-spectrum mappings via operator learning and achieves relative errors of O(10^{-6}) to O(10^{-4}) for Kerr black hole quasinormal modes up to n=7 when benchmarked against Leaver's method.
- KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition
  KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.
- From Zero to Detail: A Progressive Spectral Decoupling Paradigm for UHD Image Restoration with New Benchmark
  A new framework called ERR decomposes UHD image restoration into three frequency stages with specialized sub-networks and introduces the LSUHDIR benchmark dataset of over 82,000 images.
- G-PARC: Graph-Physics Aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics on Unstructured Meshes
  G-PARC embeds analytically computed differential operators via moving least squares on graphs into recurrent networks, achieving higher accuracy with 2-3x fewer parameters than prior graph PADL methods on nonlinear be...
- Interpretable Relational Inference with LLM-Guided Symbolic Dynamics Modeling
  COSINE jointly discovers latent interaction graphs and compact symbolic dynamical equations by using an LLM to iteratively prune and expand the function library based on optimization feedback.
- Non-monotonic causal discovery with Kolmogorov-Arnold Fuzzy Cognitive Maps
  KA-FCM uses B-spline functions on FCM edges, inspired by the Kolmogorov-Arnold theorem, to enable arbitrary non-monotonic causal modeling and outperforms standard FCM while matching MLPs on non-monotonic inference, sy...
- Efficient Convexification of Kolmogorov-Arnold Networks with Polynomial Functional Forms Via a Continuous Graham Scan Approach
  A continuous Graham Scan constructs exact convex envelopes of univariate polynomials for strong convex relaxations of polynomial Kolmogorov-Arnold Networks.
- WGFINNs: Weak formulation-based GENERIC formalism informed neural networks
  WGFINNs use weak-form loss functions with GENERIC structure preservation to recover governing equations more accurately from noisy observations than prior strong-form GFINNs.
- Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
  Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...
- FreeMOCA: Memory-Free Continual Learning for Malicious Code Analysis
  FreeMOCA enables memory-free continual learning for malicious code analysis by adaptive layer-wise parameter interpolation between task updates, outperforming baselines on EMBER and AZ malware benchmarks with up to 42...
- Sparse Random-Feature Neural Networks with Krylov-Based SVD for Singularly Perturbed ODE
  Sparse RFNNs with sSVD via Lanczos-Golub-Kahan bidiagonalization maintain accuracy while improving efficiency and robustness for 1D steady convection-diffusion equations with strong advection.
- Towards Intelligent Low-Altitude Wireless Network Deployment: Differentiable Channel Knowledge Map Construction and Trajectory Design
  A neural network-based differentiable CKM construction method enables joint power-bandwidth-trajectory optimization for multi-UAV systems, achieving higher minimum throughput than statistical channel models.
- Partition-of-Unity Gaussian Kolmogorov-Arnold Networks
  PU-GKAN applies Shepard normalization to Gaussian bases in KANs, yielding exact constant reproduction, reduced epsilon sensitivity, and better validation accuracy across tested regimes.
- Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications
  The proposed pretraining framework for safe DRL in CF-MIMO resource management doubles initial energy efficiency, achieves 4.7% higher final EE, maintains 1% delay violation rate, and cuts exploration steps by 50% com...
- Scale-Parameter Selection in Gaussian Kolmogorov-Arnold Networks
  A stable operating interval for the Gaussian scale parameter ε in KANs is ε ∈ [1/(G-1), 2/(G-1)], derived from first-layer feature geometry and validated across multiple approximation and physics-informed problems.
- ParamBoost: Gradient Boosted Piecewise Cubic Polynomials
  ParamBoost improves GAMs by fitting piecewise cubic polynomials via gradient boosting and supports constraints for continuity, monotonicity, convexity, and feature interactions.
- Unified scaling laws for turbulent boundary layers across flow regimes
  Two local dimensionless groups predict wall shear stress and three predict velocity profiles in turbulent boundary layers across pressure gradient regimes using information-theoretic selection of maximal-predictive co...
- Small-scale photonic Kolmogorov-Arnold networks using standard telecom nonlinear modules
  Small photonic KANs using commodity telecom nonlinear modules reach 98.4% accuracy on nonlinear classification with only four modules and remain robust to hardware impairments.
- Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs
  Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.
- From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
  Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family...
- Interpretation of Crystal Energy Landscapes with Kolmogorov-Arnold Networks
  Element-Weighted KANs achieve state-of-the-art accuracy on formation energy, band gap, and work function while revealing periodic-table-aligned chemical trends through their learnable activation functions.
- M$^4$-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection
  M⁴-SAM equips SAM2 with modality-aware MoE-LoRA, gated multi-level fusion, and pseudo-guided initialization to reach state-of-the-art on RGB-D video salient object detection.
- PixelFlowCast: Latent-Free Precipitation Nowcasting via Pixel Mean Flows
  PixelFlowCast delivers high-fidelity precipitation nowcasts from radar sequences using a latent-free Pixel Mean Flows predictor guided by a deterministic coarse stage and KANCondNet features.
- KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation
  KANMultiSign generates sign language poses from notation via coarse-to-fine multi-scale supervision and compact KAN-Transformer modules, achieving lower DTW joint error with fewer parameters than baselines on several ...
- RoboKA: KAN Informed Multimodal Learning for RoboCall Surveillance System
  RoboKA is a KAN-based multimodal fusion model that outperforms baselines on a new synthetic dataset for detecting adversarial robocalls via acoustic and linguistic cues.
- DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation
  DepthPilot generates physically consistent and clinically interpretable colonoscopy videos by injecting depth priors into diffusion models through parameter-efficient fine-tuning and replacing linear denoising weights...
- Gait Recognition with Temporal Kolmogorov-Arnold Networks
  A CNN combined with a new Temporal Kolmogorov-Arnold Network using learnable functions and two-level memory achieves strong gait recognition performance on the CASIA-B dataset.
- High-Precision Phase-Shift Transferable Neural Networks for High-Frequency Function Approximation and PDE Solution
  Phase-shift transferable neural networks achieve high-precision approximation of high-frequency functions and PDE solutions.
- General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equations
  GEN is a neural network that solves PDEs by constructing explicit function approximations from basis functions based on prior PDE knowledge, yielding more robust and extensible solutions than standard PINNs.
- Low Light Image Enhancement Challenge at NTIRE 2026
  NTIRE 2026 challenge report shows progress in low-light image enhancement via 22 submitted networks evaluated on a new dataset.
Reference graph
Works this paper leans on
- [1] Simon Haykin. Neural networks: a comprehensive foundation. Prentice Hall PTR, 1994.
- [2] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
- [3] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
- [4] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [5] Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600, 2023.
- [6] A. N. Kolmogorov. On the representation of continuous functions of several variables as superpositions of continuous functions of a smaller number of variables. Dokl. Akad. Nauk, 108(2), 1956.
- [7] Andrei Nikolaevich Kolmogorov. On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. In Doklady Akademii Nauk, volume 114, pages 953–956. Russian Academy of Sciences, 1957.
- [8] Jürgen Braun and Michael Griebel. On a constructive proof of Kolmogorov's superposition theorem. Constructive Approximation, 30:653–675, 2009.
- [9] David A Sprecher and Sorin Draghici. Space-filling curves and Kolmogorov superposition-based neural networks. Neural Networks, 15(1):57–67, 2002.
- [10] Mario Köppen. On the training of a Kolmogorov network. In Artificial Neural Networks—ICANN 2002: International Conference, Madrid, Spain, August 28–30, 2002, Proceedings 12, pages 474–479. Springer, 2002.
- [11] Ji-Nan Lin and Rolf Unbehauen. On the realization of a Kolmogorov network. Neural Computation, 5(1):18–20, 1993.
- [12] Ming-Jun Lai and Zhaiming Shen. The Kolmogorov superposition theorem can break the curse of dimensionality when approximating high dimensional functions. arXiv preprint arXiv:2112.09963, 2021.
- [13] Pierre-Emmanuel Leni, Yohan D Fougerolle, and Frédéric Truchetet. The Kolmogorov spline network for image processing. In Image Processing: Concepts, Methodologies, Tools, and Applications, pages 54–78. IGI Global, 2013.
- [14] Daniele Fakhoury, Emanuele Fakhoury, and Hendrik Speleers. ExSpliNet: An interpretable and expressive spline-based neural network. Neural Networks, 152:332–346, 2022.
- [15] Hadrien Montanelli and Haizhao Yang. Error bounds for deep ReLU networks using the Kolmogorov–Arnold superposition theorem. Neural Networks, 129:1–6, 2020.
- [16] Juncai He. On the optimal expressive power of ReLU DNNs and its application in approximation with Kolmogorov superposition theorem. arXiv preprint arXiv:2308.05509, 2023.
- [17] Juncai He, Lin Li, Jinchao Xu, and Chunyue Zheng. ReLU deep neural networks and linear finite elements. arXiv preprint arXiv:1807.03973, 2018.
- [18] Juncai He and Jinchao Xu. Deep neural networks and finite elements of any order on arbitrary dimensions. arXiv preprint arXiv:2312.14276, 2023.
- [19] Tomaso Poggio, Andrzej Banburski, and Qianli Liao. Theoretical issues in deep networks. Proceedings of the National Academy of Sciences, 117(48):30039–30045, 2020.
- [20] Federico Girosi and Tomaso Poggio. Representation properties of networks: Kolmogorov's theorem is irrelevant. Neural Computation, 1(4):465–469, 1989.
- [21] Henry W Lin, Max Tegmark, and David Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168:1223–1247, 2017.
- [22] Hongyi Xu, Funshing Sin, Yufeng Zhu, and Jernej Barbič. Nonlinear material design using principal stretches. ACM Transactions on Graphics (TOG), 34(4):1–11, 2015.
- [23] Carl de Boor. A practical guide to splines, volume 27. Springer-Verlag, New York, 1978.
- [24] Utkarsh Sharma and Jared Kaplan. A neural scaling law from the dimension of the data manifold. arXiv preprint arXiv:2004.10802, 2020.
- [25] Eric J Michaud, Ziming Liu, and Max Tegmark. Precision machine learning. Entropy, 25(1):175, 2023.
- [26] Joel L Horowitz and Enno Mammen. Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions. 2007.
- [27] Michael Kohler and Sophie Langer. On the rate of convergence of fully connected deep neural network regression estimates. The Annals of Statistics, 49(4):2231–2249, 2021.
- [28] Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. 2020.
- [29] Ronald A DeVore, Ralph Howard, and Charles Micchelli. Optimal nonlinear approximation. Manuscripta Mathematica, 63:469–478, 1989.
- [30] Ronald A DeVore, George Kyriazis, Dany Leviatan, and Vladimir M Tikhomirov. Wavelet compression and nonlinear n-widths. Adv. Comput. Math., 1(2):197–214, 1993.
- [31] Jonathan W Siegel. Sharp lower bounds on the manifold widths of Sobolev and Besov spaces. arXiv preprint arXiv:2402.04407, 2024.
- [32] Dmitry Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.
- [33] Peter L Bartlett, Nick Harvey, Christopher Liaw, and Abbas Mehrabian. Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Journal of Machine Learning Research, 20(63):1–17, 2019.
- [34] Jonathan W Siegel. Optimal approximation rates for deep ReLU neural networks on Sobolev and Besov spaces. Journal of Machine Learning Research, 24(357):1–52, 2023.
- [35] Yongji Wang and Ching-Yao Lai. Multi-stage neural networks: Function approximator of machine precision. Journal of Computational Physics, page 112865, 2024.
- [36] Silviu-Marian Udrescu and Max Tegmark. AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16):eaay2631, 2020.
- [37] Silviu-Marian Udrescu, Andrew Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, and Max Tegmark. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. Advances in Neural Information Processing Systems, 33:4860–4871, 2020.
- [38] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
- [39] George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
- [40] Ronald Kemker, Marc McClure, Angelina Abitino, Tyler Hayes, and Christopher Kanan. Measuring catastrophic forgetting in neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- [41] Bryan Kolb and Ian Q Whishaw. Brain plasticity and behavior. Annual Review of Psychology, 49(1):43–64, 1998.
- [42] David Meunier, Renaud Lambiotte, and Edward T Bullmore. Modular and hierarchically modular organization of brain networks. Frontiers in Neuroscience, 4:7572, 2010.
- [43] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
- [44] Aojun Lu, Tao Feng, Hangjie Yuan, Xiaotian Song, and Yanan Sun. Revisiting neural networks for continual learning: An architectural perspective, 2024.
- [45] Alex Davies, Petar Veličković, Lars Buesing, Sam Blackwell, Daniel Zheng, Nenad Tomašev, Richard Tanburn, Peter Battaglia, Charles Blundell, András Juhász, et al. Advancing mathematics by guiding human intuition with AI. Nature, 600(7887):70–74, 2021.
- [46] Sergei Gukov, James Halverson, Ciprian Manolescu, and Fabian Ruehle. Searching for ribbons with machine learning, 2023.
- [47]
- [48] Philip W Anderson. Absence of diffusion in certain random lattices. Physical Review, 109(5):1492, 1958.
- [49] David J Thouless. A relation between the density of states and range of localization for one dimensional random systems. Journal of Physics C: Solid State Physics, 5(1):77, 1972.
- [50] Elihu Abrahams, PW Anderson, DC Licciardello, and TV Ramakrishnan. Scaling theory of localization: Absence of quantum diffusion in two dimensions. Physical Review Letters, 42(10):673, 1979.
- [51] Ad Lagendijk, Bart van Tiggelen, and Diederik S Wiersma. Fifty years of Anderson localization. Physics Today, 62(8):24–29, 2009.
- [52] Mordechai Segev, Yaron Silberberg, and Demetrios N Christodoulides. Anderson localization of light. Nature Photonics, 7(3):197–204, 2013.
- [53] Z Valy Vardeny, Ajay Nahata, and Amit Agrawal. Optics of photonic quasicrystals. Nature Photonics, 7(3):177–187, 2013.
- [54] Sajeev John. Strong localization of photons in certain disordered dielectric superlattices. Physical Review Letters, 58(23):2486, 1987.
- [55] Yoav Lahini, Rami Pugatch, Francesca Pozzi, Marc Sorel, Roberto Morandotti, Nir Davidson, and Yaron Silberberg. Observation of a localization transition in quasiperiodic photonic lattices. Physical Review Letters, 103(1):013901, 2009.
- [56] Sachin Vaidya, Christina Jörg, Kyle Linn, Megan Goh, and Mikael C Rechtsman. Reentrant delocalization transition in one-dimensional photonic quasicrystals. Physical Review Research, 5(3):033170, 2023.
- [57] Wojciech De Roeck, Francois Huveneers, Markus Müller, and Mauro Schiulaz. Absence of many-body mobility edges. Physical Review B, 93(1):014203, 2016.
- [58] Xiaopeng Li, Sriram Ganeshan, JH Pixley, and S Das Sarma. Many-body localization and quantum nonergodicity in a model with a single-particle mobility edge. Physical Review Letters, 115(18):186601, 2015.
- [59] Fangzhao Alex An, Karmela Padavić, Eric J Meier, Suraj Hegde, Sriram Ganeshan, JH Pixley, Smitha Vishveshwara, and Bryce Gadway. Interactions and mobility edges: Observing the generalized Aubry-André model. Physical Review Letters, 126(4):040603, 2021.
- [60] J Biddle and S Das Sarma. Predicted mobility edges in one-dimensional incommensurate optical lattices: An exactly solvable model of Anderson localization. Physical Review Letters, 104(7):070601, 2010.
- [61] Alexander Duthie, Sthitadhi Roy, and David E Logan. Self-consistent theory of mobility edges in quasiperiodic chains. Physical Review B, 103(6):L060201, 2021.
- [62] Sriram Ganeshan, JH Pixley, and S Das Sarma. Nearest neighbor tight binding models with an exact mobility edge in one dimension. Physical Review Letters, 114(14):146601, 2015.
- [63] Yucheng Wang, Xu Xia, Long Zhang, Hepeng Yao, Shu Chen, Jiangong You, Qi Zhou, and Xiong-Jun Liu. One-dimensional quasiperiodic mosaic lattice with exact mobility edges. Physical Review Letters, 125(19):196604, 2020.
- [64] Yucheng Wang, Xu Xia, Yongjian Wang, Zuohuan Zheng, and Xiong-Jun Liu. Duality between two generalized Aubry-André models with exact mobility edges. Physical Review B, 103(17):174205, 2021.
- [65] Xin-Chi Zhou, Yongjian Wang, Ting-Fung Jeffrey Poon, Qi Zhou, and Xiong-Jun Liu. Exact new mobility edges between critical and localized states. Physical Review Letters, 131(17):176401, 2023.
- [66] Tomaso Poggio. How deep sparse networks avoid the curse of dimensionality: Efficiently computable functions are compositionally sparse. CBMM Memo, 10:2022, 2022.
- [67] Johannes Schmidt-Hieber. The Kolmogorov–Arnold representation theorem revisited. Neural Networks, 137:119–126, 2021.
- [68] Aysu Ismayilova and Vugar E Ismailov. On the Kolmogorov neural networks. Neural Networks, page 106333, 2024.
- [69] Michael Poluektov and Andrew Polar. A new iterative method for construction of the Kolmogorov-Arnold representation. arXiv preprint arXiv:2305.08194, 2023.
- [70] Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, and Geoffrey E Hinton. Neural additive models: Interpretable machine learning with neural nets. Advances in Neural Information Processing Systems, 34:4699–4711, 2021.
- [71]
- [72] Huan Song, Jayaraman J Thiagarajan, Prasanna Sattigeri, and Andreas Spanias. Optimizing kernel machines using deep learning. IEEE Transactions on Neural Networks and Learning Systems, 29(11):5528–5540, 2018.
- [73] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
- [74] Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B Brown, Prafulla Dhariwal, Scott Gray, et al. Scaling laws for autoregressive generative modeling. arXiv preprint arXiv:2010.14701, 2020.
- [75] Mitchell A Gordon, Kevin Duh, and Jared Kaplan. Data and parameter scaling laws for neural machine translation. In ACL Rolling Review - May 2021, 2021.
- [76] Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409, 2017.
- [77]
- [78] Eric J Michaud, Ziming Liu, Uzay Girit, and Max Tegmark. The quantization model of neural scaling. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- [79] Jinyeop Song, Ziming Liu, Max Tegmark, and Jeff Gore. A resource model for neural scaling law. arXiv preprint arXiv:2402.05164, 2024.
- [80] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.