Optimized Architectures for Kolmogorov-Arnold Networks

James Bagrow; Josh Bongard

arxiv: 2512.12448 · v2 · submitted 2025-12-13 · 💻 cs.LG · cs.NE· physics.data-an· stat.ML

Optimized Architectures for Kolmogorov-Arnold Networks

James Bagrow , Josh Bongard This is my paper

Pith reviewed 2026-05-16 22:23 UTC · model grok-4.3

classification 💻 cs.LG cs.NEphysics.data-anstat.ML

keywords Kolmogorov-Arnold networkssparsificationdepth selectionminimum description lengthmodel optimizationinterpretable machine learningfunction approximation

0 comments

The pith

Combining sparsification with depth selection in overprovisioned KANs yields smaller models with competitive accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Kolmogorov-Arnold networks offer interpretable alternatives to standard neural nets but enhancements often reduce that benefit. This work shows that starting with overprovisioned architectures and applying sparsification along with depth selection under a minimum description length objective allows end-to-end differentiable optimization of the model structure. Experiments indicate that while sparsification by itself is not enough, adding depth selection produces models that are substantially smaller yet match or beat accuracy on function approximation, dynamical systems, and real-world tasks. This matters for scientific machine learning where both accuracy and the ability to inspect the model are needed.

Core claim

Overprovisioned KAN architectures combined with sparsification, deep supervision, and depth selection, optimized differentiably under a minimum description length objective, allow learning compact interpretable networks that achieve competitive or superior accuracy across benchmarks without the complexity of other enhancements.

What carries the argument

Differentiable joint optimization of activations, structure, and depth under a minimum description length objective applied to overprovisioned KANs

If this is right

Substantially smaller models are discovered while accuracy remains competitive or better.
Interpretability is preserved through the principled optimization process.
The approach outperforms sparsification alone on multiple task types.
End-to-end optimization of model depth becomes practical for KANs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar optimization strategies could be tested on other network types to balance size and performance.
This may enable wider adoption of KANs in domains requiring model inspection such as physics-informed modeling.
The method suggests a general template for making interpretable models more practical by overprovisioning then pruning.

Load-bearing premise

Differentiable mechanisms under the minimum description length objective can jointly optimize activations, structure, and depth end-to-end while preserving interpretability.

What would settle it

Demonstrating on the paper's benchmarks that the full method does not produce smaller models with accuracy at least as good as sparsification alone would falsify the central result.

Figures

Figures reproduced from arXiv: 2512.12448 by James Bagrow, Josh Bongard.

**Figure 1.** Figure 1: Learning the example function 𝑧 = sin 𝑥 + 𝑦 2 with Kolmogorov–Arnold Networks (KANs). Forward connections are highlighted in blue. 2 Background 2.1 Kolmogorov–Arnold Networks Kolmogorov–Arnold Networks (KANs), motivated by the Kolmogorov–Arnold Representation Theorem [17, 18, 19], consist of 𝐿 layers with shapes [𝑛0, 𝑛1, . . . , 𝑛𝐿]. The layer update is given by: 𝑥 (ℓ+1) 𝑗 = ∑︁𝑛ℓ 𝑖=1 𝜙ℓ𝑖 𝑗 𝑥 (ℓ ) 𝑖 (… view at source ↗

read the original abstract

Efforts to improve Kolmogorov--Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification, deep supervision, and depth selection, to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable mechanisms under a principled minimum description length objective, jointly optimizing activations, structure, and depth end-to-end. Experiments across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks demonstrate that sparsification alone is insufficient, but the combination with depth selection achieves competitive or superior accuracy while discovering substantially smaller models. The result is a principled path toward models that are both more expressive and more interpretable, addressing a key tension in scientific machine learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a practical recipe for compact KANs by overprovisioning then sparsifying under a joint differentiable MDL objective, but the gains rest on benchmarks without visible error bars or clean separation of selection and evaluation data.

read the letter

The core contribution is showing that overprovisioned KANs, sparsified with a differentiable MDL term plus depth selection and deep supervision, produce smaller networks that still hit competitive accuracy on function approximation, dynamical systems, and real-world tasks. Sparsification by itself falls short; adding the depth choice step is what actually shrinks the models without much accuracy loss. That combination under one end-to-end objective is the new piece relative to earlier KAN work, and it directly targets the expressivity-interpretability tradeoff that has limited adoption so far. The approach is straightforward to implement on top of existing KAN code, which is a plus for anyone already using them in scientific settings. The MDL framing supplies a clean regularization signal that avoids arbitrary pruning thresholds. On the downside, the reported results lack error bars and detailed ablation tables, so the size reductions could be sensitive to random seeds or specific data splits. There is also a real risk that the same data used to drive the MDL objective is later used to claim superiority, which would make the “substantially smaller” claim circular unless a held-out validation set was used for the final comparisons. Interpretability is asserted but not measured beyond architecture size, so it is unclear whether the resulting networks remain as readable as the original KAN motivation requires. This is solid engineering work for the KAN community and for people building surrogate models in physics or engineering. It is not a theoretical breakthrough, but the method is concrete enough that a referee could give useful feedback on the experimental controls and on whether the interpretability benefit survives the sparsification. I would send it out for review rather than desk-reject.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an approach to optimize Kolmogorov-Arnold Networks (KANs) using overprovisioned architectures sparsified via differentiable mechanisms under a minimum description length (MDL) objective, combined with deep supervision and depth selection. This is claimed to yield compact, interpretable models with competitive accuracy on function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks, where sparsification alone is insufficient but the full combination succeeds.

Significance. If the central claims hold, this work offers a significant contribution to scientific machine learning by providing a principled, end-to-end differentiable method to balance expressiveness and interpretability in KANs. The emphasis on MDL for joint optimization of structure and depth is a strength, potentially leading to more reliable models in applications requiring interpretability.

major comments (2)

Abstract: The abstract reports competitive results on multiple benchmarks but provides no visible error bars, ablation details, or data exclusion criteria; this makes the central claim of 'substantially smaller models' with competitive accuracy difficult to verify from the given information.
Abstract/Experiments narrative: The MDL objective is presented as principled for jointly optimizing activations, structure, and depth, yet the description leaves open whether structure and depth selection use the same data as the accuracy evaluation; without explicit train/validation separation this risks circularity in the 'substantially smaller models' claim.

minor comments (2)

Abstract: The role of 'deep supervision' in the overall pipeline is mentioned but not elaborated; a brief description of its differentiable implementation would improve clarity.
Abstract: The assumption that the differentiable MDL mechanisms preserve the original interpretability motivation of KANs is stated but would benefit from a short supporting discussion or example in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and will revise the manuscript to strengthen the abstract and clarify experimental protocols. All changes will be incorporated in the next version.

read point-by-point responses

Referee: Abstract: The abstract reports competitive results on multiple benchmarks but provides no visible error bars, ablation details, or data exclusion criteria; this makes the central claim of 'substantially smaller models' with competitive accuracy difficult to verify from the given information.

Authors: We agree that the abstract would benefit from greater specificity. In the revised version we will add a brief statement on error bars (computed over multiple random seeds), summarize the key ablation outcomes that isolate the contribution of depth selection, and note the data exclusion criteria used for the real-world tasks. These additions will be kept concise while making the 'substantially smaller models' claim directly verifiable from the abstract. revision: yes
Referee: Abstract/Experiments narrative: The MDL objective is presented as principled for jointly optimizing activations, structure, and depth, yet the description leaves open whether structure and depth selection use the same data as the accuracy evaluation; without explicit train/validation separation this risks circularity in the 'substantially smaller models' claim.

Authors: We appreciate the referee highlighting this potential ambiguity. In the full experimental protocol, structure and depth selection are performed on a held-out validation split that is disjoint from both the training data and the final test sets used for accuracy reporting. We will explicitly state this separation in the revised abstract and in the experimental setup section to remove any possibility of circularity. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper presents an approach using differentiable mechanisms under a minimum description length objective to jointly optimize activations, structure, and depth in overprovisioned KANs, with experimental results across benchmarks showing that combining sparsification and depth selection yields smaller models with competitive accuracy. No load-bearing derivation step in the abstract or described claims reduces by construction to a self-definition, fitted input renamed as prediction, or self-citation chain. The experimental narrative relies on ablation-style comparisons that remain independent of the optimization inputs, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are detailed in the provided text.

axioms (1)

domain assumption Sparsification and depth selection under MDL preserve KAN interpretability advantages
Central to the claim that the resulting models remain more interpretable than alternatives.

pith-pipeline@v0.9.0 · 5442 in / 1169 out tokens · 26219 ms · 2026-05-16T22:23:39.192556+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

LMDL = Lmodel + Lmodel|data ... Lmodel = (log n / n) ||θ||0 ... with ||θ||0 approximated via E[z] gate expectations under differentiable L0 relaxation
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Overprovisioning and sparsification are synergistic... combination with depth selection achieves competitive or superior accuracy while discovering substantially smaller models

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

KANs need curvature: penalties for compositional smoothness
cs.LG 2026-05 unverdicted novelty 7.0

A curvature penalty for KANs, derived to respect compositional effects and equipped with a proven upper bound on full-model curvature, produces smoother activations while preserving accuracy.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1]

Highly accurate protein structure prediction with alphafold

John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvu- nakool, Russ Bates, Augustin ˇZ´ıdek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583–589, 2021. 1

work page 2021
[2]

Physics-informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021. 1

work page 2021
[3]

Scientific discovery in the age of artificial intelligence.Nature, 620(7972):47–60, 2023

Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. Scientific discovery in the age of artificial intelligence.Nature, 620(7972):47–60, 2023. 1

work page 2023
[4]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019. 1

work page 2019
[5]

A survey of methods for explaining black box models.ACM computing surveys (CSUR), 51(5):1–42, 2018

Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models.ACM computing surveys (CSUR), 51(5):1–42, 2018. 1

work page 2018
[6]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 1, 4 9

work page 2016
[7]

Densely connected convolutional networks

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017. 1, 4

work page 2017
[8]

Statistical learning with sparsity.Monographs on statistics and applied probability, 143(143):8, 2015

Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity.Monographs on statistics and applied probability, 143(143):8, 2015. 1

work page 2015
[9]

Optimal brain damage.Advances in neural information processing systems, 2,

Yann LeCun, John Denker, and Sara Solla. Optimal brain damage.Advances in neural information processing systems, 2,

work page
[10]

Brunton, Joshua L

Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identifi- cation of nonlinear dynamical systems.Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016. 1, 6

work page 2016
[11]

Christos Louizos, Max Welling, and Diederik P. Kingma. Learning sparse neural networks through𝑙 0 regularization. In International Conference on Learning Representations, 2018. 1, 3, 5

work page 2018
[12]

DARTS: Differentiable architecture search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. InInternational Conference on Learning Representations, 2019. 1

work page 2019
[13]

Neural Architecture Search with Reinforcement Learning

Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning.arXiv preprint arXiv:1611.01578,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Neural architecture search: A survey.Journal of Machine Learning Research, 20(55):1–21, 2019

Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural architecture search: A survey.Journal of Machine Learning Research, 20(55):1–21, 2019. 1

work page 2019
[15]

Hou, and Max Tegmark

Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljacic, Thomas Y. Hou, and Max Tegmark. KAN: Kolmogorov–Arnold networks. InThe Thirteenth International Conference on Learning Representations,

work page
[16]

arXiv preprint arXiv:2408.10205 , year=

Ziming Liu, Pingchuan Ma, Yixuan Wang, Wojciech Matusik, and Max Tegmark. KAN 2.0: Kolmogorov–Arnold Networks meet science.arXiv preprint arXiv:2408.10205, 2024. 1, 2, 3

work page arXiv 2024
[17]

American Mathematical Society, 1961

Andre ˘ı Nikolaevich Kolmogorov.On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables. American Mathematical Society, 1961. 2

work page 1961
[18]

On functions of three variables.Collected Works: Representations of Functions, Celestial Mechanics and KAM Theory, 1957–1965, pages 5–8, 2009

Vladimir I Arnold. On functions of three variables.Collected Works: Representations of Functions, Celestial Mechanics and KAM Theory, 1957–1965, pages 5–8, 2009. 2

work page 1957
[19]

On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition

Andrei Nikolaevich Kolmogorov. On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition. InDokl. Akad. Nauk USSR, volume 114, pages 953–956, 1957. 2

work page 1957
[20]

Kolmogorov-Arnold networks are radial basis function ne tworks

Ziyao Li. Kolmogorov–Arnold networks are Radial Basis Function networks.arXiv preprint arXiv:2405.06721, 2024. 2

work page arXiv 2024
[21]

FourierKAN.https://github.com/GistNoesis/FourierKAN, 2024

GistNoesis. FourierKAN.https://github.com/GistNoesis/FourierKAN, 2024. Accessed: 2025-07-07. 2

work page 2024
[22]

SineKAN: Kolmogorov–Arnold networks using sinusoidal activation functions.Frontiers in Artificial Intelligence, 7, 2025

Eric Reinhardt, Dinesh Ramakrishnan, and Sergei Gleyzer. SineKAN: Kolmogorov–Arnold networks using sinusoidal activation functions.Frontiers in Artificial Intelligence, 7, 2025. ISSN 2624-8212. doi: 10.3389/frai.2024.1462952. 2

work page doi:10.3389/frai.2024.1462952 2025
[23]

Bozorgasl and H

Zavareh Bozorgasl and Hao Chen. Wav-KAN: Wavelet Kolmogorov–Arnold networks.arXiv preprint arXiv:2405.12832,

work page arXiv
[24]

Sidharth, A

SS Sidharth, AR Keerthana, R Gokul, and KP Anas. Chebyshev polynomial-based Kolmogorov–Arnold networks: An efficient architecture for nonlinear function approximation.arXiv preprint arXiv:2405.07200, 2024. 2

work page arXiv 2024
[25]

Inferences from Multinomal Data: Learning about a bag of marbles

Robert Tibshirani. Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, December 2018. ISSN 0035-9246. doi: 10.1111/j.2517-6161.1996.tb02080.x. URL https://doi.org/10.1111/j.2517-6161.1996.tb02080.x. 3

work page doi:10.1111/j.2517-6161.1996.tb02080.x 2018
[26]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014. 3, 9 10

work page internal anchor Pith review Pith/arXiv arXiv 2014
[27]

The State of Sparsity in Deep Neural Networks

Trevor Gale, Erich Elsen, and Sara Hooker. The state of sparsity in deep neural networks.arXiv preprint arXiv:1902.09574,

work page internal anchor Pith review Pith/arXiv arXiv 1902
[28]

Extrapolation and learning equations

Georg Martius and Christoph H Lampert. Extrapolation and learning equations.arXiv preprint arXiv:1610.02995, 2016. 3

work page internal anchor Pith review Pith/arXiv arXiv 2016
[29]

Learning equations for extrapolation and control

Subham Sahoo, Christoph Lampert, and Georg Martius. Learning equations for extrapolation and control. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4442–4450. PMLR, 10–15 Jul 2018. 3

work page 2018
[30]

Integration of neural network-based symbolic regression in deep learning for scientific discovery.IEEE transactions on neural networks and learning systems, 32(9):4166–4177, 2020

Samuel Kim, Peter Y Lu, Srijon Mukherjee, Michael Gilbert, Li Jing, VladimirˇCeperi´c, and Marin Soljaˇci´c. Integration of neural network-based symbolic regression in deep learning for scientific discovery.IEEE transactions on neural networks and learning systems, 32(9):4166–4177, 2020. 4

work page 2020
[31]

Lu, and Marin Solja ˇci´c

Michael Zhang, Samuel Kim, Peter Y. Lu, and Marin Solja ˇci´c. Deep learning and symbolic regression for discovering parametric equations.IEEE Transactions on Neural Networks and Learning Systems, 35(11):16775–16787, 2024. 4

work page 2024
[32]

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks.Journal of Machine Learning Research, 22(241):1–124,

Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks.Journal of Machine Learning Research, 22(241):1–124,

work page
[33]

Model selection and estimation in regression with grouped variables.Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1):49–67, 2006

Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables.Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1):49–67, 2006. 4

work page 2006
[34]

The benefit of group sparsity.The Annals of Statistics, 38(4):1978 – 2004, 2010

Junzhou Huang and Tong Zhang. The benefit of group sparsity.The Annals of Statistics, 38(4):1978 – 2004, 2010. doi: 10.1214/09-AOS778. URLhttps://doi.org/10.1214/09-AOS778. 4

work page doi:10.1214/09-aos778 1978
[35]

Softly symbolifying kolmogorov-arnold networks.arXiv preprint arXiv:2512.07875,

James Bagrow and Josh Bongard. Softly symbolifying kolmogorov-arnold networks.arXiv preprint arXiv:2512.07875,

work page arXiv
[36]

Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

Gideon Schwarz. Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978. ISSN 00905364, 21688966. 5

work page 1978
[37]

Nguyen Quang Uy, Nguyen Xuan Hoai, Michael O’Neill, R. I. McKay, and Edgar Galv ´an-L´opez. Semantically-based crossover in genetic programming: application to real-valued symbolic regression.Genetic Programming and Evolvable Machines, 12(2):91–119, 2011. 5, 9

work page 2011
[38]

Benjamin C Koenig, Suyong Kim, and Sili Deng. KAN-ODEs: Kolmogorov–Arnold network ordinary differential equations for learning dynamical systems and hidden physics.Computer Methods in Applied Mechanics and Engineering, 432:117397, 2024. 5

work page 2024
[39]

Bollt, and Ying-Cheng Lai

Shirin Panahi, Mohammadamin Moradi, Erik M. Bollt, and Ying-Cheng Lai. Data-driven model discovery with Kolmogorov–Arnold networks.Phys. Rev. Res., 7:023037, Apr 2025. 5, 6, 9

work page 2025
[40]

Multi-exit kolmogorov–arnold networks: enhancing accuracy and parsimony.Machine Learning: Science and Technology, 6(3):035037, aug 2025

James Bagrow and Josh Bongard. Multi-exit kolmogorov–arnold networks: enhancing accuracy and parsimony.Machine Learning: Science and Technology, 6(3):035037, aug 2025. 5, 6, 7, 8, 9

work page 2025
[41]

Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system.Optics communications, 30(2):257–261, 1979

Kensuke Ikeda. Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system.Optics communications, 30(2):257–261, 1979. 5

work page 1979
[42]

Global dynamical behavior of the optical field in a ring cavity.Journal of the Optical Society of America B, 2(4):552–564, 1985

SM Hammel, CKRT Jones, and Jerome V Moloney. Global dynamical behavior of the optical field in a ring cavity.Journal of the Optical Society of America B, 2(4):552–564, 1985. 5

work page 1985
[43]

Nonlinear dynamics and population disappearances.The American Naturalist, 144(5): 873–879, 1994

Kevin McCann and Peter Yodzis. Nonlinear dynamics and population disappearances.The American Naturalist, 144(5): 873–879, 1994. 6

work page 1994
[44]

Neville.Properties of Concrete

Adam M. Neville.Properties of Concrete. Pearson, 5th edition, 2011. 7

work page 2011
[45]

I.-C. Yeh. Modeling of strength of high-performance concrete using artificial neural networks.Cement and Concrete Research, 28(12):1797–1808, 1998. ISSN 0008-8846. 7, 9 11

work page 1998
[46]

Analysis of strength of concrete using design of experiments and neural networks.Journal of Materials in Civil Engineering, 18(4):597–604, 2006

I-Cheng Yeh. Analysis of strength of concrete using design of experiments and neural networks.Journal of Materials in Civil Engineering, 18(4):597–604, 2006. 7, 9

work page 2006
[47]

A data-driven statistical model for predicting the critical temperature of a superconductor.Computational Materials Science, 154:346–354, 2018

Kam Hamidieh. A data-driven statistical model for predicting the critical temperature of a superconductor.Computational Materials Science, 154:346–354, 2018. 7, 9

work page 2018
[48]

MDR SuperCon datasheet ver.240322

Center for Basic Research on Materials. MDR SuperCon datasheet ver.240322. 7, 9

work page
[49]

Categorical reparameterization with gumbel-softmax

Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. InInternational Conference on Learning Representations, 2017. URLhttps://openreview.net/forum?id=rkE3y85ee. 8

work page 2017
[50]

Maddison, Andriy Mnih, and Yee Whye Teh

Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. InInternational Conference on Learning Representations, 2017. URLhttps://openreview.net/ forum?id=S1jE5L5gl. 8 12

work page 2017

[1] [1]

Highly accurate protein structure prediction with alphafold

John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvu- nakool, Russ Bates, Augustin ˇZ´ıdek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583–589, 2021. 1

work page 2021

[2] [2]

Physics-informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021. 1

work page 2021

[3] [3]

Scientific discovery in the age of artificial intelligence.Nature, 620(7972):47–60, 2023

Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. Scientific discovery in the age of artificial intelligence.Nature, 620(7972):47–60, 2023. 1

work page 2023

[4] [4]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019. 1

work page 2019

[5] [5]

A survey of methods for explaining black box models.ACM computing surveys (CSUR), 51(5):1–42, 2018

Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models.ACM computing surveys (CSUR), 51(5):1–42, 2018. 1

work page 2018

[6] [6]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 1, 4 9

work page 2016

[7] [7]

Densely connected convolutional networks

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017. 1, 4

work page 2017

[8] [8]

Statistical learning with sparsity.Monographs on statistics and applied probability, 143(143):8, 2015

Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity.Monographs on statistics and applied probability, 143(143):8, 2015. 1

work page 2015

[9] [9]

Optimal brain damage.Advances in neural information processing systems, 2,

Yann LeCun, John Denker, and Sara Solla. Optimal brain damage.Advances in neural information processing systems, 2,

work page

[10] [10]

Brunton, Joshua L

Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identifi- cation of nonlinear dynamical systems.Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016. 1, 6

work page 2016

[11] [11]

Christos Louizos, Max Welling, and Diederik P. Kingma. Learning sparse neural networks through𝑙 0 regularization. In International Conference on Learning Representations, 2018. 1, 3, 5

work page 2018

[12] [12]

DARTS: Differentiable architecture search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. InInternational Conference on Learning Representations, 2019. 1

work page 2019

[13] [13]

Neural Architecture Search with Reinforcement Learning

Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning.arXiv preprint arXiv:1611.01578,

work page internal anchor Pith review Pith/arXiv arXiv

[14] [14]

Neural architecture search: A survey.Journal of Machine Learning Research, 20(55):1–21, 2019

Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural architecture search: A survey.Journal of Machine Learning Research, 20(55):1–21, 2019. 1

work page 2019

[15] [15]

Hou, and Max Tegmark

Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljacic, Thomas Y. Hou, and Max Tegmark. KAN: Kolmogorov–Arnold networks. InThe Thirteenth International Conference on Learning Representations,

work page

[16] [16]

arXiv preprint arXiv:2408.10205 , year=

Ziming Liu, Pingchuan Ma, Yixuan Wang, Wojciech Matusik, and Max Tegmark. KAN 2.0: Kolmogorov–Arnold Networks meet science.arXiv preprint arXiv:2408.10205, 2024. 1, 2, 3

work page arXiv 2024

[17] [17]

American Mathematical Society, 1961

Andre ˘ı Nikolaevich Kolmogorov.On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables. American Mathematical Society, 1961. 2

work page 1961

[18] [18]

On functions of three variables.Collected Works: Representations of Functions, Celestial Mechanics and KAM Theory, 1957–1965, pages 5–8, 2009

Vladimir I Arnold. On functions of three variables.Collected Works: Representations of Functions, Celestial Mechanics and KAM Theory, 1957–1965, pages 5–8, 2009. 2

work page 1957

[19] [19]

On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition

Andrei Nikolaevich Kolmogorov. On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition. InDokl. Akad. Nauk USSR, volume 114, pages 953–956, 1957. 2

work page 1957

[20] [20]

Kolmogorov-Arnold networks are radial basis function ne tworks

Ziyao Li. Kolmogorov–Arnold networks are Radial Basis Function networks.arXiv preprint arXiv:2405.06721, 2024. 2

work page arXiv 2024

[21] [21]

FourierKAN.https://github.com/GistNoesis/FourierKAN, 2024

GistNoesis. FourierKAN.https://github.com/GistNoesis/FourierKAN, 2024. Accessed: 2025-07-07. 2

work page 2024

[22] [22]

SineKAN: Kolmogorov–Arnold networks using sinusoidal activation functions.Frontiers in Artificial Intelligence, 7, 2025

Eric Reinhardt, Dinesh Ramakrishnan, and Sergei Gleyzer. SineKAN: Kolmogorov–Arnold networks using sinusoidal activation functions.Frontiers in Artificial Intelligence, 7, 2025. ISSN 2624-8212. doi: 10.3389/frai.2024.1462952. 2

work page doi:10.3389/frai.2024.1462952 2025

[23] [23]

Bozorgasl and H

Zavareh Bozorgasl and Hao Chen. Wav-KAN: Wavelet Kolmogorov–Arnold networks.arXiv preprint arXiv:2405.12832,

work page arXiv

[24] [24]

Sidharth, A

SS Sidharth, AR Keerthana, R Gokul, and KP Anas. Chebyshev polynomial-based Kolmogorov–Arnold networks: An efficient architecture for nonlinear function approximation.arXiv preprint arXiv:2405.07200, 2024. 2

work page arXiv 2024

[25] [25]

Inferences from Multinomal Data: Learning about a bag of marbles

Robert Tibshirani. Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, December 2018. ISSN 0035-9246. doi: 10.1111/j.2517-6161.1996.tb02080.x. URL https://doi.org/10.1111/j.2517-6161.1996.tb02080.x. 3

work page doi:10.1111/j.2517-6161.1996.tb02080.x 2018

[26] [26]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014. 3, 9 10

work page internal anchor Pith review Pith/arXiv arXiv 2014

[27] [27]

The State of Sparsity in Deep Neural Networks

Trevor Gale, Erich Elsen, and Sara Hooker. The state of sparsity in deep neural networks.arXiv preprint arXiv:1902.09574,

work page internal anchor Pith review Pith/arXiv arXiv 1902

[28] [28]

Extrapolation and learning equations

Georg Martius and Christoph H Lampert. Extrapolation and learning equations.arXiv preprint arXiv:1610.02995, 2016. 3

work page internal anchor Pith review Pith/arXiv arXiv 2016

[29] [29]

Learning equations for extrapolation and control

Subham Sahoo, Christoph Lampert, and Georg Martius. Learning equations for extrapolation and control. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4442–4450. PMLR, 10–15 Jul 2018. 3

work page 2018

[30] [30]

Integration of neural network-based symbolic regression in deep learning for scientific discovery.IEEE transactions on neural networks and learning systems, 32(9):4166–4177, 2020

Samuel Kim, Peter Y Lu, Srijon Mukherjee, Michael Gilbert, Li Jing, VladimirˇCeperi´c, and Marin Soljaˇci´c. Integration of neural network-based symbolic regression in deep learning for scientific discovery.IEEE transactions on neural networks and learning systems, 32(9):4166–4177, 2020. 4

work page 2020

[31] [31]

Lu, and Marin Solja ˇci´c

Michael Zhang, Samuel Kim, Peter Y. Lu, and Marin Solja ˇci´c. Deep learning and symbolic regression for discovering parametric equations.IEEE Transactions on Neural Networks and Learning Systems, 35(11):16775–16787, 2024. 4

work page 2024

[32] [32]

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks.Journal of Machine Learning Research, 22(241):1–124,

Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks.Journal of Machine Learning Research, 22(241):1–124,

work page

[33] [33]

Model selection and estimation in regression with grouped variables.Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1):49–67, 2006

Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables.Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1):49–67, 2006. 4

work page 2006

[34] [34]

The benefit of group sparsity.The Annals of Statistics, 38(4):1978 – 2004, 2010

Junzhou Huang and Tong Zhang. The benefit of group sparsity.The Annals of Statistics, 38(4):1978 – 2004, 2010. doi: 10.1214/09-AOS778. URLhttps://doi.org/10.1214/09-AOS778. 4

work page doi:10.1214/09-aos778 1978

[35] [35]

Softly symbolifying kolmogorov-arnold networks.arXiv preprint arXiv:2512.07875,

James Bagrow and Josh Bongard. Softly symbolifying kolmogorov-arnold networks.arXiv preprint arXiv:2512.07875,

work page arXiv

[36] [36]

Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978

Gideon Schwarz. Estimating the dimension of a model.The Annals of Statistics, 6(2):461–464, 1978. ISSN 00905364, 21688966. 5

work page 1978

[37] [37]

Nguyen Quang Uy, Nguyen Xuan Hoai, Michael O’Neill, R. I. McKay, and Edgar Galv ´an-L´opez. Semantically-based crossover in genetic programming: application to real-valued symbolic regression.Genetic Programming and Evolvable Machines, 12(2):91–119, 2011. 5, 9

work page 2011

[38] [38]

Benjamin C Koenig, Suyong Kim, and Sili Deng. KAN-ODEs: Kolmogorov–Arnold network ordinary differential equations for learning dynamical systems and hidden physics.Computer Methods in Applied Mechanics and Engineering, 432:117397, 2024. 5

work page 2024

[39] [39]

Bollt, and Ying-Cheng Lai

Shirin Panahi, Mohammadamin Moradi, Erik M. Bollt, and Ying-Cheng Lai. Data-driven model discovery with Kolmogorov–Arnold networks.Phys. Rev. Res., 7:023037, Apr 2025. 5, 6, 9

work page 2025

[40] [40]

Multi-exit kolmogorov–arnold networks: enhancing accuracy and parsimony.Machine Learning: Science and Technology, 6(3):035037, aug 2025

James Bagrow and Josh Bongard. Multi-exit kolmogorov–arnold networks: enhancing accuracy and parsimony.Machine Learning: Science and Technology, 6(3):035037, aug 2025. 5, 6, 7, 8, 9

work page 2025

[41] [41]

Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system.Optics communications, 30(2):257–261, 1979

Kensuke Ikeda. Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system.Optics communications, 30(2):257–261, 1979. 5

work page 1979

[42] [42]

Global dynamical behavior of the optical field in a ring cavity.Journal of the Optical Society of America B, 2(4):552–564, 1985

SM Hammel, CKRT Jones, and Jerome V Moloney. Global dynamical behavior of the optical field in a ring cavity.Journal of the Optical Society of America B, 2(4):552–564, 1985. 5

work page 1985

[43] [43]

Nonlinear dynamics and population disappearances.The American Naturalist, 144(5): 873–879, 1994

Kevin McCann and Peter Yodzis. Nonlinear dynamics and population disappearances.The American Naturalist, 144(5): 873–879, 1994. 6

work page 1994

[44] [44]

Neville.Properties of Concrete

Adam M. Neville.Properties of Concrete. Pearson, 5th edition, 2011. 7

work page 2011

[45] [45]

I.-C. Yeh. Modeling of strength of high-performance concrete using artificial neural networks.Cement and Concrete Research, 28(12):1797–1808, 1998. ISSN 0008-8846. 7, 9 11

work page 1998

[46] [46]

Analysis of strength of concrete using design of experiments and neural networks.Journal of Materials in Civil Engineering, 18(4):597–604, 2006

I-Cheng Yeh. Analysis of strength of concrete using design of experiments and neural networks.Journal of Materials in Civil Engineering, 18(4):597–604, 2006. 7, 9

work page 2006

[47] [47]

A data-driven statistical model for predicting the critical temperature of a superconductor.Computational Materials Science, 154:346–354, 2018

Kam Hamidieh. A data-driven statistical model for predicting the critical temperature of a superconductor.Computational Materials Science, 154:346–354, 2018. 7, 9

work page 2018

[48] [48]

MDR SuperCon datasheet ver.240322

Center for Basic Research on Materials. MDR SuperCon datasheet ver.240322. 7, 9

work page

[49] [49]

Categorical reparameterization with gumbel-softmax

Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. InInternational Conference on Learning Representations, 2017. URLhttps://openreview.net/forum?id=rkE3y85ee. 8

work page 2017

[50] [50]

Maddison, Andriy Mnih, and Yee Whye Teh

Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. InInternational Conference on Learning Representations, 2017. URLhttps://openreview.net/ forum?id=S1jE5L5gl. 8 12

work page 2017