Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks
Pith reviewed 2026-05-08 06:23 UTC · model grok-4.3
The pith
Deep KANs whose edge functions are affine maps plus one fixed continuous function σ achieve universal approximation on compact sets precisely when σ is non-affine.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deep KANs in which all edge functions are either affine or equal to a fixed continuous function σ are dense in C(K) for every compact set K⊂R^n if and only if σ is non-affine. For KANs with exactly two hidden layers, universality holds if and only if σ is nonpolynomial. The full class of affine functions is not required and can be replaced by a finite set; in the nonpolynomial case a fixed family of five affine functions suffices for arbitrary depth. KANs that use the spline-based edge parameterization with fixed degree and knot sequence are also universal approximators.
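In symbols, a compact restatement of the two characterizations (notation introduced here for this summary; the paper's own symbols may differ):

% K_sigma^deep: KANs of arbitrary depth; K_sigma^2: KANs with exactly two
% hidden layers; in both, every edge function is affine or equals sigma.
\overline{\mathcal{K}^{\mathrm{deep}}_{\sigma}} = C(K) \iff \sigma \ \text{is non-affine},
\qquad
\overline{\mathcal{K}^{2}_{\sigma}} = C(K) \iff \sigma \ \text{is nonpolynomial},
% closures taken in the uniform norm, for every compact K \subset \mathbb{R}^n.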
What carries the argument
The layered KAN structure of summation nodes connected by univariate edge functions, with the edge functions restricted to the set of all affine maps together with one fixed continuous function σ.
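A minimal executable sketch of that structure, assuming numpy, with ReLU standing in as the fixed σ (an illustrative choice; the theorem allows any continuous non-affine function). Each edge is either an affine map t → at + b or exactly σ, and each node sums its incoming edges:

import numpy as np

def sigma(t):
    # The single fixed non-affine continuous edge function. ReLU is an
    # arbitrary illustrative choice, not mandated by the paper.
    return np.maximum(t, 0.0)

def kan_layer(x, a, b, use_sigma):
    # One KAN layer: output node j sums univariate edge functions applied
    # to each input coordinate i.
    #   x         : (n_in,) node values of the previous layer
    #   a, b      : (n_out, n_in) parameters of the affine edges t -> a*t + b
    #   use_sigma : (n_out, n_in) bool mask; True marks an edge that is
    #               exactly sigma (no parameters), False an affine edge
    affine = a * x[None, :] + b
    edges = np.where(use_sigma, sigma(x)[None, :], affine)
    return edges.sum(axis=1)  # summation at each node

# Two-hidden-layer example on R^2 with random parameters.
rng = np.random.default_rng(0)
x = rng.normal(size=2)
h1 = kan_layer(x, rng.normal(size=(4, 2)), rng.normal(size=(4, 2)),
               rng.random((4, 2)) < 0.3)
h2 = kan_layer(h1, rng.normal(size=(3, 4)), rng.normal(size=(3, 4)),
               rng.random((3, 4)) < 0.3)
print(h2)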
If this is right
- Any continuous function on a compact domain can be approximated arbitrarily well by deep KANs once a single non-affine continuous σ is included among the edge functions.
- Two-hidden-layer KANs require σ to be nonpolynomial rather than merely non-affine for the same density result.
- A finite collection of affine functions together with one σ is enough to recover universality for arbitrary depth.
- Spline-parameterized KANs remain universal approximators even when the spline degree and knot locations are fixed in advance (see the spline sketch after this list).
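On the last point, a minimal sketch of a fixed-degree, fixed-knot spline edge function in the spirit of the Liu et al. parameterization, assuming scipy is available; with degree and knots frozen, only the coefficients remain trainable:

import numpy as np
from scipy.interpolate import BSpline

degree = 3
# Knot sequence on [-1, 1], clamped at the ends and fixed in advance;
# per the paper, universality survives this restriction.
interior = np.linspace(-1.0, 1.0, 8)
knots = np.concatenate([[-1.0] * degree, interior, [1.0] * degree])
n_coef = len(knots) - degree - 1  # B-spline basis size for this knot vector

rng = np.random.default_rng(0)
coef = rng.normal(size=n_coef)    # the only trainable edge parameters

edge = BSpline(knots, coef, degree, extrapolate=True)
print(edge(np.array([-0.5, 0.0, 0.5])))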
Where Pith is reading between the lines
- Architectures can be simplified by using a single non-affine function, which may be chosen freely, instead of many distinct activations.
- The gap between the non-affine condition for deep networks and the nonpolynomial condition for shallow ones indicates that depth relaxes the required degree of nonlinearity.
- Fixed finite affine families offer a route to reduce parameter count while preserving approximation guarantees.
Load-bearing premise
Edge functions are continuous real-valued maps and the network uses the standard KAN architecture of sums at nodes with function compositions along edges.
What would settle it
An explicit continuous function on a compact set K⊂R^n that cannot be uniformly approximated to arbitrary accuracy by any deep KAN whose edges are drawn from the affine maps together with one fixed non-affine σ.
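No finite computation can produce that witness, but a cheap numerical probe runs in the other direction: if the best uniform error stopped shrinking as width grew, that would flag a candidate. A minimal sketch, assuming tanh as the fixed σ (nonpolynomial, so even the two-hidden-layer condition applies) and fitting only the affine output edges by least squares:

import numpy as np

# Target f in C(K) on K = [0, 1]^2; any continuous function would do.
f = lambda x: np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])
sigma = np.tanh  # fixed nonpolynomial edge function (illustrative)

grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), -1).reshape(-1, 2)
y = f(grid)

rng = np.random.default_rng(0)
for width in (4, 16, 64):
    # sum_k c_k * sigma(w_k . x + b_k) + d is realizable by a KAN with two
    # hidden layers whose edges are affine maps or exactly sigma.
    W = rng.normal(size=(2, width))
    b = rng.normal(size=width)
    H = np.column_stack([sigma(grid @ W + b), np.ones(len(grid))])
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)  # affine output edges
    print(width, np.max(np.abs(H @ coef - y)))    # uniform error on the grid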
read the original abstract
We analyze the universal approximation property of Kolmogorov-Arnold Networks (KANs) in terms of their edge functions. If these functions are all affine, then universality clearly fails. How many non-affine functions are needed, in addition to affine ones, to ensure universality? We show that a single one suffices. More precisely, we prove that deep KANs in which all edge functions are either affine or equal to a fixed continuous function $\sigma$ are dense in $C(K)$ for every compact set $K\subset\mathbb{R}^n$ if and only if $\sigma$ is non-affine. In contrast, for KANs with exactly two hidden layers, universality holds if and only if $\sigma$ is nonpolynomial. We further show that the full class of affine functions is not required; it can be replaced by a finite set without affecting universality. In particular, in the nonpolynomial case, a fixed family of five affine functions suffices when the depth is arbitrary. More generally, for every continuous non-affine function $\sigma$, there exists a finite affine family $A_\sigma$ such that deep KANs with edge functions in $A_\sigma\cup\{\sigma\}$ remain universal. We also prove that KANs with the spline-based edge parameterization introduced by Liu et al.~\cite{Liu2024} are universal approximators in the classical sense, even when the spline degree and knot sequence are fixed in advance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper establishes necessary and sufficient conditions for the universal approximation property of Kolmogorov-Arnold Networks (KANs). It proves that deep KANs with edge functions restricted to affine maps or a single fixed continuous function σ are dense in C(K) for every compact K ⊂ R^n if and only if σ is non-affine. For KANs with exactly two hidden layers the corresponding condition is that σ is non-polynomial. The authors further show that the full class of affine functions can be replaced by a finite family A_σ without losing universality, with an explicit construction of five affine functions sufficing in the non-polynomial case, and that the fixed spline parameterization of Liu et al. yields universal KANs even when degree and knots are held constant.
Significance. The results supply a precise characterization of when KANs are universal approximators, clarifying the minimal requirements on edge functions. The separation between the deep and two-layer cases, the reduction to finite affine families, and the confirmation that fixed splines remain universal are all load-bearing for practical and theoretical use of KANs. These contributions place KANs on firmer mathematical footing and directly address parameterization concerns raised by the original KAN work.
minor comments (3)
- The definition of the KAN architecture (width, depth, and node summation) is used throughout but would benefit from an explicit diagram or formal inductive definition in §2 to aid readers unfamiliar with the Kolmogorov-Arnold representation (a candidate rendering follows this list).
- In the statement of the finite-affine-family result, the dependence of A_σ on σ is stated but the explicit construction for the non-polynomial case (five functions) is only sketched; a short appendix listing the five functions would improve reproducibility.
- The spline-universality theorem assumes the standard B-spline basis; a brief remark on whether the result extends to other fixed bases (e.g., truncated power functions) would clarify the scope.
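On the first comment, one candidate inductive definition, offered as a sketch of what §2 might state (notation ours, not the paper's):

% A KAN of depth L with widths n_0 = n, n_1, ..., n_L maps
% x^{(0)} = x \in \mathbb{R}^{n} through the layers
x^{(\ell)}_j \;=\; \sum_{i=1}^{n_{\ell-1}} \phi^{(\ell)}_{j,i}\!\left(x^{(\ell-1)}_i\right),
\qquad j = 1, \dots, n_\ell, \quad \ell = 1, \dots, L,
% where every edge function \phi^{(\ell)}_{j,i} : \mathbb{R} \to \mathbb{R}
% is drawn from the admissible class (here: affine maps and the fixed \sigma).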
Simulated Author's Rebuttal
We thank the referee for their positive summary of our manuscript and for highlighting its significance in providing precise necessary and sufficient conditions for the universality of KANs. We are pleased that the separation between deep and two-layer cases, the reduction to finite affine families, and the universality of fixed splines were recognized as load-bearing contributions.
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper establishes its central claims through direct mathematical proofs in approximation theory: necessity follows immediately from the fact that affine edge functions yield only affine maps (not dense in C(K)), while sufficiency for non-affine σ is shown by constructing dense approximations via the Kolmogorov-Arnold structure and a finite affine family A_σ. These arguments rely on continuity, the layered summation architecture, and standard density results for non-polynomial functions, without any fitted parameters, self-referential definitions, or load-bearing self-citations. The spline universality result is likewise proved structurally from the fixed parameterization, independent of external fitted values or prior author-specific theorems.
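The standard density result invoked here is, in its continuous-activation form, the Leshno–Lin–Pinkus–Schocken theorem [23]:

% For continuous \sigma : \mathbb{R} \to \mathbb{R},
\overline{\operatorname{span}}\,\{\, x \mapsto \sigma(w \cdot x + b) : w \in \mathbb{R}^n,\ b \in \mathbb{R} \,\} = C(K)
\quad \text{for every compact } K \subset \mathbb{R}^n
\iff \sigma \ \text{is not a polynomial}.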
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: The space of continuous real-valued functions on a compact set K is a Banach space under the uniform norm, and density arguments apply to it.
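Concretely, the norm and the sense of density at stake (standard definitions, stated here for completeness):

\|f\|_{C(K)} \;=\; \max_{x \in K} |f(x)|,
% and "universal approximation" for a KAN class \mathcal{K} means
\forall f \in C(K)\ \forall \varepsilon > 0\ \exists N \in \mathcal{K} :
\|f - N\|_{C(K)} < \varepsilon .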
Reference graph
Works this paper leans on
- [1] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić, T. Y. Hou, M. Tegmark, KAN: Kolmogorov–Arnold networks, arXiv:2404.19756, 2024.
- [2] A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition (Russian), Dokl. Akad. Nauk SSSR 114 (1957), 953–956.
- [3] V. I. Arnold, On the representation of continuous functions of three variables by superpositions of continuous functions of two variables (Russian), Mat. Sb. (N.S.) 48/90 (1959), 3–74; English transl. in: Amer. Math. Soc. Transl. (2) 28 (1963), 61–147.
- [4] S. Ya. Khavinson, Best approximation by linear superpositions (approximate nomography), American Mathematical Society, Providence, RI, 1997, 175 pp.
- [5] V. E. Ismailov, Ridge functions and applications in neural networks, American Mathematical Society, Providence, RI, 2021, 186 pp.
- [6] V. E. Ismailov, A three layer neural network can represent any multivariate function, J. Math. Anal. Appl. 523 (2023), no. 1, Article No. 127096, 8 pp.
- [7] G. G. Lorentz, Metric entropy, widths, and superpositions of functions, Amer. Math. Monthly 69 (1962), 469–485.
- [8] D. A. Sprecher, On the structure of continuous functions of several variables, Trans. Amer. Math. Soc. 115 (1965), 340–355.
- [9] A. Ismayilova and V. E. Ismailov, On the Kolmogorov neural networks, Neural Networks 176 (2024), Article No. 106333.
- [10] B. Igelnik and N. Parikh, Kolmogorov's spline network, IEEE Trans. Neural Netw. 14 (2003), no. 4, 725–733.
- [11] A. Polar and M. Poluektov, A deep machine learning algorithm for construction of the Kolmogorov–Arnold representation, Eng. Appl. Artif. Intell. 99 (2021), Article No. 104137.
- [12] M. Poluektov and A. Polar, Construction of the Kolmogorov–Arnold networks using the Newton–Kaczmarz method, Mach. Learn. 114 (2025), Article No. 185.
- [13]
- [14] A. Kratsios, B. J. Kim, and T. Furuya, Approximation rates in Besov norms and sample-complexity of Kolmogorov–Arnold networks with residual connections, arXiv:2504.15110, 2025.
- [15] S. Gleyzer, H. Nguyen, D. P. Ramakrishnan, and E. A. F. Reinhardt, Sinusoidal approximation theorem for Kolmogorov–Arnold networks, Mathematics 13 (2025), no. 19, Article No. 3157.
- [16]
- [17] J. D. Toscano, L.-L. Wang, and G. E. Karniadakis, KKANs: Kůrková–Kolmogorov–Arnold networks and their learning dynamics, Neural Networks 191 (2025), Article No. 107831.
- [18]
- [19] X. Zhang and H. Zhou, Generalization bounds and model complexity for Kolmogorov–Arnold networks, The Thirteenth International Conference on Learning Representations, 2025.
- [20] S. M. Eshtehardian, M. H. Yassaee, and B. Khalaj, On the convergence of two-layer Kolmogorov–Arnold networks with first-layer training, The Fourteenth International Conference on Learning Representations, 2026.
- [21] S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, A survey on Kolmogorov–Arnold network, ACM Comput. Surv. 58 (2025), no. 2, Article 55.
- [22] A. Noorizadegan, S. Wang, L. Ling, J. P. Dominguez-Morales, A practitioner's guide to Kolmogorov–Arnold networks, arXiv:2510.25781, 2025.
- [23] M. Leshno, V. Ya. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks 6 (1993), 861–867.
- [24] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica 8 (1999), 143–195.
- [25] C. de Boor, A practical guide to splines, Revised Edition, Springer, New York, 2001, 346 pp.
- [26] L. L. Schumaker, Spline functions: basic theory, Third Edition, Cambridge University Press, Cambridge, 2007, 582 pp.
discussion (0)