Algorithmic Foundations of Deep Learning: Complexity-Theoretic Rates and a Characterization of Universal Approximation

Anastasis Kratsios; Bum Jun Kim; Gregory Cousins; Haitz Sáez de Ocáriz Borde; Simone Brugiapaglia

arXiv: 2606.26705 · v1 · pith:DZUCHT5Hnew · submitted 2026-06-25 · cs.LG · cs.AI · cs.LO · cs.NA · math.NA

Pith reviewed 2026-06-26 05:18 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.LO · cs.NA · math.NA

keywords: neural network expressivity · universal approximation · circuit complexity · algorithmic complexity · feedforward networks · non-affine nonlinearity · parallelization condition · real-valued circuits

The pith

Neural networks emulate real-valued circuits with explicit depth, width, and parameter bounds, and universally approximate continuous functions if and only if they contain a non-affine nonlinearity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes neural network expressivity around algorithmic computation rather than regularity alone. It proves that any function computable by a circuit over a fixed set of elementary real-valued gates can be realized by a neural network whose size scales directly with the circuit's depth, width, gate count, and structure. The same viewpoint yields an if-and-only-if criterion: any definable network model obeying a parallelization condition (which permits multivariate operations such as attention) is a universal approximator precisely when it includes at least one non-affine nonlinearity. Concrete consequences include minimax-optimal rates on Besov classes, logarithmic error for holomorphic functions, and emulation of classical numerical algorithms without ad-hoc architecture arguments.

Core claim

If a function is computable by a real-valued circuit over a prescribed elementary gate language, then it can be computed to comparable accuracy by an NN with explicit depth, width, and non-zero-parameter bounds controlled by the depth, width, gate count, and gate structure. Any definable NN model satisfying a natural parallelization condition is a universal approximator if and only if it contains a non-affine nonlinearity.
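
The "only if" half of the characterization rests on a standard fact: a composition of affine layers is itself affine, so depth adds nothing without a non-affine nonlinearity. The following is a minimal sketch of that fact in plain Python, not the paper's argument; the helper names `affine`, `collapse`, and `deep`, and the example weights, are illustrative assumptions.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def affine(W, b, x):
    """Apply the affine map x -> W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]

def collapse(layers):
    """Fold a stack of affine layers (W, b) into one equivalent (W, b)."""
    W_tot, b_tot = layers[0]
    for W, b in layers[1:]:
        b_tot = affine(W, b, b_tot)   # new offset: W @ b_tot + b
        W_tot = matmul(W, W_tot)      # new matrix: W @ W_tot
    return W_tot, b_tot

# Hypothetical three-layer "network" whose activations are all the identity.
layers = [
    ([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]], [0.5, -1.0, 2.0]),  # R^2 -> R^3
    ([[2.0, 0.0, 1.0], [-1.0, 1.0, 0.0]], [1.0, 0.0]),          # R^3 -> R^2
    ([[1.0, -2.0]], [3.0]),                                     # R^2 -> R^1
]

def deep(x):
    """Evaluate the layers one after another."""
    for W, b in layers:
        x = affine(W, b, x)
    return x

W, b = collapse(layers)
x = [0.25, -4.0]
print(deep(x), affine(W, b, x))  # the two evaluations agree
```

Because the collapsed pair `(W, b)` reproduces the deep evaluation on every input, such a model can only ever represent affine functions, whatever its depth or width.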

What carries the argument

Emulation of real-valued circuits over elementary gates inside neural networks, together with the parallelization condition on definable models that allows multivariate nonlinearities.

If this is right

Universal approximation holds for all continuous functions once a non-affine nonlinearity is present.
Minimax-optimal approximation rates are recovered for Besov classes.
Holomorphic functions admit logarithmic-error approximation by neural networks.
Numerical algorithms such as Newton-Raphson and power iteration can be emulated directly by the network.
Shortest-path computation on k-vertex graphs yields networks with O(log(1/ε)) non-zero parameters.
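
To make the "emulate numerical algorithms" item concrete: once the number of iterations is fixed, Newton-Raphson and power iteration are straight-line programs over a few arithmetic gates, which is the kind of object a circuit-to-network translation takes as input. The sketch below only unrolls the two algorithms in plain Python; it does not build a network. The function names, the starting points, and the use of a square root in the normalization step are assumptions made for illustration.

```python
def newton_sqrt(a, steps):
    """Unrolled Newton-Raphson for x^2 - a = 0 with a > 0.
    Each step uses a fixed set of gates: one division, one addition, one scaling."""
    x = max(a, 1.0)  # assumed starting point, at or above sqrt(a)
    for _ in range(steps):
        x = 0.5 * (x + a / x)
    return x

def power_iteration(A, steps):
    """Unrolled power iteration on a symmetric matrix A (list of rows).
    Returns the Rayleigh quotient of the final iterate."""
    n = len(A)
    v = [1.0] * n  # assumed starting vector
    for _ in range(steps):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(vi * avi for vi, avi in zip(v, Av))

root = newton_sqrt(2.0, 6)
top_eig = power_iteration([[2.0, 0.0], [0.0, 1.0]], 50)
print(root, top_eig)
```

In both functions the loop count `steps` plays the role of circuit depth: every extra iteration appends a constant number of gates.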

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Circuit descriptions of target functions could be used to construct architecture-specific networks with near-optimal parameter counts.
The parallelization condition may extend the characterization to attention-based or normalization-heavy models without separate proofs.
Known results from real computation and circuit complexity could be imported to obtain new approximation bounds for structured function classes.
The distinction between regularity and algorithmic complexity suggests testing whether certain high-regularity functions still require large networks when their circuit complexity is high.

Load-bearing premise

The neural-network model under consideration must be definable and must satisfy the parallelization condition that permits possibly multivariate nonlinearities.

What would settle it

Exhibit either a circuit-computable function whose approximation by any neural network requires super-linear growth in non-zero parameters relative to the circuit size, or a definable parallelizable model with no non-affine nonlinearity (every activation affine) that still approximates every continuous function on compact sets to arbitrary accuracy.

Figures

Figures reproduced from arXiv: 2606.26705 by Anastasis Kratsios, Bum Jun Kim, Gregory Cousins, Haitz Sáez de Ocáriz Borde, Simone Brugiapaglia.

**Figure 1.** The Gap: In both panels, the red function is intuitively more difficult to describe, and hence should be harder to approximate, than its green counterpart. Nevertheless, the two functions in each panel have the same classical regularity: low regularity in (a) and high regularity in (b). Thus, the regularity-based lens of classical constructive approximation theory, cf. [53, 52], or compressed sensing [4] (e.g.…

**Figure 2.** The missing piece: algorithmic complexity. Classical approximation theory is black-box: it sees the regularity of the input-output map, but not the algorithmic structure producing it. Our framework interpolates between this regularity-based viewpoint and a white-box compilation of explicit algorithms. The grey-box theory forms the bridge: it converts circuit-level descriptions of a computation into neural-…

**Figure 3.** Informal Summary of the Quantitative Neural Compilation Theorem 3.2. Given a G-circuit computing f : [−1, 1]^d → ℝ up to error ε > 0, we construct a neural network computing f up to error 2ε, with explicit depth, width, and non-zero-parameter bounds inherited from the circuit depth, size, and elementary-operation language. Each elementary gate is replaced by a neural emulator using the abstract surgery tech…

**Figure 4.** Realistic network sizes for structured problems, beyond worst-case regularity. The all-pairs shortest paths (APSP) problem on a k-vertex graph defines a map f : ℝ^{k(k−1)/2} → ℝ^{k(k−1)/2} sending positive edge weights to the corresponding shortest-path distance matrix. Although this APSP map is 1-Lipschitz, classical best-approximation rates for the Lipschitz class, cf. [203, 173], lead to networks of size O(…

**Figure 5.** We obtain the k-fold product (here k = 3) by nesting the binary product in Proposition A.4 and padding by 1s until the inputs to the nested binary products are a power of 2 (here 2^2). Proposition A.4 (Approximate Binary Multiplication Gate). Under Assumption 2.4, for every M > 0 and ε > 0 there exists an ANN Mult_{ε,M} : ℝ^2 → ℝ satisfying sup_{|u|≤M, |v|≤M} …
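
The padding construction in the Figure 5 caption can be sketched as a balanced tree of binary products. In this illustration the gate `binary_mult` is an exact product standing in for the approximate gate of Proposition A.4 (whose error bound is truncated in the caption above), and the function name `k_fold_product` is hypothetical.

```python
def binary_mult(u, v):
    """Stand-in for the approximate binary multiplication gate (exact here)."""
    return u * v

def k_fold_product(xs):
    """Multiply the inputs with a balanced tree of binary products.
    The input list is padded with 1s up to the next power of two.
    Returns (product, tree_depth)."""
    n = 1
    while n < len(xs):
        n *= 2
    level = list(xs) + [1.0] * (n - len(xs))
    depth = 0
    while len(level) > 1:
        # Pair up neighbours; each pass is one layer of binary gates.
        level = [binary_mult(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth

value, depth = k_fold_product([2.0, 3.0, 5.0])  # k = 3, padded to 2^2 inputs
print(value, depth)
```

For k = 3 the inputs are padded to four, giving two layers of binary gates; in general the tree depth is the base-2 logarithm of the padded input count.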

**Figure 6.** Canonicalization of the G-Circuit in Figure 6a: The circuit computing the same function f is endowed with a multi-partite structure, thereby mimicking the “uniform” structure of an ANN. This is achieved by first aligning computation nodes according to their distance from the input variables, and then subdividing edges which “hop across” levels of the circuit by inserting new nodes labelled with identity ga…
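
The two steps named in the Figure 6 caption (align nodes by level, then subdivide level-hopping edges with identity nodes) can be sketched on a circuit stored as a predecessor map. This is an illustrative reading, not the paper's construction: it assumes "distance from the input variables" means the longest path from any input, and the function name `canonicalize` and the relay naming scheme are hypothetical.

```python
def canonicalize(preds):
    """preds maps each node to the list of nodes feeding it (inputs map to []).
    Returns (layered_preds, level) in which every edge joins consecutive levels."""
    level = {}

    def depth(v):
        # Assumed notion of level: longest path from an input variable.
        if v not in level:
            level[v] = 0 if not preds[v] else 1 + max(depth(u) for u in preds[v])
        return level[v]

    for v in preds:
        depth(v)

    out = {v: [] for v in preds}
    for v, ps in preds.items():
        for u in ps:
            src = u
            # Insert one identity relay per skipped level between u and v.
            for l in range(level[u] + 1, level[v]):
                relay = f"id[{u}@{l}]"
                if relay not in out:
                    out[relay] = [src]
                    level[relay] = l
                src = relay
            out[v].append(src)
    return out, level

# Hypothetical circuit: a = x + y, b = a * x; the edge x -> b skips level 1.
circuit = {"x": [], "y": [], "a": ["x", "y"], "b": ["a", "x"]}
layered, level = canonicalize(circuit)
print(layered["b"], layered["id[x@1]"])
```

After canonicalization every edge connects level l to level l + 1, which is the layered shape a feedforward network needs.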

**Figure 7.** Algorithmic Surgery of the G-Circuit in …

**Figure 8.** Alignment of ANN sub-networks emulating the G-Circuit in …

**Figure 9.** Illustration of lower and non-lower sets of ℕ_0^2. Left: the blue dots provide an example of lower set S ⊆ ℕ_0^2. Dots and rectangles in red, orange, and green color illustrate the property that ×_{i=1}^{d} [0, ν_i] ∩ ℕ_0^d ⊆ S for each ν ∈ S. Right: a multi-index set of ℕ_0^2 not satisfying the lower set property. The lattice points in the orange box corresponding to the multi-index (4, 2) in orange are not inc…
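
The lower-set property in the Figure 9 caption is easy to test mechanically: for every multi-index ν in S, every lattice point of the box [0, ν_1] × … × [0, ν_d] must also lie in S. A minimal sketch follows; the function name `is_lower_set` and the two example sets are illustrative and are not the sets drawn in the figure.

```python
from itertools import product

def is_lower_set(S):
    """True if, for every nu in S, all mu with 0 <= mu_i <= nu_i are in S."""
    S = set(map(tuple, S))
    return all(mu in S
               for nu in S
               for mu in product(*(range(n + 1) for n in nu)))

lower = {(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)}
not_lower = {(0, 0), (1, 0), (4, 2)}   # the box below (4, 2) is mostly missing

print(is_lower_set(lower), is_lower_set(not_lower))
```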

Abstract

Feedforward neural network (NN) expressivity is typically studied by emulating optimal basis-expansion schemes. While powerful, this perspective is incomplete: it primarily captures complexity through regularity, and therefore does not distinguish intuitively simple and complicated objects with comparable regularity, such as the square-root function and a typical Brownian path. The guiding message is that neural networks should be viewed not only as flexible basis functions, but also as models of computation. If a function is computable by a real-valued circuit over a prescribed elementary gate language, then it can be computed to comparable accuracy by an NN with explicit depth, width, and non-zero-parameter bounds controlled by the depth, width, gate count, and gate structure. Thus, neural-network complexity is not governed by regularity alone, but also by algorithmic complexity. We then show that any definable NN model satisfying a natural parallelization condition, allowing possibly multivariate non-linearities such as attention or layer normalization, is a universal approximator if and only if it contains a non-affine nonlinearity. The scope of our theory is illustrated by deducing universal approximation guarantees for continuous functions, minimax-optimal approximation guarantees for Besov classes, logarithmic-error complexity for holomorphic functions, and by showing that NNs can emulate numerical algorithms such as Newton-Raphson root finding and power iteration without architecture-specific arguments. Its precision is illustrated by shortest-path computation on $k$-vertex graphs: compiling the tropical dynamic-programming circuit yields NNs with $O(\log(1/\epsilon))$ non-zero parameters, exponentially improving in $1/\epsilon$ over the generic $O(\epsilon^{-c k^2})$ Lipschitz-approximation scale, for a constant $c>0$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper links NN size bounds to circuit complexity rather than regularity, with an iff UA result under definability plus parallelization, and a concrete shortest-path example that improves on Lipschitz rates.

The core contribution is treating NNs as emulators of real-valued circuits over a fixed gate set. If a function comes from such a circuit, the paper gives explicit depth, width, and nonzero-parameter counts for an NN that matches the accuracy, controlled by the circuit's own structure. This is paired with an if-and-only-if statement: any definable NN model that meets the parallelization condition is universal exactly when it has a non-affine nonlinearity. The parallelization step is what lets the result cover attention and layer norm without forcing everything to be univariate.

The shortest-path example is the clearest payoff. Compiling the tropical dynamic-programming circuit produces NNs with O(log(1/ε)) nonzero parameters for k-vertex graphs, which is exponentially better than the generic O(ε^{-c k^2}) scale that comes from Lipschitz approximation. The paper also shows the same circuit-to-NN translation recovers minimax rates for Besov classes, logarithmic rates for holomorphic functions, and emulation of Newton iteration and power iteration. These are not just existence proofs; they come with parameter counts tied to the source algorithm.
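
For readers unfamiliar with the tropical (min, +) formulation, the sketch below computes all-pairs shortest paths by repeated min-plus matrix squaring, with each `min` gate written through a single ReLU via the identity min(a, b) = a − ReLU(a − b). It is an illustration of why these gates are network-friendly, under the assumption of a ReLU activation and finite positive weights on a complete graph; it does not reproduce the paper's circuit or its parameter count, and all function names are hypothetical.

```python
def relu(t):
    return t if t > 0.0 else 0.0

def relu_min(a, b):
    """min(a, b) expressed with one ReLU: a - relu(a - b). Inputs must be finite."""
    return a - relu(a - b)

def min_plus_square(D):
    """One tropical squaring: out[i][j] = min over m of D[i][m] + D[m][j]."""
    k = len(D)
    out = [row[:] for row in D]
    for i in range(k):
        for j in range(k):
            for m in range(k):
                out[i][j] = relu_min(out[i][j], D[i][m] + D[m][j])
    return out

def apsp(weights):
    """All-pairs shortest paths on a complete graph with zero diagonal.
    After s squarings, paths of up to 2**s edges are accounted for."""
    k = len(weights)
    D = [row[:] for row in weights]
    hops = 1
    while hops < k - 1:
        D = min_plus_square(D)
        hops *= 2
    return D

# Hypothetical 4-vertex example with symmetric positive weights.
W = [[0.0, 1.0, 5.0, 10.0],
     [1.0, 0.0, 1.0, 5.0],
     [5.0, 1.0, 0.0, 1.0],
     [10.0, 5.0, 1.0, 0.0]]
D = apsp(W)
print(D[0])
```

The number of squarings grows like log k, and every gate is either an addition or a ReLU-expressible minimum, so the whole computation is a fixed circuit over a very small gate language.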

The main limitation is that both the emulation bounds and the iff characterization are conditional on the model being definable and satisfying the parallelization condition. How often real architectures meet the latter without extra work is not obvious from the abstract, and the paper does not appear to derive the condition from first principles. Without the full proofs it is also hard to judge whether the circuit-to-NN translation introduces hidden constants or requires the gate language to be chosen carefully. The claims themselves do not look circular; they reduce to external circuit models rather than fitted quantities inside the paper.

This is worth a serious referee. The circuit viewpoint is a genuine alternative to the usual regularity arguments, the concrete parameter improvement is verifiable in principle, and the UA characterization is stated sharply enough to be checked. Readers working on expressivity, approximation theory, or algorithmic aspects of architectures will get the most out of it. I would send it to review.

Referee Report

2 major / 2 minor

Summary. The paper claims that neural networks should be analyzed as computational models: any function computable by a real-valued circuit over a fixed gate language can be approximated to comparable accuracy by a feedforward NN whose depth, width, and number of non-zero parameters are explicitly bounded in terms of the circuit's depth, width, gate count, and structure. It further asserts that any definable NN model obeying a natural parallelization condition (permitting multivariate nonlinearities such as attention or layer normalization) is a universal approximator if and only if it contains at least one non-affine nonlinearity. The theory is illustrated by deriving universal-approximation statements for continuous functions, minimax rates for Besov classes, logarithmic-error bounds for holomorphic functions, and by showing that standard numerical algorithms (Newton-Raphson, power iteration) can be emulated; a concrete highlight is the construction of NNs for shortest-path computation on k-vertex graphs that use only O(log(1/ε)) non-zero parameters.

Significance. If the stated theorems hold with the claimed explicit bounds, the work supplies a complexity-theoretic foundation for NN expressivity that incorporates algorithmic structure rather than regularity alone. The circuit-emulation result and the if-and-only-if universal-approximation characterization under an explicitly stated modeling condition would be useful for deriving architecture-specific guarantees and for explaining why certain multivariate operations succeed. The concrete parameter-count improvement for shortest-path computation demonstrates the potential tightness of the bounds relative to generic Lipschitz arguments.

major comments (2)

[Abstract and the section stating the UA theorem] The universal-approximation characterization (abstract, second paragraph) is conditional on the NN model being 'definable' and satisfying the 'natural parallelization condition.' The manuscript must supply a precise, checkable definition of both notions and verify that the listed examples (attention, layer normalization, Newton iteration) satisfy the condition without additional restrictions that would narrow the function class.
[The section on shortest-path computation] The shortest-path claim (abstract, final sentence) asserts O(log(1/ε)) non-zero parameters obtained by compiling the tropical dynamic-programming circuit. The derivation of this count, including the precise mapping from circuit gates to network parameters and the verification that the resulting network indeed solves the problem to accuracy ε, is load-bearing for the claimed exponential improvement over the generic O(ε^{-c k^2}) scale and must be presented with all intermediate steps.

minor comments (2)

The abstract is information-dense; separating the circuit-emulation theorem, the UA characterization, and the illustrative applications into distinct sentences would improve readability.
The term 'definable NN model' appears without an inline definition in the abstract; a brief parenthetical gloss or forward reference to its formal definition would help readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address the two major comments below and will incorporate the requested clarifications and expansions in the revised manuscript.

Referee: [Abstract and the section stating the UA theorem] The universal-approximation characterization (abstract, second paragraph) is conditional on the NN model being 'definable' and satisfying the 'natural parallelization condition.' The manuscript must supply a precise, checkable definition of both notions and verify that the listed examples (attention, layer normalization, Newton iteration) satisfy the condition without additional restrictions that would narrow the function class.

Authors: We agree that explicit, checkable definitions are required for reproducibility. In the revised manuscript we will insert formal definitions of 'definable' (as a model whose operations are given by a fixed finite set of real-valued functions closed under composition and parallel application) and the 'natural parallelization condition' (the requirement that any collection of independent scalar or vector operations can be realized by a single layer whose width scales linearly with the number of parallel instances) directly in the section containing the UA theorem. We will then verify, with explicit constructions, that attention, layer normalization, and Newton iteration satisfy both notions under the modeling assumptions already stated in the paper, without imposing further restrictions on the representable function class. revision: yes
Referee: [The section on shortest-path computation] The shortest-path claim (abstract, final sentence) asserts O(log(1/ε)) non-zero parameters obtained by compiling the tropical dynamic-programming circuit. The derivation of this count, including the precise mapping from circuit gates to network parameters and the verification that the resulting network indeed solves the problem to accuracy ε, is load-bearing for the claimed exponential improvement over the generic O(ε^{-c k^2}) scale and must be presented with all intermediate steps.

Authors: We agree that the parameter-count derivation is central and must be fully explicit. In the revised manuscript we will expand the shortest-path section to include: (i) the complete tropical dynamic-programming circuit for k-vertex graphs, (ii) the gate-by-gate translation into NN layers together with the exact non-zero parameter count at each step, and (iii) a direct verification that the resulting network computes shortest paths to accuracy ε. This will make the O(log(1/ε)) bound and the comparison to the generic Lipschitz scale fully self-contained. revision: yes
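
The gloss of the parallelization condition given in the first simulated response (independent operations realized by one layer whose width scales linearly with the number of instances) matches a familiar block-diagonal construction. The sketch below shows that construction for two ReLU layers; it illustrates the gloss only, since the paper's formal definition is not quoted on this page, and the names `relu_layer` and `parallelize` are hypothetical.

```python
def relu_layer(W, b, x):
    """Apply x -> ReLU(W x + b) with W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def parallelize(layer_a, layer_b):
    """Stack two layers into one block-diagonal layer acting on the
    concatenated input; the output width is the sum of the two widths."""
    (Wa, ba), (Wb, bb) = layer_a, layer_b
    na, nb = len(Wa[0]), len(Wb[0])
    W = [row + [0.0] * nb for row in Wa] + [[0.0] * na + row for row in Wb]
    return W, ba + bb

layer_a = ([[1.0, -1.0]], [0.5])           # R^2 -> R^1
layer_b = ([[2.0], [-1.0]], [0.0, 1.0])    # R^1 -> R^2
xa, xb = [3.0, 1.0], [2.0]

W, b = parallelize(layer_a, layer_b)
joint = relu_layer(W, b, xa + xb)
separate = relu_layer(*layer_a, xa) + relu_layer(*layer_b, xb)
print(joint, separate)
```

The zero blocks keep the two computations from interacting, so the joint layer returns exactly the concatenation of the separate outputs.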

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained reductions

The paper frames its core results as explicit reductions from real-valued circuit models (with given gate language, depth, width, gate count) to NN depth/width/parameter bounds, plus an if-and-only-if universal-approximation characterization conditioned on the external modeling premises of definability and the parallelization condition. These premises are stated as modeling choices rather than derived quantities, and the circuit-to-NN emulation supplies concrete bounds in terms of the source circuit without reducing to any fitted parameter or self-referential definition inside the paper. No load-bearing step equates a claimed prediction to an input by construction, and no self-citation chain is invoked to justify uniqueness or an ansatz. The shortest-path example and other illustrations are presented as applications of the stated theorems rather than circular validations.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on treating functions as outputs of real-valued circuits over an elementary gate language and on introducing the definable-model and parallelization-condition framework; no numerical free parameters are fitted to data.

axioms (2)

domain assumption: Functions of interest are exactly those computable by real-valued circuits over a fixed elementary gate language. Invoked to obtain the explicit depth-width-parameter bounds from circuit structure.
ad hoc to paper: The neural-network family under study is definable and obeys the parallelization condition. Required for the universal-approximation if-and-only-if statement.

invented entities (1)

definable NN model (no independent evidence)
purpose: Generalizes standard feedforward networks to include multivariate nonlinearities such as attention.
New modeling class introduced to state the universal-approximation characterization.

Reference graph

Works this paper leans on

34 extracted references · 13 canonical work pages · 2 internal anchors

[1]

[2]Adcock, B., Brugiapaglia, S., Dexter, N., and Moraga, S.Deep neural networks are effective at learning high-dimensional Hilbert-valued functions from limited data

Software available from tensorflow.org. [2]Adcock, B., Brugiapaglia, S., Dexter, N., and Moraga, S.Deep neural networks are effective at learning high-dimensional Hilbert-valued functions from limited data. InProceedings of the 2nd Mathematical and Scientific Machine Learning Conference(16–19 Aug 2022), J. Bruna, J. Hesthaven, and L. Zdeborova, Eds., vol....

2022
[2]

[6]Aftab, J., Schwab, C., Yang, H., and Zech, J.Quantum circuit encodings of polynomial chaos expansions.arXiv preprint arXiv:2506.01811(2025)

[5]Adcock, B., Dexter, N., and Moraga, S.Optimal deep learning of holomorphic operators between Banach spaces.Advances in Neural Information Processing Systems 37(2024), 27725–27789. [6]Aftab, J., Schwab, C., Yang, H., and Zech, J.Quantum circuit encodings of polynomial chaos expansions.arXiv preprint arXiv:2506.01811(2025). [7]Aldous, D., and Diaconis, P...

work page arXiv 2024
[3]

[11]Bartlett, P

[10]Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G.Optimal learners for realizable regression: PAC learning and online learning.Advances in Neural Information Processing Systems 36(2023). [11]Bartlett, P. L., Maiorov, V., and Meir, R.Almost linear VC-dimension bounds for piecewise polynomial networks.Neural Computation 10, 8 (1998), ...

2023
[4]

[14]Bennett, J., Carbery, A., Christ, M., and Tao, T.The Brascamp–Lieb inequalities: finiteness, structure and extremals.Geometric and Functional Analysis 17, 5 (2008), 1343–1415

[13]Bellman, R.Dynamic programming treatment of the travelling salesman problem.Journal of the ACM (JACM) 9, 1 (1962), 61–63. [14]Bennett, J., Carbery, A., Christ, M., and Tao, T.The Brascamp–Lieb inequalities: finiteness, structure and extremals.Geometric and Functional Analysis 17, 5 (2008), 1343–1415. [15]Bernstein, S. N.Démonstration du théorème de we...

1962
[5]

[25]Chen, Z., Villar, S., Chen, L., and Bruna, J.On the equivalence between graph isomorphism testing and function approximation with GNNs

[24]Chen, Y., Dong, B., and Xu, J.Meta-MgNet: Meta multigrid networks for solving parameterized partial differential equations.Journal of Computational Physics 455(2022), 110996. [25]Chen, Z., Villar, S., Chen, L., and Bruna, J.On the equivalence between graph isomorphism testing and function approximation with GNNs. InAdvances in Neural Information Proce...

2022
[6]

[27]Chiang, D.Transformers in uniform TC 0.Transactions on Machine Learning Research(Jan

[26]Cheridito, P., Jentzen, A., and Rossmannek, F.Efficient approximation of high-dimensional functions with neural networks.IEEE Transactions on Neural Networks and Learning Systems 33, 7 (July 2022), 3079–3093. [27]Chiang, D.Transformers in uniform TC 0.Transactions on Machine Learning Research(Jan. 2025). [28]Chkifa, A., Cohen, A., and Schwab, C.High-d...

2022
[7]

[41]Cuchiero, C., Schmocker, P., and Teichmann, J.Global universal approximation of functional input maps on weighted spaces.Constructive Approximation(2026), 1–76

Lecture notes. [41]Cuchiero, C., Schmocker, P., and Teichmann, J.Global universal approximation of functional input maps on weighted spaces.Constructive Approximation(2026), 1–76. [42]Dahmen, W.ApproximationbylinearcombinationsofmultivariateB-splines.Journal of Approximation Theory 31, 4 (1981), 299–324. [43]Dahmen, W.Compositional sparsity, approximation...

work page arXiv 2026
[8]

[47]de Boor, C., and DeVore, R.Approximation by smooth multivariate splines.Transactions of the American Mathematical Society 276, 2 (1983), 775–788

[46]Daws, J., and Webster, C.Analysis of deep neural networks with quasi-optimal polynomial approx- imation rates.arXiv preprint arXiv:1912.02302(2019). [47]de Boor, C., and DeVore, R.Approximation by smooth multivariate splines.Transactions of the American Mathematical Society 276, 2 (1983), 775–788. [48]De Ryck, T., Lanthaler, S., and Mishra, S.On the a...

work page arXiv 1912
[9]

A., and Popov, V

[53]DeVore, R. A., and Popov, V. A.Interpolation of Besov spaces.Transactions of the American Mathematical Society 305, 1 (1988), 397–414. [54]DeVore, R. A., and Sharpley, R. C.Besov spaces on domains inR.Transactions of the American Mathematical Society 335, 2 (1993), 843–864. [55]Dolbeault, M., Krieg, D., and Ullrich, M.A sharp upper bound for sampling ...

1988
[10]

R., and Brugiapaglia, S.A practical existence theorem for reduced order models based on convolutional autoencoders.Foundations of Data Science 7, 1 (2025), 72–98

[60]Franco, N. R., and Brugiapaglia, S.A practical existence theorem for reduced order models based on convolutional autoencoders.Foundations of Data Science 7, 1 (2025), 72–98. [61]Furuya, T., and Kratsios, A.Simultaneously Solving FBSDEs with Neural Operators of Logarithmic Depth, Constant Width, and Sub-Linear Rank,

2025
[11]

[63]Georgiev, D., Barbiero, P., Kazhdan, D., Veličković, P., and Liò, P.Algorithmic concept- based explainable reasoning

[62]Furuya, T., Kratsios, A., Possamaï, D., and Raonić, B.One model to solve them all: 2BSDE families via neural operators.arXiv preprint arXiv:2511.01125(2025). [63]Georgiev, D., Barbiero, P., Kazhdan, D., Veličković, P., and Liò, P.Algorithmic concept- based explainable reasoning. InProceedings of the AAAI Conference on Artificial Intelligence(2022), vo...

work page arXiv 2025
[12]

[68]Gonon, L., Grigoryeva, L., and Ortega, J.-P.Approximation bounds for random neural networks and reservoir systems.The Annals of Applied Probability 33, 1 (2023), 28–69

[67]Goldbring, I., Hart, B., and Kruckman, A.The almost sure theory of finite metric spaces.Bulletin of the London Mathematical Society 53, 6 (2021), 1740–1748. [68]Gonon, L., Grigoryeva, L., and Ortega, J.-P.Approximation bounds for random neural networks and reservoir systems.The Annals of Applied Probability 33, 1 (2023), 28–69. [69]Greenfeld, D., Galu...

2021
[13]

InProceedings of the 18th Annual ACM Symposium on Theory of Computing (STOC)(1986), ACM, pp

[74]Håstad, J.Almost optimal lower bounds for small depth circuits. InProceedings of the 18th Annual ACM Symposium on Theory of Computing (STOC)(1986), ACM, pp. 6–20. [75]Håstad, J.On the correlation of parity and small-depth circuits.SIAM Journal on Computing 43, 5 (2014), 1699–1708. [76]He, J., Liu, X., and Xu, J.MgNO: Efficient parameterization of line...

1986
[14]

M.A dynamic programming approach to sequencing problems.Journal of the Society for Industrial and Applied mathematics 10, 1 (1962), 196–210

[79]Held, M., and Karp, R. M.A dynamic programming approach to sequencing problems.Journal of the Society for Industrial and Applied mathematics 10, 1 (1962), 196–210. [80]HIERONYMI, P., and MILLER, C.Metric dimensions and tameness in expansions of the real field. Transactions of the American Mathematical Society 373, 2 (1029) (2020), pp. 849–874. [81]Hon...

work page arXiv 1962
[15]

J., Bošnjak, M., Vitvitskyi, A., Rubanova, Y., Deac, A., Bevilacqua, B., Ganin, Y., Blundell, C., and Veličković, P.A generalist neural algorithmic learner

[84]Ibarz, B., Kurin, V., Papamakarios, G., Nikiforou, K., Bennani, M., Csordás, R., Dudzik, A. J., Bošnjak, M., Vitvitskyi, A., Rubanova, Y., Deac, A., Bevilacqua, B., Ganin, Y., Blundell, C., and Veličković, P.A generalist neural algorithmic learner. InProceedings of the First Learning on Graphs Conference(2022), vol. 198 ofProceedings of Machine Learni...

2022
[16]

Journal of the ACM (JACM) 29, 3 (1982), 874–897

[87]Jerrum, M., and Snir, M.Some exact complexity results for straight-line computations over semirings. Journal of the ACM (JACM) 29, 3 (1982), 874–897. [88]Jones, P. W.Quasiconformal mappings and extendability of functions in Sobolev spaces.Acta Math- ematica 147(1981), 71–88. [89]Jukna, S.Boolean function complexity, vol. 27 ofAlgorithms and Combinator...

1982
[17]

[93]Kerr, L

79 [92]Karpinski, M., and Macintyre, A.Polynomial bounds for VC dimension of sigmoidal and general pfaffian neural networks.Journal of Computer and System Sciences 54, 1 (1997), 169–176. [93]Kerr, L. R.The Effect of Algebraic Structure on the Computation Complexity of Matrix Multiplications. PhD thesis, Cornell University, Ithaca, NY,

1997
[18]

Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity

Translated from the Russian by Smilka Zdravkovska. [95]Kidger, P., and Lyons, T.Universal approximation with deep narrow networks. InConference on Learning Theory(2020), PMLR, pp. 2306–2327. [96]Kolmogorov, A. N.On certain asymptotic characteristics of completely bounded metric spaces. Doklady Akademii Nauk SSSR 108, 3 (1956), 385–388. In Russian. [97]Kol...

work page internal anchor Pith review Pith/arXiv arXiv 2020
[19]

Approx 23, 1 (2006), 61–77

[106]Kühn, T., Leopold, H.-G., Sickel, W., and Skrzypczak, L.Entropy numbers of embeddings of weighted besov spaces.Constr. Approx 23, 1 (2006), 61–77. [107]Kujawa, Z., Poole, J., Georgiev, D., Numeroso, D., and Liò, P.Neural algorithmic reasoning with multiple correct solutions,

2006
[20]

[109]Kurdyka, K.On gradients of functions definable in o-minimal structures.Annales de l’Institut Fourier 48, 3 (1998), 769–783

[108]Kulbatov, V., Lang, J., Schneider, C., and Vybíral, J.Bases of Lebesgue spaces formed by neural networks.arXiv preprint arXiv:2511.23179(2025). [109]Kurdyka, K.On gradients of functions definable in o-minimal structures.Annales de l’Institut Fourier 48, 3 (1998), 769–783. [110]Li, W., Kratsios, A., Ghoukasian, H., and Zvigelsky, D.Certifiable Boolean...

work page arXiv 2025
[21]

[116]Lu, J., Shen, Z., Yang, H., and Zhang, S.Deep network approximation for smooth functions. SIAM J. Math. Anal. 53, 5 (2021), 5465–5506. [117]Maass, W., Schnitger, G., and Sontag, E. D.On the computational power of sigmoid versus Boolean threshold circuits. InProceedings of the 32nd Annual IEEE Symposium on Foundations of Computer Science(1991), pp. 76...

2021
[22]

[124]Mayer, S., and Ullrich, T.Entropy numbers of finite dimensional mixed-norm balls and function space embeddings with small mixed smoothness.Constr. Approx. 53, 2 (2021), 249–279. [125]McCulloch, W. S., and Pitts, W.A logical calculus of the ideas immanent in nervous activity.The Bulletin of Mathematical Biophysics 5(1943), 115–133. [126]Merrill, W., S...

2021
[23]

N., and Micchelli, C

[128]Mhaskar, H. N., and Micchelli, C. A.Approximation by superposition of sigmoidal and radial basis functions.Advances in Applied Mathematics 13, 3 (1992), 350–373. [129]Mhaskar, H. N., and Poggio, T.Deep vs. shallow networks: An approximation theory perspective. Analysis and Applications 14, 06 (2016), 829–848. [130]Mises, R., and Pollaczek-Geiringer, ...

1992
[24]

[132] Mohammad-Taheri, S., Colbrook, M. J., and Brugiapaglia, S. Deep greedy unfolding: Sorting out argsorting in greedy sparse recovery algorithms. arXiv preprint arXiv:2505.15661 (2025).
[133] Monga, V., Li, Y., and Eldar, Y. C. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine 38, 2...
[25]

[139] Nachbin, L. An extension of the notion of integral functions of the finite exponential type. Anais da Academia Brasileira de Ciências 16 (1944), 143–147.
[140] Nachbin, L. Weighted approximation for algebras and modules of continuous functions: Real and self-adjoint complex cases. Annals of Mathematics 81, 2 (1965), 289–302.
[141] Neuman, A. M., and Brambur...
[26]

Accessed: 2026-05-26.
[144] Opschoor, J. A. A., Schwab, C., and Zech, J. Exponential ReLU DNN expression of holomorphic maps in high dimension. Constructive Approximation 55, 1 (2022), 537–582.
[145] Pándy, M., Qiu, W., Corso, G., Veličković, P., Ying, Z., Leskovec, J., and Liò, P. Learning graph search heuristics. In Proceedings of the First Learning on Graphs...
[27]

[147] Paturi, R., and Saks, M. E. On threshold circuits for parity. IEEE Transactions on Industry Applications 27, 1 (1991), 397–404.
[148] Petersen, P., and Voigtlaender, F. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks 108 (2018), 296–330.
[149] Petersen, P., and Zech, J. Mathematical Theory of Deep Learning...
[28]

[153] Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numerica 8 (1999), 143–195.
[154] Poggio, T., and Fraser, M. Compositional sparsity of learnable functions. Bulletin of the American Mathematical Society 61, 3 (2024), 438–456.
[155] Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. Numerical Recipes in FORTRAN: Th...
[29]

[157] Pérez, J., Marinković, J., and Barceló, P. On the Turing completeness of modern neural network architectures. In International Conference on Learning Representations (2019).
[158] Raphson, J. Analysis Aequationum Universalis. Thomas Braddyll, London,
[30]

[159] Rauhut, H., and Ward, R. Sparse Legendre expansions via ℓ1-minimization. Journal of Approximation Theory 164, 5 (2012), 517–533.
[160] Robinson, J. C. Dimensions, Embeddings, and Attractors, vol
[31]

Representation Benefits of Deep Feedforward Networks

[161] Rogers, L. G. Degree-independent Sobolev extension on locally uniform domains. Journal of Functional Analysis 235, 2 (2006), 619–665.
[162] Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 6 (1958), 386–408.
[163] Roy, B. Transitivité et connexité. C. R. Acad. Sci. Paris 24...
[32]

[185] van den Dries, L., Macintyre, A., and Marker, D. The elementary theory of restricted analytic fields with exponentiation. Annals of Mathematics 140, 1 (1994), 183–205.
[186] van den Dries, L., and Miller, C. On the real exponential field with restricted analytic functions. Israel Journal of Mathematics 85, 1–3 (1994), 19–56.
[187] van den Dries, L., and M...
[33]

[195] Wang, C., and Townsend, A. Beyond singular value gaps in randomized subspace approximation. arXiv preprint arXiv:2603.01191 (2026).
[196] Wang, Z., Ling, Q., and Huang, T. S. Learning deep ℓ0 encoders. In Proceedings of the AAAI Conference on Artificial Intelligence (2016), vol. 30, pp. 2194–2200.
[197] Warshall, S. A theorem on Boolean matrices. Journal of t...
[34]

[200] Xhonneux, L.-P., Deac, A.-I., Veličković, P., and Tang, J. How to transfer algorithmic reasoning knowledge to learn new algorithms? Advances in Neural Information Processing Systems 34 (2021), 19500–19512.
[201] Xin, B., Wang, Y., Gao, W., Wipf, D., and Wang, B. Maximal sparsity with deep networks? In Advances in Neural Information Processing Systems (2016), v...

[1]

Software available from tensorflow.org.
[2] Adcock, B., Brugiapaglia, S., Dexter, N., and Moraga, S. Deep neural networks are effective at learning high-dimensional Hilbert-valued functions from limited data. In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference (16–19 Aug 2022), J. Bruna, J. Hesthaven, and L. Zdeborova, Eds., vol....

[2]

[5] Adcock, B., Dexter, N., and Moraga, S. Optimal deep learning of holomorphic operators between Banach spaces. Advances in Neural Information Processing Systems 37 (2024), 27725–27789.
[6] Aftab, J., Schwab, C., Yang, H., and Zech, J. Quantum circuit encodings of polynomial chaos expansions. arXiv preprint arXiv:2506.01811 (2025).
[7] Aldous, D., and Diaconis, P...

[3]

[10] Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G. Optimal learners for realizable regression: PAC learning and online learning. Advances in Neural Information Processing Systems 36 (2023).
[11] Bartlett, P. L., Maiorov, V., and Meir, R. Almost linear VC-dimension bounds for piecewise polynomial networks. Neural Computation 10, 8 (1998), ...

[4]

[13] Bellman, R. Dynamic programming treatment of the travelling salesman problem. Journal of the ACM (JACM) 9, 1 (1962), 61–63.
[14] Bennett, J., Carbery, A., Christ, M., and Tao, T. The Brascamp–Lieb inequalities: finiteness, structure and extremals. Geometric and Functional Analysis 17, 5 (2008), 1343–1415.
[15] Bernstein, S. N. Démonstration du théorème de we...

[5]

[24] Chen, Y., Dong, B., and Xu, J. Meta-MgNet: Meta multigrid networks for solving parameterized partial differential equations. Journal of Computational Physics 455 (2022), 110996.
[25] Chen, Z., Villar, S., Chen, L., and Bruna, J. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Proce...

[6]

[26] Cheridito, P., Jentzen, A., and Rossmannek, F. Efficient approximation of high-dimensional functions with neural networks. IEEE Transactions on Neural Networks and Learning Systems 33, 7 (July 2022), 3079–3093.
[27] Chiang, D. Transformers in uniform TC^0. Transactions on Machine Learning Research (Jan. 2025).
[28] Chkifa, A., Cohen, A., and Schwab, C. High-d...

[7]

Lecture notes.
[41] Cuchiero, C., Schmocker, P., and Teichmann, J. Global universal approximation of functional input maps on weighted spaces. Constructive Approximation (2026), 1–76.
[42] Dahmen, W. Approximation by linear combinations of multivariate B-splines. Journal of Approximation Theory 31, 4 (1981), 299–324.
[43] Dahmen, W. Compositional sparsity, approximation...

[8]

[46] Daws, J., and Webster, C. Analysis of deep neural networks with quasi-optimal polynomial approximation rates. arXiv preprint arXiv:1912.02302 (2019).
[47] de Boor, C., and DeVore, R. Approximation by smooth multivariate splines. Transactions of the American Mathematical Society 276, 2 (1983), 775–788.
[48] De Ryck, T., Lanthaler, S., and Mishra, S. On the a...

[9]

[53] DeVore, R. A., and Popov, V. A. Interpolation of Besov spaces. Transactions of the American Mathematical Society 305, 1 (1988), 397–414.
[54] DeVore, R. A., and Sharpley, R. C. Besov spaces on domains in R. Transactions of the American Mathematical Society 335, 2 (1993), 843–864.
[55] Dolbeault, M., Krieg, D., and Ullrich, M. A sharp upper bound for sampling ...

[10]

[60] Franco, N. R., and Brugiapaglia, S. A practical existence theorem for reduced order models based on convolutional autoencoders. Foundations of Data Science 7, 1 (2025), 72–98.
[61] Furuya, T., and Kratsios, A. Simultaneously Solving FBSDEs with Neural Operators of Logarithmic Depth, Constant Width, and Sub-Linear Rank,

[11]

[62] Furuya, T., Kratsios, A., Possamaï, D., and Raonić, B. One model to solve them all: 2BSDE families via neural operators. arXiv preprint arXiv:2511.01125 (2025).
[63] Georgiev, D., Barbiero, P., Kazhdan, D., Veličković, P., and Liò, P. Algorithmic concept-based explainable reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence (2022), vo...

[12]

[67] Goldbring, I., Hart, B., and Kruckman, A. The almost sure theory of finite metric spaces. Bulletin of the London Mathematical Society 53, 6 (2021), 1740–1748.
[68] Gonon, L., Grigoryeva, L., and Ortega, J.-P. Approximation bounds for random neural networks and reservoir systems. The Annals of Applied Probability 33, 1 (2023), 28–69.
[69] Greenfeld, D., Galu...

[13]

[74] Håstad, J. Almost optimal lower bounds for small depth circuits. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing (STOC) (1986), ACM, pp. 6–20.
[75] Håstad, J. On the correlation of parity and small-depth circuits. SIAM Journal on Computing 43, 5 (2014), 1699–1708.
[76] He, J., Liu, X., and Xu, J. MgNO: Efficient parameterization of line...

[14]

[79] Held, M., and Karp, R. M. A dynamic programming approach to sequencing problems. Journal of the Society for Industrial and Applied Mathematics 10, 1 (1962), 196–210.
[80] Hieronymi, P., and Miller, C. Metric dimensions and tameness in expansions of the real field. Transactions of the American Mathematical Society 373, 2 (1029) (2020), pp. 849–874.
[81] Hon...

[15]

[84] Ibarz, B., Kurin, V., Papamakarios, G., Nikiforou, K., Bennani, M., Csordás, R., Dudzik, A. J., Bošnjak, M., Vitvitskyi, A., Rubanova, Y., Deac, A., Bevilacqua, B., Ganin, Y., Blundell, C., and Veličković, P. A generalist neural algorithmic learner. In Proceedings of the First Learning on Graphs Conference (2022), vol. 198 of Proceedings of Machine Learni...

[16]

[87] Jerrum, M., and Snir, M. Some exact complexity results for straight-line computations over semirings. Journal of the ACM (JACM) 29, 3 (1982), 874–897.
[88] Jones, P. W. Quasiconformal mappings and extendability of functions in Sobolev spaces. Acta Mathematica 147 (1981), 71–88.
[89] Jukna, S. Boolean function complexity, vol. 27 of Algorithms and Combinator...

[17]

[92] Karpinski, M., and Macintyre, A. Polynomial bounds for VC dimension of sigmoidal and general Pfaffian neural networks. Journal of Computer and System Sciences 54, 1 (1997), 169–176.
[93] Kerr, L. R. The Effect of Algebraic Structure on the Computation Complexity of Matrix Multiplications. PhD thesis, Cornell University, Ithaca, NY,

[18]

Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity

Translated from the Russian by Smilka Zdravkovska.
[95] Kidger, P., and Lyons, T. Universal approximation with deep narrow networks. In Conference on Learning Theory (2020), PMLR, pp. 2306–2327.
[96] Kolmogorov, A. N. On certain asymptotic characteristics of completely bounded metric spaces. Doklady Akademii Nauk SSSR 108, 3 (1956), 385–388. In Russian.
[97] Kol...

[19]

[106] Kühn, T., Leopold, H.-G., Sickel, W., and Skrzypczak, L. Entropy numbers of embeddings of weighted Besov spaces. Constr. Approx. 23, 1 (2006), 61–77.
[107] Kujawa, Z., Poole, J., Georgiev, D., Numeroso, D., and Liò, P. Neural algorithmic reasoning with multiple correct solutions,
