Algorithmic Foundations of Deep Learning: Complexity-Theoretic Rates and a Characterization of Universal Approximation
Pith reviewed 2026-06-26 05:18 UTC · model grok-4.3
The pith
Neural networks emulate real-valued circuits with explicit depth, width, and parameter bounds, and universally approximate continuous functions if and only if they contain a non-affine nonlinearity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If a function is computable by a real-valued circuit over a prescribed elementary gate language, then it can be computed to comparable accuracy by an NN with explicit depth, width, and non-zero-parameter bounds controlled by the depth, width, gate count, and gate structure. Any definable NN model satisfying a natural parallelization condition is a universal approximator if and only if it contains a non-affine nonlinearity.
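For the "if" direction with ReLU as the non-affine nonlinearity, the classical piecewise-linear construction already shows the mechanism. The sketch below is an illustration, not the paper's proof. It builds a one-hidden-layer ReLU network with n hidden units that interpolates a continuous f at the n + 1 uniform knots i/n on [0, 1]. The helper names, the test grid, and the choice f = sqrt are assumptions made for the example.

```python
import math
from typing import Callable, List, Tuple

Params = Tuple[float, List[float], List[float]]  # (bias, knots, output coefficients)

def relu(t: float) -> float:
    return t if t > 0.0 else 0.0

def fit_relu_interpolant(f: Callable[[float], float], n: int) -> Params:
    """One-hidden-layer ReLU net g(x) = bias + sum_i c_i * relu(x - t_i) that
    interpolates f at the n + 1 uniform knots t_i = i / n on [0, 1]."""
    h = 1.0 / n
    knots = [i * h for i in range(n + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[i + 1] - values[i]) / h for i in range(n)]
    # Each hidden unit switches on at a knot and contributes the change of slope there.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n)]
    return values[0], knots[:n], coeffs

def eval_relu_net(params: Params, x: float) -> float:
    bias, knots, coeffs = params
    return bias + sum(c * relu(x - t) for t, c in zip(knots, coeffs))

def sup_error(f: Callable[[float], float], n: int, samples: int = 1001) -> float:
    """Max |f - g| over test points (j / (samples - 1))**2, clustered near 0
    where the square root is least regular."""
    params = fit_relu_interpolant(f, n)
    grid = [(j / (samples - 1)) ** 2 for j in range(samples)]
    return max(abs(f(x) - eval_relu_net(params, x)) for x in grid)

for n in (10, 100, 1000):
    print(f"n = {n:4d} hidden units: sup error ~ {sup_error(math.sqrt, n):.4f}")
```

For the square root the sup-norm error is about 1/(4*sqrt(n)), attained near x = 1/(4n), so halving the error costs four times as many hidden units. That n^(-1/2) rate is what a regularity-based (1/2-Hölder) analysis predicts, which is the abstract's point: the square root is nonetheless algorithmically simple.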
What carries the argument
Emulation of real-valued circuits over elementary gates inside neural networks, together with the parallelization condition on definable models that allows multivariate nonlinearities.
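To make "emulation of circuits over elementary gates" concrete, the sketch below assumes a toy gate language of addition, scaling by a constant, max, and min. This gate set and its tuple encoding are our hypothetical choices, not necessarily the paper's. Every max or min gate is replaced by the exact identities max(a, b) = b + relu(a - b) and min(a, b) = a - relu(a - b). Each gate therefore becomes a constant-size ReLU gadget, so the compiled network's depth and size scale with the circuit's depth and gate count.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical gate encoding: ("add", i, j), ("scale", i, c), ("max", i, j), ("min", i, j),
# where i and j index earlier wires (inputs first, then gate outputs in order).
Gate = Tuple

def relu(t: float) -> float:
    return t if t > 0.0 else 0.0

def eval_circuit(inputs: Sequence[float], gates: List[Gate], relu_only: bool) -> float:
    wires = list(inputs)
    for gate in gates:
        op = gate[0]
        if op == "add":
            val = wires[gate[1]] + wires[gate[2]]
        elif op == "scale":
            val = gate[2] * wires[gate[1]]
        elif op in ("max", "min"):
            a, b = wires[gate[1]], wires[gate[2]]
            if relu_only:
                # Exact ReLU gadgets: max(a, b) = b + relu(a - b), min(a, b) = a - relu(a - b).
                val = b + relu(a - b) if op == "max" else a - relu(a - b)
            else:
                val = max(a, b) if op == "max" else min(a, b)
        else:
            raise ValueError(f"unknown gate {op!r}")
        wires.append(val)
    return wires[-1]

# Example circuit on inputs x0, x1, x2: max(x0 + x1, x2) - min(x0, 2 * x2)
CIRCUIT: List[Gate] = [
    ("add", 0, 1),       # wire 3 = x0 + x1
    ("max", 3, 2),       # wire 4 = max(wire 3, x2)
    ("scale", 2, 2.0),   # wire 5 = 2 * x2
    ("min", 0, 5),       # wire 6 = min(x0, wire 5)
    ("scale", 6, -1.0),  # wire 7 = -wire 6
    ("add", 4, 7),       # wire 8 = wire 4 - wire 6 (output)
]

rng = random.Random(0)
for _ in range(3):
    x = [rng.uniform(-5.0, 5.0) for _ in range(3)]
    print(eval_circuit(x, CIRCUIT, relu_only=False), eval_circuit(x, CIRCUIT, relu_only=True))
```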
If this is right
- Universal approximation holds for all continuous functions once a non-affine nonlinearity is present.
- Minimax-optimal approximation rates are recovered for Besov classes.
- Holomorphic functions admit neural-network approximations whose complexity is logarithmic in the inverse error.
- Numerical algorithms such as Newton-Raphson and power iteration can be emulated directly by the network.
- Shortest-path computation on k-vertex graphs yields networks with O(log(1/ε)) non-zero parameters.
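The shortest-path item rests on the min-plus (tropical) dynamic program. As a hedged illustration, the sketch below runs the Floyd-Warshall recursion d[i][j] = min(d[i][j], d[i][m] + d[m][j]) with every min computed by the ReLU gadget a - relu(a - b). The example graph uses nonnegative integer weights, and a large finite sentinel BIG stands for a missing edge. BIG is our assumption; an infinite value would produce inf - inf = nan inside the gadget. The sketch only shows that the compiled network reproduces the dynamic program exactly. It does not reproduce the paper's ε-accuracy analysis or its O(log(1/ε)) parameter count.

```python
from typing import Callable, List

Matrix = List[List[float]]
BIG = 1e6  # assumed stand-in for "no edge"; must exceed any real path length

def relu(t: float) -> float:
    return t if t > 0.0 else 0.0

def relu_min(a: float, b: float) -> float:
    return a - relu(a - b)  # equals min(a, b) exactly in exact arithmetic

def floyd_warshall(w: Matrix, mn: Callable[[float, float], float]) -> Matrix:
    """All-pairs shortest paths via the min-plus recursion; mn plays the min gate."""
    k = len(w)
    d = [row[:] for row in w]
    for m in range(k):
        for i in range(k):
            for j in range(k):
                d[i][j] = mn(d[i][j], d[i][m] + d[m][j])
    return d

# Example 4-vertex digraph with nonnegative integer weights.
W: Matrix = [
    [0.0, 1.0, 5.0, 10.0],
    [BIG, 0.0, 2.0, BIG],
    [BIG, BIG, 0.0, 1.0],
    [BIG, BIG, BIG, 0.0],
]
print(floyd_warshall(W, relu_min)[0])
```

The recursion uses k^3 min and addition gates arranged in k sequential rounds, so the compiled network's depth grows with k while each gate stays a constant-size gadget.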
Where Pith is reading between the lines
- Circuit descriptions of target functions could be used to construct architecture-specific networks with near-optimal parameter counts.
- The parallelization condition may extend the characterization to attention-based or normalization-heavy models without separate proofs.
- Known results from real computation and circuit complexity could be imported to obtain new approximation bounds for structured function classes.
- The distinction between regularity and algorithmic complexity suggests testing whether certain high-regularity functions still require large networks when their circuit complexity is high.
Load-bearing premise
The neural-network model under consideration must be definable and must satisfy the parallelization condition that permits possibly multivariate nonlinearities.
What would settle it
Exhibit either a circuit-computable function whose approximation by any neural network requires super-linear growth in non-zero parameters relative to the circuit size, or a definable parallelizable model containing only affine nonlinearities that still approximates every continuous function on compact sets to arbitrary accuracy.
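In the fully affine case the second falsifier cannot succeed, and the reason fits in a few lines. A composition of affine maps is again affine, so an affine-only model collapses to a single affine layer. No affine g can approximate |x| on [-1, 1] to sup-norm error below 1/2, because e(x) = |x| - g(x) satisfies e(-1) + e(1) - 2e(0) = 2, which forces max(|e(-1)|, |e(0)|, |e(1)|) >= 1/2. A single hidden layer with two ReLU units represents |x| exactly. The one-dimensional sketch below checks these facts numerically; the helper names are illustrative.

```python
import random
from typing import List, Tuple

Affine1D = Tuple[float, float]  # x -> a * x + b

def compose_affine(layers: List[Affine1D]) -> Affine1D:
    """Collapse layers applied in order (first element first) into one affine map."""
    a, b = 1.0, 0.0
    for ai, bi in layers:
        a, b = ai * a, ai * b + bi
    return a, b

def run_layers(layers: List[Affine1D], x: float) -> float:
    for ai, bi in layers:
        x = ai * x + bi
    return x

def worst_abs_error_on_three_points(a: float, b: float) -> float:
    """max |abs(x) - (a x + b)| over x in {-1, 0, 1}; at least 1/2 for any affine map."""
    return max(abs(abs(x) - (a * x + b)) for x in (-1.0, 0.0, 1.0))

def relu(t: float) -> float:
    return t if t > 0.0 else 0.0

def relu_abs(x: float) -> float:
    return relu(x) + relu(-x)  # one hidden layer with two ReLU units represents |x| exactly

rng = random.Random(0)
deep = [(rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)) for _ in range(20)]
a, b = compose_affine(deep)
print(run_layers(deep, 0.3), a * 0.3 + b)            # 20 affine layers == 1 affine layer
print(worst_abs_error_on_three_points(a, b), relu_abs(-0.3))
```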
Original abstract
Feedforward neural network (NN) expressivity is typically studied by emulating optimal basis-expansion schemes. While powerful, this perspective is incomplete: it primarily captures complexity through regularity, and therefore does not distinguish intuitively simple and complicated objects with comparable regularity, such as the square-root function and a typical Brownian path. The guiding message is that neural networks should be viewed not only as flexible basis functions, but also as models of computation. If a function is computable by a real-valued circuit over a prescribed elementary gate language, then it can be computed to comparable accuracy by an NN with explicit depth, width, and non-zero-parameter bounds controlled by the depth, width, gate count, and gate structure. Thus, neural-network complexity is not governed by regularity alone, but also by algorithmic complexity. We then show that any definable NN model satisfying a natural parallelization condition, allowing possibly multivariate non-linearities such as attention or layer normalization, is a universal approximator if and only if it contains a non-affine nonlinearity. The scope of our theory is illustrated by deducing universal approximation guarantees for continuous functions, minimax-optimal approximation guarantees for Besov classes, logarithmic-error complexity for holomorphic functions, and by showing that NNs can emulate numerical algorithms such as Newton-Raphson root finding and power iteration without architecture-specific arguments. Its precision is illustrated by shortest-path computation on $k$-vertex graphs: compiling the tropical dynamic-programming circuit yields NNs with $O(\log(1/\epsilon))$ non-zero parameters, exponentially improving in $1/\epsilon$ over the generic $O(\epsilon^{-c k^2})$ Lipschitz-approximation scale, for a constant $c>0$.
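The abstract's point about the square root can be made concrete with Newton-Raphson. Unrolling a fixed number of iterations x <- (x + a/x)/2 gives a fixed-depth circuit over addition, multiplication by 1/2, and division. Whether division belongs to the paper's elementary gate language is an assumption here. The starting point x0 = (1 + a)/2 is at least sqrt(a) by the AM-GM inequality, so the iterates decrease monotonically, and once they are close the relative error roughly squares at each step. The sketch below is plain algorithm unrolling, not the paper's network construction.

```python
import math

def newton_sqrt_unrolled(a: float, steps: int = 12) -> float:
    """Approximate sqrt(a), a > 0, with a fixed number of Newton-Raphson steps on f(x) = x**2 - a."""
    if a <= 0.0:
        raise ValueError("a must be positive")
    x = 0.5 * (1.0 + a)  # AM-GM: this start is >= sqrt(a), so the iterates decrease monotonically
    for _ in range(steps):
        x = 0.5 * (x + a / x)  # x - f(x) / f'(x) with f(x) = x**2 - a
    return x

for a in (1e-4, 0.25, 2.0, 1e4):
    print(a, newton_sqrt_unrolled(a), math.sqrt(a))
```

With 12 steps this start reaches double precision for a between 1e-4 and 1e4. Inputs much farther from 1 need a few more steps, because the iterates first shrink only by about a factor of 2 per step before the quadratic phase begins.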
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that neural networks should be analyzed as computational models: any function computable by a real-valued circuit over a fixed gate language can be approximated to comparable accuracy by a feedforward NN whose depth, width, and number of non-zero parameters are explicitly bounded in terms of the circuit's depth, width, gate count, and structure. It further asserts that any definable NN model obeying a natural parallelization condition (permitting multivariate nonlinearities such as attention or layer normalization) is a universal approximator if and only if it contains at least one non-affine nonlinearity. The theory is illustrated by deriving universal-approximation statements for continuous functions, minimax rates for Besov classes, logarithmic-error bounds for holomorphic functions, and by showing that standard numerical algorithms (Newton-Raphson, power iteration) can be emulated; a concrete highlight is the construction of NNs for shortest-path computation on k-vertex graphs that use only O(log(1/ε)) non-zero parameters.
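Power iteration unrolls the same way: each step is a matrix-vector product followed by normalization, so a fixed number of steps is a fixed-depth circuit. The sketch below assumes a square matrix whose dominant eigenvalue is strictly larger in modulus than the others and a starting vector that is not orthogonal to the dominant eigenvector. The normalization uses square-root and division gates, which we assume are available. For the symmetric example matrix [[2, 1], [1, 2]] (eigenvalues 3 and 1), the angular error shrinks by a factor of about 3 per step.

```python
import math
from typing import List, Sequence, Tuple

def mat_vec(A: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def power_iteration_unrolled(A: Sequence[Sequence[float]], v0: Sequence[float],
                             steps: int = 40) -> Tuple[float, List[float]]:
    """Fixed number of power-iteration steps; returns (Rayleigh quotient, unit vector)."""
    v = list(v0)
    for _ in range(steps):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # normalization needs sqrt and division gates
    lam = sum(x * y for x, y in zip(v, mat_vec(A, v)))  # Rayleigh quotient; v has unit norm
    return lam, v

A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
print(power_iteration_unrolled(A, [1.0, 0.0]))
```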
Significance. If the stated theorems hold with the claimed explicit bounds, the work supplies a complexity-theoretic foundation for NN expressivity that incorporates algorithmic structure rather than regularity alone. The circuit-emulation result and the if-and-only-if universal-approximation characterization under an explicitly stated modeling condition would be useful for deriving architecture-specific guarantees and for explaining why certain multivariate operations succeed. The concrete parameter-count improvement for shortest-path computation demonstrates the potential tightness of the bounds relative to generic Lipschitz arguments.
major comments (2)
- [Abstract and the section stating the UA theorem] The universal-approximation characterization (abstract, second paragraph) is conditional on the NN model being 'definable' and satisfying the 'natural parallelization condition.' The manuscript must supply a precise, checkable definition of both notions and verify that the listed examples (attention, layer normalization, Newton iteration) satisfy the condition without additional restrictions that would narrow the function class.
- [The section on shortest-path computation] The shortest-path claim (abstract, final sentence) asserts O(log(1/ε)) non-zero parameters obtained by compiling the tropical dynamic-programming circuit. The derivation of this count, including the precise mapping from circuit gates to network parameters and the verification that the resulting network indeed solves the problem to accuracy ε, is load-bearing for the claimed exponential improvement over the generic O(ε^{-c k^2}) scale and must be presented with all intermediate steps.
minor comments (2)
- The abstract is information-dense; separating the circuit-emulation theorem, the UA characterization, and the illustrative applications into distinct sentences would improve readability.
- The term 'definable NN model' appears without an inline definition in the abstract; a brief parenthetical gloss or forward reference to its formal definition would help readers.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. We address the two major comments below and will incorporate the requested clarifications and expansions in the revised manuscript.
- Referee: [Abstract and the section stating the UA theorem] The universal-approximation characterization (abstract, second paragraph) is conditional on the NN model being 'definable' and satisfying the 'natural parallelization condition.' The manuscript must supply a precise, checkable definition of both notions and verify that the listed examples (attention, layer normalization, Newton iteration) satisfy the condition without additional restrictions that would narrow the function class.
Authors: We agree that explicit, checkable definitions are required for reproducibility. In the revised manuscript we will insert formal definitions of 'definable' (as a model whose operations are given by a fixed finite set of real-valued functions closed under composition and parallel application) and the 'natural parallelization condition' (the requirement that any collection of independent scalar or vector operations can be realized by a single layer whose width scales linearly with the number of parallel instances) directly in the section containing the UA theorem. We will then verify, with explicit constructions, that attention, layer normalization, and Newton iteration satisfy both notions under the modeling assumptions already stated in the paper, without imposing further restrictions on the representable function class. revision: yes
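For a dense layer with an elementwise activation, one reading of the parallelization condition is the block-diagonal merge. Two independent layers become one layer whose width is the sum of the two widths and whose non-zero count is the sum of the two counts, since the off-diagonal blocks are zero. The sketch assumes ReLU and plain Python lists, and the example weights are arbitrary. For multivariate nonlinearities such as softmax attention or layer normalization, the merged layer would have to apply the nonlinearity separately on each block, which is precisely what the condition has to guarantee.

```python
from typing import Callable, List

Matrix = List[List[float]]

def relu(t: float) -> float:
    return t if t > 0.0 else 0.0

def dense(W: Matrix, b: List[float], x: List[float], act: Callable[[float], float]) -> List[float]:
    return [act(sum(w * xj for w, xj in zip(row, x)) + bi) for row, bi in zip(W, b)]

def block_diag(W1: Matrix, W2: Matrix) -> Matrix:
    """Place W1 and W2 on the diagonal; the off-diagonal blocks are zero."""
    n1, n2 = len(W1[0]), len(W2[0])
    return [row + [0.0] * n2 for row in W1] + [[0.0] * n1 + row for row in W2]

def nonzeros(W: Matrix) -> int:
    return sum(1 for row in W for w in row if w != 0.0)

W1, b1 = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]], [0.1, -0.2]
W2, b2 = [[-1.5, 2.0]], [0.3]
x1, x2 = [0.4, -1.0, 2.0], [1.2, 0.7]

separate = dense(W1, b1, x1, relu) + dense(W2, b2, x2, relu)   # two layers side by side
merged = dense(block_diag(W1, W2), b1 + b2, x1 + x2, relu)      # one wider layer
print(separate, merged)
print(nonzeros(block_diag(W1, W2)), nonzeros(W1) + nonzeros(W2))
```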
- Referee: [The section on shortest-path computation] The shortest-path claim (abstract, final sentence) asserts O(log(1/ε)) non-zero parameters obtained by compiling the tropical dynamic-programming circuit. The derivation of this count, including the precise mapping from circuit gates to network parameters and the verification that the resulting network indeed solves the problem to accuracy ε, is load-bearing for the claimed exponential improvement over the generic O(ε^{-c k^2}) scale and must be presented with all intermediate steps.
Authors: We agree that the parameter-count derivation is central and must be fully explicit. In the revised manuscript we will expand the shortest-path section to include: (i) the complete tropical dynamic-programming circuit for k-vertex graphs, (ii) the gate-by-gate translation into NN layers together with the exact non-zero parameter count at each step, and (iii) a direct verification that the resulting network computes shortest paths to accuracy ε. This will make the O(log(1/ε)) bound and the comparison to the generic Lipschitz scale fully self-contained. revision: yes
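For the gate-by-gate accounting promised above, the basic unit is easy to count by hand. The sketch below is illustrative, and the paper's own gadgets and bookkeeping may differ. It writes the tropical min gate as an explicit two-layer, bias-free ReLU network and counts its non-zero weights: 4 in the first layer and 3 in the second. Wires that must be carried unchanged past a layer would need extra pass-through units (for example x = relu(x) - relu(-x)), which this sketch does not count.

```python
from typing import List

Matrix = List[List[float]]

def relu(t: float) -> float:
    return t if t > 0.0 else 0.0

def mat_vec(W: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

# Two-layer, bias-free ReLU gadget: min(a, b) = relu(a) - relu(-a) - relu(a - b).
MIN_W1: Matrix = [[1.0, -1.0],   # hidden unit relu(a - b)
                  [1.0, 0.0],    # hidden unit relu(a)
                  [-1.0, 0.0]]   # hidden unit relu(-a)
MIN_W2: Matrix = [[-1.0, 1.0, -1.0]]

def min_gadget(a: float, b: float) -> float:
    hidden = [relu(z) for z in mat_vec(MIN_W1, [a, b])]
    return mat_vec(MIN_W2, hidden)[0]

def count_nonzeros(*mats: Matrix) -> int:
    return sum(1 for M in mats for row in M for w in row if w != 0.0)

print(min_gadget(3.0, 5.0), min_gadget(-2.0, -7.0), count_nonzeros(MIN_W1, MIN_W2))
```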
Circularity Check
No significant circularity; derivations are self-contained reductions
The paper frames its core results as explicit reductions from real-valued circuit models (with given gate language, depth, width, gate count) to NN depth/width/parameter bounds, plus an if-and-only-if universal-approximation characterization conditioned on the external modeling premises of definability and the parallelization condition. These premises are stated as modeling choices rather than derived quantities, and the circuit-to-NN emulation supplies concrete bounds in terms of the source circuit without reducing to any fitted parameter or self-referential definition inside the paper. No load-bearing step equates a claimed prediction to an input by construction, and no self-citation chain is invoked to justify uniqueness or an ansatz. The shortest-path example and other illustrations are presented as applications of the stated theorems rather than circular validations.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: functions of interest are exactly those computable by real-valued circuits over a fixed elementary gate language.
- Ad hoc to paper: the neural-network family under study is definable and obeys the parallelization condition.
invented entities (1)
- definable NN model (no independent evidence)