Expressive Power of Floating-Point Neural Networks with Arbitrary Reduction Orders and Inexact Activation Implementations

Geonho Hwang; Sejun Park; Wonyeol Lee; Yeachan Park

arXiv: 2605.28704 · v1 · pith:PCW47THA · submitted 2026-05-27 · 💻 cs.LG


Yeachan Park, Geonho Hwang, Wonyeol Lee, Sejun Park

Pith reviewed 2026-06-29 13:35 UTC · model grok-4.3

classification 💻 cs.LG

keywords floating-point neural networks · expressive power · universal representability · activation functions · distinguishability · reduction orders · ulp error


The pith

Floating-point neural networks represent any function between floating-point domains exactly when their first-layer activations distinguish every pair of distinct inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the conditions under which neural networks executed with actual floating-point arithmetic, arbitrary reduction orders, and inexact activation implementations can exactly represent every possible mapping from one floating-point set to another. It introduces a distinguishability framework showing that separating all distinct inputs in the first layer is a necessary condition for this universal representability. The work further proves that a suitable form of distinguishability is also sufficient under mild conditions on the activation, and shows that this holds for implementations of common activation functions, including sigmoid, tanh, ReLU, ELU, SeLU, GeLU, Swish, Mish, and sin.

Core claim

A floating-point neural network can achieve universal representability over floating-point domains only if its first layer distinguishes every pair of distinct inputs; conversely, a suitable form of this distinguishability is sufficient once the activation implementation meets mild conditions that allow distinctions to propagate through the network.

What carries the argument

The distinguishability framework, which requires that for every pair of distinct inputs there is at least one first-layer neuron whose activation output differs on that pair.
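As an illustration of the condition, here is a minimal sketch of a brute-force distinguishability check for a scalar-input first layer over a tiny, hand-picked set of doubles. The helper names (`layer_output`, `distinguishes`), the toy domain, and the neuron parameters are hypothetical; the paper's definition covers multi-dimensional inputs, general floating-point formats, and arbitrary reduction orders, none of which this sketch models. Each neuron computes `act(w * x + b)` in Python's binary64 arithmetic.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

Neuron = Tuple[float, float]  # hypothetical (weight, bias) of a scalar-input neuron


def layer_output(x: float, neurons: Sequence[Neuron],
                 act: Callable[[float], float]) -> Tuple[float, ...]:
    # Each pre-activation is fl(fl(w * x) + b); `act` stands in for whatever
    # (possibly inexact) activation implementation is being studied.
    return tuple(act(w * x + b) for w, b in neurons)


def distinguishes(domain: Sequence[float], neurons: Sequence[Neuron],
                  act: Callable[[float], float]
                  ) -> Tuple[bool, Optional[Tuple[float, float]]]:
    """Return (True, None) if all distinct inputs get distinct first-layer
    output vectors, else (False, witness_pair)."""
    for x, y in combinations(domain, 2):
        # Inputs are compared by value, so +0.0 and -0.0 count as one input here.
        if x != y and layer_output(x, neurons, act) == layer_output(y, neurons, act):
            return False, (x, y)
    return True, None


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


domain = [-2.0, -1.0, 0.0, 1.0, 2.0]
# One |.| neuron cannot tell x from -x.
print(distinguishes(domain, [(1.0, 0.0)], abs))                # -> (False, (-2.0, 2.0))
# Neurons relu(x) and relu(-x) give five distinct output vectors.
print(distinguishes(domain, [(1.0, 0.0), (-1.0, 0.0)], relu))  # -> (True, None)
```

The witness pair is exactly what the necessity direction exploits: if no first-layer neuron separates -2.0 from 2.0, nothing downstream can assign them different outputs.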

If this is right

Implementations of sigmoid, tanh, ReLU, ELU, SeLU, GeLU, Swish, Mish, and sin become universal representators under arbitrary reduction orders and bounded ulp errors.
The correctly rounded cosine, along with broad classes of other activation implementations, is not a universal representator, even under the generalized model.
Universal representability fails for any activation whose first layer cannot separate all distinct inputs.
Prior results limited to fixed left-to-right reduction and correctly rounded activations are subsumed by the new framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hardware or compiler changes to reduction order could turn a previously universal network non-universal for a given activation.
Verifying distinguishability only on the first layer offers a practical test for universality without enumerating all possible target functions.
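The reduction-order point rests on the fact that floating-point addition is not associative. A minimal sketch, assuming IEEE 754 binary64 with round-to-nearest-even (Python's `float`) and reading "arbitrary reduction order" as any binary summation tree over any ordering of the summands (our assumption, not necessarily the paper's exact model): the hypothetical helper `all_reduction_results` enumerates every such tree and collects the distinct results.

```python
from functools import lru_cache
from itertools import combinations
from typing import FrozenSet, Sequence, Tuple


def all_reduction_results(values: Sequence[float]) -> FrozenSet[float]:
    """Every result binary64 summation can produce under some bracketing and
    ordering of `values` (exponential cost; meant for a handful of inputs)."""
    vals = tuple(values)

    @lru_cache(maxsize=None)
    def reduce_subset(idx: Tuple[int, ...]) -> FrozenSet[float]:
        if len(idx) == 1:
            return frozenset({vals[idx[0]]})
        first, rest = idx[0], idx[1:]
        out = set()
        # Split idx into two non-empty parts; keeping `first` on the left skips
        # mirror-image splits, which is safe because IEEE addition is commutative.
        for k in range(len(rest)):
            for others in combinations(rest, k):
                left = (first,) + others
                right = tuple(i for i in rest if i not in others)
                for a in reduce_subset(left):
                    for b in reduce_subset(right):
                        out.add(a + b)
        return frozenset(out)

    return reduce_subset(tuple(range(len(vals))))


xs = [1e16, 1.0, -1e16]
print((xs[0] + xs[1]) + xs[2])            # left-to-right: 1.0 is absorbed -> 0.0
print(sorted(all_reduction_results(xs)))  # -> [0.0, 1.0]
```

Under the left-to-right order the 1.0 is lost to rounding (the spacing of doubles near 1e16 is 2), while the tree (1e16 + -1e16) + 1.0 keeps it. A representability argument written for one fixed order therefore does not automatically transfer to others.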

Load-bearing premise

The activation implementation satisfies additional mild conditions beyond bounded ulp error that let first-layer distinctions propagate to the full network.
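"Bounded ulp error" can be made concrete by counting how many representable floats separate an implementation's output from a reference value. A minimal sketch for binary64, assuming the error model is "the returned float lies within k ulps of the correctly rounded result" (the paper's precise model and its additional mild conditions are not reproduced here); `ulp_distance` and `within_ulps` are hypothetical helpers.

```python
import math
import struct


def _ordered_bits(x: float) -> int:
    """Map a binary64 value to an integer so that adjacent floats differ by 1."""
    if math.isnan(x):
        raise ValueError("ulp distance is undefined for NaN")
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats have the sign bit set, so `bits` is negative; mirror the
    # magnitude so the mapping is monotone. Both +0.0 and -0.0 map to 0.
    return bits if bits >= 0 else -(bits & 0x7FFF_FFFF_FFFF_FFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable binary64 steps between a and b."""
    return abs(_ordered_bits(a) - _ordered_bits(b))


def within_ulps(approx: float, reference: float, k: int) -> bool:
    """True if `approx` is an admissible k-ulp output for `reference`."""
    return ulp_distance(approx, reference) <= k


one_up = math.nextafter(1.0, 2.0)  # the smallest double above 1.0
print(ulp_distance(1.0, one_up))   # -> 1
print(ulp_distance(-0.0, 0.0))     # -> 0
print(within_ulps(one_up, 1.0, k=0), within_ulps(one_up, 1.0, k=1))  # -> False True
```

`math.nextafter` requires Python 3.9 or later. Bounded ulp error alone only says the output lands in a small window around the true value; the load-bearing premise is that something more (for example, enough monotonicity) survives inside that window.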

What would settle it

A concrete activation implementation that distinguishes all input pairs in the first layer yet fails to represent some target function on the floating-point domain when reduction order is arbitrary.

Figures

Figures reproduced from arXiv: 2605.28704 by Geonho Hwang, Sejun Park, Wonyeol Lee, Yeachan Park.

**Figure 2.** Visualization of the conditions in Lemma

Original abstract

Most existing expressivity theories for neural networks assume exact real arithmetic, whereas practical neural networks are executed under finite-precision floating-point arithmetic with implementation-dependent execution semantics. Recent works have begun studying the expressive power of floating-point neural networks, but existing results are limited to highly restricted activation functions and idealized assumptions such as fixed left-to-right reduction orders and correctly rounded activation implementations. In this work, we study the expressive power of floating-point neural networks under generalized floating-point execution semantics, including arbitrary reduction orders and inexact activation implementations with bounded ulp errors. We investigate when floating-point neural networks can represent arbitrary functions between floating-point domains exactly. To this end, we introduce a general distinguishability framework and show that the ability to distinguish every pair of distinct inputs in the first layer is necessary for universal representability. This characterization yields broad classes of activation implementations that are not universal representators, extending previous isolated counterexamples such as the correctly rounded cosine activation. We further prove that a suitable form of distinguishability is also sufficient for universal representability under mild conditions on the activation implementation. Using this framework, we establish universal representability results for a broad class of practical activation functions, including implementations of $\mathrm{Sigmoid}$, $\tanh$, $\mathrm{ReLU}$, $\mathrm{ELU}$, $\mathrm{SeLU}$, $\mathrm{GeLU}$, $\mathrm{Swish}$, $\mathrm{Mish}$, and $\sin$, under significantly more realistic floating-point execution models than previously known.
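Exact representation of "arbitrary functions between floating-point domains" is a well-posed question because every floating-point format contains finitely many values, so each such function is a finite lookup table. As a sketch, assuming IEEE 754 binary16 decoded with Python's `struct` half-precision format character `'e'`, the finite binary16 values can be enumerated directly; `finite_binary16_values` is a hypothetical helper.

```python
import math
import struct
from typing import List


def finite_binary16_values() -> List[float]:
    """Decode all 2**16 bit patterns as IEEE 754 binary16 and keep finite ones."""
    values = []
    for pattern in range(1 << 16):
        (x,) = struct.unpack("<e", struct.pack("<H", pattern))
        if math.isfinite(x):  # drops the 2048 infinity/NaN encodings
            values.append(x)
    return values


fp16 = finite_binary16_values()
print(len(fp16))             # -> 63488 (both +0.0 and -0.0 included)
print(max(fp16), min(fp16))  # -> 65504.0 -65504.0
print(len(set(fp16)))        # -> 63487, since +0.0 == -0.0 as values
```

The same counting applies to the other formats the paper's setting covers, such as binary32, bfloat16, or FP8; only the bit width and field layout change.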

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's distinguishability framework extends expressivity results to arbitrary reduction orders and inexact activations, with necessity shown generally and sufficiency claimed under mild conditions for many practical activations.


The new piece is the generalization to arbitrary reduction orders and bounded-error inexact activations, plus the distinguishability condition that is necessary in general and sufficient under mild conditions on the activation. This lets them cover Sigmoid, tanh, ReLU, ELU, SeLU, GeLU, Swish, Mish, and sin, which goes beyond the fixed left-to-right and correctly-rounded cases in prior work.

The necessity direction looks clean because it follows directly from the execution model without extra assumptions. The sufficiency lift is the part that depends on those mild conditions propagating through the network and the reduction order.
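Why necessity needs no extra assumptions can be written in one line. Fix a network and an execution (including its reduction order), and assume each layer's computed output is a deterministic function of its input. Write $h_1$ for the map from an input to the first-layer output vector and $g$ for everything the network computes afterwards (notation introduced here for illustration, not taken from the paper). The network output $N$ then factors through $h_1$:

```latex
\[
h_1(x) = h_1(y) \;\Longrightarrow\; N(x) = g\bigl(h_1(x)\bigr) = g\bigl(h_1(y)\bigr) = N(y).
\]
```

So if a target $f$ has $f(x) \neq f(y)$ for some pair that the first layer fails to separate, no choice of later layers yields $N = f$. The sufficiency direction has no comparable one-liner, which is why it leans on the extra conditions.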

The soft spot is exactly the one flagged in the stress test: the abstract invokes mild conditions for sufficiency but does not indicate that they were checked explicitly for sin, Mish, or GeLU under arbitrary reductions. If those conditions turn out to be something like error-bounded monotonicity that survives reduction, the paper needs to show they hold for each listed activation; otherwise the universal-representability claims for the harder functions rest on unverified steps. The necessity results and the framework itself still stand.

This is for readers working on floating-point verification or expressivity theory who want results that match actual execution semantics. It is worth sending to peer review because the framework is new, the necessity part is general, and the concrete claims for common activations are broader than before, even if the sufficiency details need tightening.

Referee Report

1 major / 1 minor

Summary. The paper introduces a distinguishability framework for floating-point neural networks under arbitrary reduction orders and inexact activations with bounded ulp errors. It proves that first-layer distinguishability of distinct inputs is necessary for a network to exactly represent arbitrary functions between floating-point domains. It further shows that a suitable distinguishability property is sufficient for universal representability under mild conditions on the activation implementation, and applies the framework to establish universal representability for implementations of Sigmoid, tanh, ReLU, ELU, SeLU, GeLU, Swish, Mish, and sin.

Significance. If the results hold, the work substantially extends prior floating-point expressivity results (limited to fixed reduction orders and correctly rounded activations) by providing a general necessity-sufficiency characterization under more realistic execution models. The framework enables both negative classifications and positive results for a broad class of practical activations, which would be a meaningful advance in the theory of neural network expressivity.

major comments (1)

[Sufficiency theorem, framework section] The claim that distinguishability is sufficient for universal representability requires 'mild conditions' on the activation implementation that must propagate through arbitrary reduction orders and inexact activations. The manuscript does not explicitly verify these conditions (e.g., continuity or error-bounded monotonicity surviving reduction) for sin, Mish, or GeLU, and this verification is load-bearing for the universal-representability results listed for those activations.

minor comments (1)

[Abstract / Introduction] The abstract invokes 'mild conditions' without a precise statement; a brief enumeration of the conditions in the introduction or framework section would improve readability.

Simulated Authors' Rebuttal

1 response · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address the major comment regarding the sufficiency theorem below.

Point-by-point responses

Referee: [Sufficiency theorem, framework section] The claim that distinguishability is sufficient for universal representability requires 'mild conditions' on the activation implementation that must propagate through arbitrary reduction orders and inexact activations. The manuscript does not explicitly verify these conditions (e.g., continuity or error-bounded monotonicity surviving reduction) for sin, Mish, or GeLU, and this verification is load-bearing for the universal-representability results listed for those activations.

Authors: The referee correctly identifies that the sufficiency result depends on mild conditions on the activation implementations. The manuscript applies the framework to sin, Mish, and GeLU based on the fact that their standard floating-point implementations satisfy the required properties (such as the error being bounded by a small number of ulps and preserving sufficient monotonicity or continuity for the distinguishability to hold under arbitrary reductions). However, we agree that explicit verification would strengthen the presentation. We will revise the manuscript to include a dedicated verification subsection or appendix detailing how the mild conditions are satisfied for each of these activations, including sin, Mish, and GeLU. This will make the universal representability claims fully rigorous and self-contained.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper defines a distinguishability framework, proves necessity of first-layer distinguishability directly from the floating-point execution model, and establishes sufficiency under separately stated mild conditions on activations. No quoted step reduces a central claim to a fitted parameter, self-definition, or load-bearing self-citation chain; the listed activation results follow from applying the framework rather than presupposing the target representability. The derivation remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on standard mathematical properties of floating-point arithmetic and the assumption of bounded ulp errors in activation implementations; no free parameters are fitted and no new entities are postulated.

pith-pipeline@v0.9.1-grok · 5806 in / 1182 out tokens · 33404 ms · 2026-06-29T13:35:12.560373+00:00 · methodology


