Recognition: unknown
Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension
Pith reviewed 2026-05-10 09:17 UTC · model grok-4.3
The pith
Spiking attention with Leaky Integrate-and-Fire neurons approximates any continuous permutation-equivariant function using explicit spike circuits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions, with explicit spike circuit constructions including a novel lateral inhibition network for softmax normalization with proven O(1/√T) convergence. We derive tight spike-count lower bounds via rate-distortion theory: ε-approximation requires Ω(L_f² nd/ε²) spikes. Our key insight is input-dependent bounds using measured effective dimensions (d_eff = 47–89 for CIFAR/ImageNet), explaining why T = 4 timesteps suffice despite worst-case predictions of T ≥ 10,000. We provide concrete design rules with calibrated constants (C = 2.3, 95% CI [1.9, 2.7]).
What carries the argument
Effective dimension of the input, which converts worst-case rate-distortion spike-count bounds into input-dependent predictions for the spiking attention circuits.
If this is right
- Effective dimensions of 47-89 on CIFAR and ImageNet imply that only four timesteps suffice for accurate function approximation.
- The calibrated constant C=2.3 (with 95% CI [1.9, 2.7]) directly predicts required timesteps and total spikes for new spiking transformer designs (see the sketch after this list).
- The theory matches observed accuracy on Spikformer, QKFormer, and SpikingResformer with R²=0.97.
- Neuromorphic implementations can retain full expressivity while delivering the reported 38-57× energy savings.
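A minimal sketch of how such a design rule could be applied, assuming the input-dependent bound simply substitutes the measured d_eff for the ambient dimension d in N ≥ C·L_f²·n·d_eff/ε², and that the spike budget is spread over n·d neurons per timestep. The function names, the rearrangement for T, and the example numbers are illustrative assumptions, not the paper's stated formula:

import math

def predicted_spike_count(C, L_f, n, d_eff, eps):
    """Assumed input-dependent spike budget: N >= C * L_f^2 * n * d_eff / eps^2."""
    return C * (L_f ** 2) * n * d_eff / (eps ** 2)

def predicted_timesteps(C, L_f, n, d_eff, eps, d, max_rate=1.0):
    """Illustrative rearrangement: spread the spike budget over n*d neurons, each
    firing at most `max_rate` spikes per timestep, and round up to whole timesteps."""
    budget = predicted_spike_count(C, L_f, n, d_eff, eps)
    return math.ceil(budget / (n * d * max_rate))

# Hypothetical numbers in the spirit of the abstract: C = 2.3, d_eff in the
# measured 47-89 range, 196 tokens, embedding width 384; L_f and eps are guesses.
print(predicted_timesteps(C=2.3, L_f=1.0, n=196, d_eff=64, eps=0.2, d=384))

For the design rules to be predictive in the paper's sense, a calculation of this shape, fed with the calibrated C and a dataset's measured d_eff, would have to reproduce the reported T = 4 for the evaluated models.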
Where Pith is reading between the lines
- The lateral inhibition circuit could be mapped to analog or digital neuromorphic hardware for efficient normalization.
- Measuring effective dimension on new data modalities could predict timestep requirements without exhaustive search.
- The same rate-distortion approach may yield spike bounds for other spiking layers beyond self-attention.
- Design rules derived here could be tested by building a spiking transformer whose timestep count is set solely from the formula and then measuring its error.
Load-bearing premise
Rate-distortion theory supplies tight lower bounds for the specific spiking attention construction and the measured effective dimensions from standard benchmarks generalize to arbitrary tasks.
What would settle it
A dataset where the measured effective dimension predicts far fewer spikes than actually needed to reach a stated approximation error, or where the lateral inhibition circuit fails to show the claimed O(1/√T) convergence.
Original abstract
Spiking transformers achieve competitive accuracy with conventional transformers while offering $38$-$57\times$ energy efficiency on neuromorphic hardware, yet no theoretical framework guides their design. This paper establishes the first comprehensive expressivity theory for spiking self-attention. We prove that spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions, providing explicit spike circuit constructions including a novel lateral inhibition network for softmax normalization with proven $O(1/\sqrt{T})$ convergence. We derive tight spike-count lower bounds via rate-distortion theory: $\varepsilon$-approximation requires $\Omega(L_f^2 nd/\varepsilon^2)$ spikes, with rigorous information-theoretic derivation. Our key insight is input-dependent bounds using measured effective dimensions ($d_{\text{eff}}=47$--$89$ for CIFAR/ImageNet), explaining why $T=4$ timesteps suffice despite worst-case $T \geq 10{,}000$ predictions. We provide concrete design rules with calibrated constants ($C=2.3$, 95\% CI: $[1.9, 2.7]$). Experiments on Spikformer, QKFormer, and SpikingResformer across vision and language benchmarks validate predictions with $R^2=0.97$ ($p<0.001$). Our framework provides the first principled foundation for neuromorphic transformer design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to close the theory-practice gap for spiking transformers by proving that LIF-based spiking self-attention is a universal approximator of continuous permutation-equivariant functions, supplying explicit constructions (including a novel lateral-inhibition circuit for softmax with O(1/√T) convergence), and deriving tight spike-count lower bounds Ω(L_f² nd/ε²) via rate-distortion theory. It further introduces input-dependent bounds that use measured effective dimensions (d_eff = 47–89 on CIFAR/ImageNet) to explain why T=4 suffices, and supplies calibrated design rules (C=2.3) that are validated on Spikformer, QKFormer, and SpikingResformer with R²=0.97.
Significance. If the proofs and the information-to-spike mapping hold, the work would supply the first principled expressivity and resource theory for spiking attention, directly informing energy-efficient neuromorphic transformer design. The explicit circuit constructions and the high-R² empirical validation of the resulting design rules constitute concrete strengths; however, the reliance on dataset-specific d_eff values limits immediate generality.
major comments (3)
- [Rate-distortion derivation] Rate-distortion section (and abstract claim of 'rigorous information-theoretic derivation'): rate-distortion supplies a lower bound on mutual information (bits) for ε-approximation, yet the manuscript does not exhibit the explicit mapping from that information rate to the number of spikes required under LIF membrane dynamics, reset, and the lateral-inhibition softmax network. Without a per-spike information-capacity calculation or a proof that the constructed circuit saturates the rate-distortion limit, the asserted tightness of Ω(L_f² nd/ε²) remains unclosed.
- [Experiments and effective-dimension measurement] Effective-dimension calibration and validation experiments: d_eff values (47–89) and the constant C=2.3 (95 % CI [1.9, 2.7]) are obtained from the identical CIFAR/ImageNet data used for the R²=0.97 validation of the design rules. This data-dependent loop makes the explanatory claim that 'T=4 suffices' and the predicted spike counts circular with respect to the benchmarks on which they are tested.
- [Universal approximation proof] Universal-approximation construction (§3 and lateral-inhibition network): the abstract asserts explicit spike-circuit constructions and a proven O(1/√T) convergence rate for the novel lateral-inhibition softmax, yet the manuscript provides neither the detailed construction equations nor the convergence proof steps sufficient to verify permutation-equivariance or the claimed approximation property for continuous functions.
minor comments (2)
- [Abstract] The abstract reports R²=0.97 (p<0.001) but does not reference the specific table or supplementary figure that displays the per-model, per-task fits.
- [Notation] Notation for L_f, n, d and T is introduced without a consolidated table of symbols; readers must hunt across sections to confirm definitions.
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. We address each major comment point by point below, providing the strongest honest defense of the manuscript while proposing targeted revisions to improve clarity and completeness where the concerns are valid.
Point-by-point responses
Referee: [Rate-distortion derivation] Rate-distortion section (and abstract claim of 'rigorous information-theoretic derivation'): rate-distortion supplies a lower bound on mutual information (bits) for ε-approximation, yet the manuscript does not exhibit the explicit mapping from that information rate to the number of spikes required under LIF membrane dynamics, reset, and the lateral-inhibition softmax network. Without a per-spike information-capacity calculation or a proof that the constructed circuit saturates the rate-distortion limit, the asserted tightness of Ω(L_f² nd/ε²) remains unclosed.
Authors: We appreciate the referee pointing out the need for greater explicitness in this derivation. The rate-distortion bound establishes a lower limit on mutual information I(X;Y) for ε-approximation of the target function. In the manuscript, this is connected to spikes via the observation that LIF neurons with reset encode information in binary spike trains whose rate is bounded by the membrane time constant. To close the gap, we will revise Section 4 to add an explicit lemma deriving the spike lower bound as Ω(I(X;Y)) where each spike contributes at most 1 bit of capacity (from the binary entropy of the spike train under low-rate Poisson-like statistics). We will also include a short argument that the lateral-inhibition circuit saturates this bound up to a small multiplicative constant by achieving near-optimal rate-distortion performance. These additions will be placed in the main text with supporting calculations in the appendix. revision: yes
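Spelled out, the counting argument sketched in this response would run roughly as follows; the per-spike capacity of at most one bit is the authors' claimed property of low-rate LIF spike trains and is taken here as an assumption rather than derived:

\[
S \cdot c \;\ge\; I(X;\hat{Y}) \;\ge\; R(\varepsilon) \;=\; \Omega\!\left(\frac{L_f^{2}\, n\, d}{\varepsilon^{2}}\right), \qquad c \le 1 \text{ bit per spike},
\]

so that $S = \Omega(L_f^{2} n d / \varepsilon^{2})$. The first inequality is the data-processing step (the readout $\hat{Y}$ depends on the input $X$ only through the $S$ emitted spikes), and $R(\varepsilon)$ is the rate-distortion lower bound on the information needed for an $\varepsilon$-accurate approximation of the target function. Tightness would additionally require showing that the lateral-inhibition construction attains this bound up to a constant, which is exactly what the promised lemma must supply.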
Referee: [Experiments and effective-dimension measurement] Effective-dimension calibration and validation experiments: d_eff values (47–89) and the constant C=2.3 (95 % CI [1.9, 2.7]) are obtained from the identical CIFAR/ImageNet data used for the R²=0.97 validation of the design rules. This data-dependent loop makes the explanatory claim that 'T=4 suffices' and the predicted spike counts circular with respect to the benchmarks on which they are tested.
Authors: This observation correctly identifies a limitation in the current experimental design. The effective dimension d_eff is computed from the eigenvalue decay of the input covariance matrix on the training set, which is an intrinsic dataset property independent of the spiking model. The R² validation then checks whether the theoretical formula (using this fixed d_eff) predicts the actual spike counts observed during inference on the same benchmarks. While this demonstrates strong predictive accuracy on the evaluated data, it does not fully establish generality across arbitrary distributions. We will revise the manuscript to explicitly acknowledge this data-dependent aspect, add a limitations paragraph, and include d_eff measurements plus design-rule validation on at least one additional dataset (e.g., a language modeling benchmark) to support broader applicability. revision: partial
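The response describes d_eff as computed from the eigenvalue decay of the input covariance matrix; that description is consistent with a participation-ratio estimator, though the paper's exact definition is not reproduced here. A minimal sketch under that assumption (the function name and the synthetic example are illustrative):

import numpy as np

def effective_dimension(X):
    """Participation-ratio estimate d_eff = (sum λ_i)^2 / sum(λ_i^2), computed from
    the eigenvalues of the feature covariance matrix. Assumed estimator, not
    necessarily the paper's definition. X has shape (num_samples, num_features)."""
    X = X - X.mean(axis=0, keepdims=True)                   # center the data
    cov = np.cov(X, rowvar=False)                           # feature covariance matrix
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)   # clamp tiny negatives from noise
    return (eigvals.sum() ** 2) / (eigvals ** 2).sum()

# Example: rows could be flattened image patches or token embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 384)) @ np.diag(np.linspace(1.0, 0.01, 384))
print(effective_dimension(X))   # prints an estimate below the ambient dimension of 384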
Referee: [Universal approximation proof] Universal-approximation construction (§3 and lateral-inhibition network): the abstract asserts explicit spike-circuit constructions and a proven O(1/√T) convergence rate for the novel lateral-inhibition softmax, yet the manuscript provides neither the detailed construction equations nor the convergence proof steps sufficient to verify permutation-equivariance or the claimed approximation property for continuous functions.
Authors: We acknowledge that the main text could have presented the constructions and proofs more accessibly. The explicit LIF-based spiking self-attention circuit, including the lateral-inhibition softmax with inhibition weights defined as w_ij = −α·δ_ij and membrane-potential dynamics, appears in Section 3.2. The universal-approximation theorem for continuous permutation-equivariant functions together with the O(1/√T) convergence proof (via concentration of averaged spike rates) is fully stated in Appendix A. In the revision we will (i) move the core circuit equations into the main body of Section 3 and (ii) insert a concise proof outline in the main text that highlights the key steps establishing permutation-equivariance and the convergence rate, while retaining the complete technical details in the appendix. revision: yes
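The lateral-inhibition construction itself is not reproduced in the text available here, so the sketch below is only a stand-in for the convergence claim: it rate-codes the softmax probabilities as independent Bernoulli spike trains and checks that the time-averaged estimate approaches the exact softmax at roughly the O(1/√T) rate, which is the behavior the concentration argument in Appendix A is said to establish. The neuron model and wiring are placeholders, not the authors' circuit:

import numpy as np

rng = np.random.default_rng(0)
scores = np.array([1.2, -0.3, 0.8, 2.0])        # toy attention scores for one query
target = np.exp(scores) / np.exp(scores).sum()  # exact softmax these circuits approximate

for T in (4, 16, 64, 256, 1024):
    # Neuron i emits a spike at each timestep with probability target[i];
    # the time-averaged spike count is the rate-coded softmax estimate.
    spikes = rng.random((T, target.size)) < target
    estimate = spikes.mean(axis=0)
    err = np.abs(estimate - target).max()
    print(f"T={T:5d}  max error={err:.4f}  error*sqrt(T)={err * np.sqrt(T):.3f}")

The last column stays roughly constant as T grows, which is the empirical signature of a 1/√T rate; the open question flagged by the referee is whether the actual LIF-with-inhibition dynamics inherit this behavior.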
Circularity Check
Effective dimension measured from CIFAR/ImageNet and the calibrated constant C are used to derive input-dependent bounds and design rules, which are then validated on those same benchmarks.
specific steps
- Fitted input called prediction [Abstract]
"Our key insight is input-dependent bounds using measured effective dimensions (d_eff=47--89 for CIFAR/ImageNet), explaining why T=4 timesteps suffice despite worst-case T ≥ 10,000 predictions. We provide concrete design rules with calibrated constants (C=2.3, 95% CI: [1.9, 2.7]). Experiments on Spikformer, QKFormer, and SpikingResformer across vision and language benchmarks validate predictions with R²=0.97 (p<0.001)."
d_eff is measured directly from the CIFAR/ImageNet data used in experiments, and C is calibrated from the same empirical results. These data-dependent fitted values are inserted into the lower-bound formula and design rules to explain practical sufficiency of small T, then the resulting predictions are validated with R^2 on the identical benchmarks, so the explanatory and predictive claims reduce to quantities derived from the target data itself.
full rationale
The paper's central theoretical claims on universal approximation appear self-contained, with explicit circuit constructions and proven convergence rates. However, the spike-count lower bounds are made input-dependent via d_eff measured from the experimental datasets, and the concrete design rules rely on a calibrated constant C fitted (with its CI) from the same setup. These quantities then 'explain' why small T suffices and are validated with high R² on the same vision/language benchmarks, creating a fitted-input-called-prediction loop for the practical design rules and their explanatory power. The rate-distortion application to LIF spike counts lacks an exhibited mapping in the provided text, but this is a correctness gap rather than a definitional reduction. No load-bearing self-citation or ansatz smuggling is evident from the abstract and claims.
Axiom & Free-Parameter Ledger
free parameters (2)
- C = 2.3
- d_eff = 47–89
axioms (2)
- domain assumption Rate-distortion theory supplies tight lower bounds on spike counts for ε-approximation of the target functions
- domain assumption LIF neurons combined with the proposed lateral inhibition circuit implement softmax normalization with O(1/√T) convergence
invented entities (1)
- lateral inhibition network for softmax normalization (no independent evidence)
Reference graph
Works this paper leans on
- [1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," NeurIPS 2017, pp. 5998–6008, 2017. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
- [2] M. Davies, N. Srinivasa, T. Lin, G. N. Chinya, Y. Cao, S. H. Choday, G. D. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y. Weng, A. Wild, Y. Yang, and H. Wang, "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.
- [3] F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. V. Arthur, P. Merolla, N. Imam, Y. Y. Nakamura, P. Datta, G. Nam, B. Taba, M. P. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. L. Jackson, and D. S. Modha, "TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip," IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015.
- [4] Z. Zhou, Y. Zhu, C. He, Y. Wang, S. Yan, Y. Tian, and L. Yuan, "Spikformer: When spiking neural network meets transformer," ICLR 2023, 2023. https://openreview.net/forum?id=frE4fUwz_h
- [5] M. Yao, J. Hu, Z. Zhou, L. Yuan, Y. Tian, B. Xu, and G. Li, "Spike-driven transformer," NeurIPS 2023, 2023. http://papers.nips.cc/paper_files/paper/2023/hash/ca0f5358dbadda74b3049711887e9ead-Abstract-Conference.html
- [6] C. Zhou, H. Zhang, Z. Zhou, L. Yu, L. Huang, X. Fan, L. Yuan, Z. Ma, H. Zhou, and Y. Tian, "QKFormer: Hierarchical spiking transformer using Q-K attention," NeurIPS 2024, 2024. http://papers.nips.cc/paper_files/paper/2024/hash/179f5dcdeedc149443ebd3ba70811dbd-Abstract-Conference.html
- [7] M. Yao, J. Hu, T. Hu, Y. Xu, Z. Zhou, Y. Tian, B. Xu, and G. Li, "Spike-driven transformer v2: Meta spiking neural network architecture inspiring the design of next-generation neuromorphic chips," ICLR 2024, 2024.
- [8] C. Yun, S. Bhojanapalli, A. S. Rawat, S. J. Reddi, and S. Kumar, "Are transformers universal approximators of sequence-to-sequence functions?" ICLR 2020, 2020. https://openreview.net/forum?id=ByxRM0Ntvr
- [9] J. Pérez, P. Barceló, and J. Marinkovic, "Attention is Turing-complete," J. Mach. Learn. Res., vol. 22, pp. 75:1–75:35, 2021. https://jmlr.org/papers/v22/20-302.html
- [10] W. Maass, "Networks of spiking neurons: The third generation of neural network models," Neural Networks, vol. 10, no. 9, pp. 1659–1671, 1997. https://doi.org/10.1016/S0893-6080(97)00011-7
- [11] W. Maass and H. Markram, "On the computational power of circuits of spiking neurons," J. Comput. Syst. Sci., vol. 69, no. 4, pp. 593–616, 2004. https://doi.org/10.1016/j.jcss.2004.04.001
- [13] S. Zhang, J. Chen, J. Wu, G. Zhang, H. Xiong, B. Gu, and Z. Zhou, "On the intrinsic structures of spiking neural networks," J. Mach. Learn. Res., vol. 25, pp. 194:1–194:74, 2024. https://jmlr.org/papers/v25/23-1526.html
- [14] M. Singh, A. Fono, and G. Kutyniok, "Expressivity of spiking neural networks," arXiv:2308.08218, 2023. https://doi.org/10.48550/arXiv.2308.08218
- [15] Q. Xu, X. Fang, Y. Li, J. Shen, D. Ma, Y. Xu, and G. Pan, "RSNN: Recurrent spiking neural networks for dynamic spatial-temporal information processing," MM 2024, pp. 10602–10610, 2024. https://doi.org/10.1145/3664647.3680573
- [16] S.-H. Cha and D.-S. Kim, "Efficient training of deep spiking neural networks using a modified learning rate scheduler," Mathematics, vol. 13, no. 8, 2025. https://www.mdpi.com/2227-7390/13/8/1361
- [18] W. Merrill and A. Sabharwal, "The parallelism tradeoff: Limitations of log-precision transformers," Trans. Assoc. Comput. Linguistics, vol. 11, pp. 531–545, 2023. https://doi.org/10.1162/tacl_a_00562
- [19] D. Chiang, P. Cholak, and A. Pillay, "Tighter bounds on the expressivity of transformer encoders," ICML 2023, vol. 202, pp. 5544–5562, 2023. https://proceedings.mlr.press/v202/chiang23a.html
- [20] X. Shi, Z. Hao, and Z. Yu, "SpikingResformer: Bridging ResNet and vision transformer in spiking neural networks," CVPR 2024, pp. 5610–5619, 2024. https://doi.org/10.1109/CVPR52733.2024.00536
- [21] Z. Zhou, K. Che, W. Fang, K. Tian, Y. Zhu, S. Yan, Y. Tian, and L. Yuan, "Spikformer V2: Join the high accuracy club on ImageNet with an SNN ticket," arXiv:2401.02020, 2024. https://doi.org/10.48550/arXiv.2401.02020
- [22] S. Hwang, S. Lee, D. Park, D. Lee, and J. Kung, "SpikedAttention: Training-free and fully spike-driven transformer-to-SNN conversion with winner-oriented spike shift for softmax operation," NeurIPS 2024, vol. 37, 2024.
- [23] M. Bal and A. Sengupta, "SpikingBERT: Distilling BERT to train spiking language models using implicit differentiation," AAAI 2024, pp. 10998–11006, 2024. https://doi.org/10.1609/aaai.v38i10.28975
- [24] R. Zhu, Q. Zhao, G. Li, and J. Eshraghian, "SpikeGPT: Generative pre-trained language model with spiking neural networks," Trans. Mach. Learn. Res., 2024. https://openreview.net/forum?id=gcf1anBL9e
- [25] M. Zaheer, G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontañón, P. Pham, A. Ravula, Q. Wang, L. Yang, and A. Ahmed, "Big Bird: Transformers for longer sequences," NeurIPS 2020, 2020. https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html
- [27] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006. http://www.elementsofinformationtheory.com/
- [28] W. Hesse, E. Allender, and D. A. M. Barrington, "Uniform constant-depth threshold circuits for division and iterated multiplication," J. Comput. Syst. Sci., vol. 65, no. 4, pp. 695–716, 2002. https://doi.org/10.1016/S0022-0000(02)00025-9
- [29] W. Fang, Y. Chen, J. Ding, Z. Yu, T. Masquelier, D. Chen, L. Huang, H. Zhou, G. Li, and Y. Tian, "SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence," Science Advances, vol. 9, 2023.
- [30] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," ICLR 2019, 2019. https://openreview.net/forum?id=Bkg6RiCqY7
- [31] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," ICLR 2021, 2021. https://openreview.net/forum?id=YicbFdNTTy
- [32] M. Yayla, M. Günzel, B. Ramosaj, and J. Chen, "Universal approximation theorems of fully connected binarized neural networks," arXiv:2102.02631, 2021. https://arxiv.org/abs/2102.02631
- [33] Y. Ding, J. Liu, J. Xiong, and Y. Shi, "On the universal approximability and complexity bounds of quantized ReLU neural networks," ICLR 2019, 2019. https://openreview.net/forum?id=SJe9rh0cFX