
arXiv: 2403.16552 · v2 · pith:WXDS3T2Vnew · submitted 2024-03-25 · 💻 cs.NE · cs.AI · cs.CV

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Pith reviewed 2026-05-24 03:42 UTC · model grok-4.3

classification 💻 cs.NE · cs.AI · cs.CV
keywords: spiking neural networks, transformer, Q-K attention, ImageNet classification, hierarchical architecture, direct training, energy efficient models

The pith

QKFormer reaches 85.65 percent top-1 accuracy on ImageNet-1K using a hierarchical spiking transformer with Q-K attention.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents QKFormer, which combines spiking neural networks with transformer architectures through a new Q-K attention mechanism. This attention uses binary spike vectors to model token or channel importance with linear complexity. The model also uses a hierarchical structure for multi-scale features and a patch embedding module with a deformed shortcut. With these changes, it achieves 85.65 percent top-1 accuracy on ImageNet-1K using 64.96 million parameters, 10.84 percentage points above Spikformer (66.34 million parameters, 74.81 percent), a directly trained spiking model of comparable size. A sympathetic reader would care because spiking networks promise lower energy use and now reach high accuracy on large-scale image tasks.

Core claim

We introduce a spike-form Q-K attention mechanism tailored for SNNs that efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. We incorporate the hierarchical structure into spiking transformers to obtain multi-scale spiking representation and design a versatile patch embedding module with a deformed shortcut. Together these form QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training, which achieves 85.65 percent top-1 accuracy on ImageNet-1K with 64.96 million parameters, outperforming Spikformer by 10.84 percentage points and marking the first time directly trained SNNs exceed 85 percent on ImageNet-1K.
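The headline comparison is simple arithmetic, and it helps to separate the absolute gap (percentage points) from the relative improvement. The short check below uses only the figures quoted in the abstract; the dictionary and variable names are illustrative and not from the paper.

```python
# Figures quoted in the abstract: parameters in millions, top-1 accuracy in percent.
qkformer = {"params_m": 64.96, "top1": 85.65}
spikformer = {"params_m": 66.34, "top1": 74.81}

# Absolute gap in percentage points (the 10.84 reported in the abstract).
gain_pp = round(qkformer["top1"] - spikformer["top1"], 2)

# Relative improvement over the Spikformer accuracy, in percent.
relative_gain = round(100 * gain_pp / spikformer["top1"], 2)

# Parameter ratio, to confirm the two models are of comparable size.
param_ratio = round(qkformer["params_m"] / spikformer["params_m"], 4)

print(f"gain: {gain_pp} percentage points")
print(f"relative gain: {relative_gain} percent")
print(f"parameter ratio (QKFormer / Spikformer): {param_ratio}")
```

The absolute gap is the quantity the paper reports; the relative figure is larger and should not be confused with it.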

What carries the argument

The spike-form Q-K attention mechanism, which models importance of token or channel dimensions through binary vectors with linear complexity.
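One way to read "binary vectors with linear complexity" is sketched below in NumPy. This is a hedged illustration, not the authors' implementation: it assumes the token variant sums the binary Q over channels, thresholds that sum into a binary per-token vector, and uses the vector to mask K. The `spike` function is a plain threshold standing in for a spiking neuron, and all names (`spike`, `qk_token_attention`, `threshold`) are hypothetical.

```python
import numpy as np

def spike(x, threshold=1.0):
    """Stand-in for a spiking neuron: emit 1 where the input reaches the threshold."""
    return (x >= threshold).astype(np.float32)

def qk_token_attention(q_spikes, k_spikes, threshold=1.0):
    """Mask K with a binary per-token importance vector derived from Q.

    q_spikes, k_spikes: binary arrays of shape (N, D) for N tokens and D channels.
    """
    # Sum Q over channels: one spike count per token, cost O(N * D).
    token_score = q_spikes.sum(axis=1, keepdims=True)   # shape (N, 1)
    # Threshold the counts into a binary token-attention vector.
    token_attn = spike(token_score, threshold)          # shape (N, 1)
    # Broadcast element-wise product: each row of K is kept or zeroed.
    return token_attn * k_spikes                        # shape (N, D)

rng = np.random.default_rng(0)
N, D = 6, 4  # toy sizes chosen for illustration
q = (rng.random((N, D)) < 0.3).astype(np.float32)
k = (rng.random((N, D)) < 0.5).astype(np.float32)
out = qk_token_attention(q, k, threshold=1.0)
print(out.shape)
```

Under these assumptions no N × N attention map is ever formed, which is where the linear scaling in the number of tokens would come from. A channel variant would sum over tokens instead (`axis=0`) and mask columns of K.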

If this is right

  • QKFormer outperforms existing state-of-the-art SNN models on various mainstream datasets.
  • The hierarchical structure provides multi-scale spiking representations that improve performance.
  • The deformed shortcut in the patch embedding module supports better performance in spiking transformers.
  • Direct training of SNNs can now exceed 85 percent top-1 accuracy on ImageNet-1K.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The linear-complexity Q-K attention could be tested in non-spiking transformers to check whether binary importance vectors reduce compute without accuracy loss.
  • Running QKFormer on actual neuromorphic chips would test whether the reported accuracy gains produce measurable energy savings compared with standard transformers.
  • The same hierarchical spiking design with Q-K attention could be applied to video or detection tasks to check whether multi-scale spiking features transfer.

Load-bearing premise

The large accuracy gains are caused by the Q-K attention, hierarchical design, and deformed shortcut rather than differences in training schedule, data augmentation, optimizer settings, or other experimental details.

What would settle it

Re-training the same QKFormer architecture without the Q-K attention or hierarchy and still reaching 85 percent or higher on ImageNet-1K would show the gains do not depend on these elements.

Figures

Figures reproduced from arXiv: 2403.16552 by Chenlin Zhou, Han Zhang, Huihui Zhou, Liutao Yu, Liwei Huang, Li Yuan, Xiaopeng Fan, Yonghong Tian, Zhaokun Zhou, Zhengyu Ma.

Figure 1: Illustration of Q-K attention with the two versions of Q-K token attention (QKTA) and …
Figure 2: The overview of QKFormer, a hierarchical spiking transformer with Q-K attention. Overall Hierarchical Architecture. The overview of QKFormer is presented in Figure 2. The input form can be formulated as (T0 × H × W × n). In static RGB image datasets, T0 = 1 and n = 3. In temporal neuromorphic datasets, the input T0 = T, while n = 2. In our implementation, we use a patch size of 4 × 4 and thus the input featu…
Figure 3: The visualization and memory consumption of QKTA. (a) is the visualization of Q-K token …
Figure 4: (a) shows the variance and expectation of SSA, (b) shows the variance and expectation of …
Figure 5: (a) Spiking Patch Splitting (SPS) module in Spikformer. (b) Spiking Patch Embedding with …
Figure 6: Training loss, test loss, top-1 and top-5 test accuracy of QKFormer on ImageNet-1K. The …
Original abstract

Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%. To our best knowledge, this is the first time that directly training SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces QKFormer, a hierarchical spiking transformer that incorporates a novel spike-form Q-K attention mechanism (binary vectors with linear complexity), a hierarchical structure for multi-scale spiking representations, and a deformed shortcut patch embedding. It claims state-of-the-art results on multiple datasets, with the headline result being 85.65% top-1 accuracy on ImageNet-1K using 64.96M parameters, outperforming Spikformer (66.34M parameters, 74.81% accuracy) by 10.84 percentage points and marking the first time directly trained SNNs exceed 85% on this dataset. The code and models are released publicly.

Significance. If the accuracy gains hold under matched training conditions, the work would represent a notable advance for spiking transformers by showing that architectural changes can push directly trained SNNs into a new performance regime on ImageNet-1K. The public code release is a clear strength that enables direct verification and future extensions.

major comments (2)
  1. [Abstract and experimental results section] The central claim attributes the 10.84 pp gain (85.65% vs. 74.81%) to the Q-K attention, hierarchy, and deformed shortcut, yet no table or subsection confirms that timestep count, surrogate-gradient function, data augmentation, optimizer schedule, and other hyperparameters are identical to the Spikformer baseline; without this isolation the attribution to the proposed mechanisms remains unverified.
  2. [Section describing the Q-K attention mechanism] The statement that the mechanism 'efficiently models the importance of token or channel dimensions through binary vectors with linear complexity' requires an explicit derivation or complexity analysis showing how the binary spike-form vectors are produced and propagated while preserving the claimed linear scaling; this is load-bearing for both the performance and energy-efficiency assertions.
minor comments (1)
  1. [Abstract] The abstract contains a minor grammatical issue: 'directly training SNNs have exceeded' should read 'directly trained SNNs have exceeded'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our results and methods.

  1. Referee: [Abstract and experimental results section] The central claim attributes the 10.84 pp gain (85.65% vs. 74.81%) to the Q-K attention, hierarchy, and deformed shortcut, yet no table or subsection confirms that timestep count, surrogate-gradient function, data augmentation, optimizer schedule, and other hyperparameters are identical to the Spikformer baseline; without this isolation the attribution to the proposed mechanisms remains unverified.

    Authors: We agree that explicit confirmation of matched training conditions is necessary for a fair comparison. In the revised manuscript we will add a dedicated table (and accompanying text in Section 4) listing all key hyperparameters for QKFormer alongside those reported for Spikformer (timesteps T=4, arctan surrogate gradient, identical data augmentation pipeline, AdamW optimizer with the same learning-rate schedule and weight decay, etc.). These settings were taken directly from the Spikformer paper and codebase to ensure the performance difference can be attributed to the architectural innovations.

    Revision: yes

  2. Referee: [Section describing the Q-K attention mechanism] The statement that the mechanism 'efficiently models the importance of token or channel dimensions through binary vectors with linear complexity' requires an explicit derivation or complexity analysis showing how the binary spike-form vectors are produced and propagated while preserving the claimed linear scaling; this is load-bearing for both the performance and energy-efficiency assertions.

    Authors: We acknowledge that the current description would benefit from a formal derivation. In the revised manuscript we will expand the Q-K attention subsection with a step-by-step derivation: (1) generation of binary spike vectors Q_s and K_s via the spiking neuron, (2) the attention computation reducing to an element-wise product and summation that counts matching spikes, and (3) the resulting per-layer complexity of O(N) for sequence length N (versus O(N^2) for standard softmax attention). We will also include a small complexity table comparing FLOPs and spike operations.

    Revision: yes
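The scaling argument in the second response can be made concrete with a rough operation count. The sketch below is an assumption-laden tally, not a measurement from the paper: it counts one channel-sum and one masking pass (each N·D) for a Q-K style attention, against the two N·N·D matrix products of pairwise attention, ignoring softmax, projections, and sparsity. The function names, the channel width `D`, and the token counts are all hypothetical.

```python
def qk_attention_ops(n_tokens: int, n_channels: int) -> int:
    """Rough op count: channel-sum of Q plus masking of K, each N * D."""
    return 2 * n_tokens * n_channels

def pairwise_attention_ops(n_tokens: int, n_channels: int) -> int:
    """Rough op count: Q @ K.T plus (Q @ K.T) @ V, each N * N * D."""
    return 2 * n_tokens * n_tokens * n_channels

D = 64  # hypothetical channel width
for n in (196, 784, 3136):  # hypothetical token counts
    linear = qk_attention_ops(n, D)
    quadratic = pairwise_attention_ops(n, D)
    print(f"N={n:5d}  linear={linear:>12,d}  pairwise={quadratic:>14,d}  "
          f"ratio={quadratic // linear}")
```

Under this tally the ratio between the two counts equals N, so the gap widens as the token count grows. That matters most in early, high-resolution stages of a hierarchical design, where a 4 × 4 patch size produces many tokens.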

Circularity Check

0 steps flagged

No circularity: central claim is empirical accuracy on public benchmark


The paper presents architectural innovations (Q-K attention, hierarchical design, deformed shortcut) and reports an empirical top-1 accuracy of 85.65% on ImageNet-1k. No derivation chain exists that reduces predictions or uniqueness claims to fitted parameters, self-citations, or ansatzes by construction. The accuracy number is obtained from direct training experiments on a standard public dataset and does not equate to any input by the paper's own equations. Self-citations, if present, are not load-bearing for the performance claim.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The performance claim rests on the effectiveness of three newly introduced architectural components whose benefits are shown only through the reported experiments; the model contains numerous tunable design choices typical of deep networks.

free parameters (1)
  • Architecture and training hyperparameters
    Choices such as number of layers, embedding dimensions, learning rate schedules, and augmentation parameters are selected to achieve the stated accuracy.
axioms (1)
  • domain assumption: Gradient-based optimization converges effectively when applied directly to the spiking network loss.
    Direct training of SNNs assumes backpropagation through the non-differentiable spike function yields useful gradients.
invented entities (2)
  • Spike-form Q-K attention (no independent evidence)
    purpose: Models token or channel importance via binary vectors with linear complexity inside an SNN.
    New mechanism introduced by the authors; effectiveness shown only in the paper's results.
  • Deformed shortcut patch embedding (no independent evidence)
    purpose: Improves input representation specifically for spiking transformers.
    Novel design element proposed for this architecture.
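The axiom in the ledger, that backpropagation through a non-differentiable spike function can yield useful gradients, is usually realized with a surrogate gradient. The sketch below shows one common arctan form as an illustration. It assumes the forward pass is a hard threshold and the backward pass uses the derivative of g(x) = (1/π)·arctan((π/2)·α·x) + 1/2. The sharpness `alpha`, the threshold `v_th`, and the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def heaviside(v, v_th=1.0):
    """Forward pass: a spike is emitted where the membrane potential reaches v_th."""
    return (v >= v_th).astype(np.float64)

def arctan_surrogate_grad(v, v_th=1.0, alpha=2.0):
    """Backward pass: derivative of g(x) = arctan(pi/2 * alpha * x) / pi + 1/2.

    The true derivative of the step function is zero almost everywhere, so this
    smooth bump centered on the threshold is used in its place.
    """
    x = v - v_th
    return (alpha / 2.0) / (1.0 + (np.pi / 2.0 * alpha * x) ** 2)

v = np.array([0.0, 0.9, 1.0, 1.1, 2.0])  # example membrane potentials
spikes = heaviside(v)
grads = arctan_surrogate_grad(v)
print(spikes)
print(grads)
```

The surrogate is largest at the threshold and decays on both sides, so neurons close to firing receive the strongest learning signal. Whether this approximation is adequate for a given network is exactly what the ledger lists as an assumption.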

pith-pipeline@v0.9.0 · 5834 in / 1416 out tokens · 57955 ms · 2026-05-24T03:42:56.045120+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Temporal-Aware Spiking Transformer Hashing Based on 3D-DWT

    cs.CV · 2025-01 · unverdicted · novelty 7.0

    Spikinghash combines 3D-DWT Spiking WaveMixer, Spiking Self-Attention, and a dynamic soft similarity loss to produce energy-efficient hash codes for DVS data retrieval.

  2. Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation

    cs.CV · 2026-04 · unverdicted · novelty 3.0

    RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.
