pith. machine review for the scientific record.

arxiv: 2604.25570 · v1 · submitted 2026-04-28 · 💻 cs.CV

Recognition: unknown

Vision SmolMamba: Spike-Guided Token Pruning for Energy-Efficient Spiking State-Space Vision Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 16:37 UTC · model grok-4.3

classification 💻 cs.CV
keywords spiking neural networks · token pruning · state space models · energy-efficient vision · event-based vision · neuromorphic computing · spiking transformers

The pith

Vision SmolMamba uses spike strength and first-spike latency to prune tokens in a spiking state-space model, cutting estimated energy cost by at least 1.5x versus prior spiking transformers while matching or improving accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a spiking vision architecture that replaces quadratic self-attention with linear selective state-space recurrence and adds a pruning step guided by spike signals. The pruning step measures each token's importance through its spike activation strength and the timing of its first spike, then progressively discards the least important tokens across layers. This combination is tested on both ordinary image datasets and event-camera recordings, where it delivers the same or higher classification accuracy at substantially lower estimated energy. A sympathetic reader would care because spiking networks are already sparse and event-driven; removing the quadratic bottleneck while keeping the sparsity should make them practical for larger images or longer video sequences without custom hardware. The results are presented as evidence that spike-guided sparsity and state-space modeling together form a scalable route for energy-efficient visual computation.
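To make the quadratic-versus-linear point concrete, a back-of-envelope count of per-layer token interactions is sketched below. It assumes ViT-style 16x16 patches purely for illustration; the abstract does not state the SPS patch size or token counts, so the numbers are indicative rather than the paper's.

```python
# Rough, illustrative comparison of per-layer token interactions for quadratic
# self-attention versus a linear bidirectional state-space scan. The 16x16
# patch size is an assumption, not taken from the paper.

def token_count(height: int, width: int, patch: int = 16) -> int:
    """Number of spatial tokens after patchifying an image."""
    return (height // patch) * (width // patch)

for side in (224, 384, 512):
    n = token_count(side, side)
    attention_pairs = n * n   # every token interacts with every other token
    ssm_updates = 2 * n       # one forward and one backward scan over the sequence
    print(f"{side}x{side}: N={n:4d}  attention ~{attention_pairs:7d}  bidirectional scan ~{ssm_updates:5d}")
```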

Core claim

The central claim is that a Spike-Guided Spatio-Temporal Token Pruner (SST-TP) can be fused with bidirectional spiking state-space recurrence inside SmolMamba blocks to produce a vision backbone that performs long-range modeling in linear time, progressively removes redundant tokens on the basis of spike activity, and thereby achieves at least 1.5 times lower estimated energy cost than both spiking Transformer baselines and an earlier Spiking Mamba variant on ImageNet-1K, CIFAR, CIFAR10-DVS, and DVS128 Gesture while preserving competitive or better accuracy.
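The abstract does not give the SmolMamba block's equations, so the following is only a minimal sketch of what "spike events fed directly into bidirectional state-space recurrence" could look like: binary tokens drive a diagonal linear scan run forward and backward, and the summed readout is re-thresholded into spikes. The diagonal parameterization, the hard threshold, and the toy dimensions are assumptions, not the authors' design.

```python
import numpy as np

def heaviside_spike(x, threshold=1.0):
    """Binary spike generation; actual spiking training would use a surrogate gradient."""
    return (x >= threshold).astype(np.float32)

def bidirectional_spiking_ssm(spikes, A, B, C):
    """
    spikes: (N, D) binary spike tokens; A, B, C: diagonal state-space parameters of shape (D,).
    Cost is O(N * D): one forward and one backward scan, with no N x N interaction matrix.
    """
    N, D = spikes.shape
    out = np.zeros((N, D), dtype=np.float32)
    for direction in (1, -1):
        h = np.zeros(D, dtype=np.float32)                       # hidden state for this scan direction
        order = range(N) if direction == 1 else range(N - 1, -1, -1)
        for t in order:
            h = A * h + B * spikes[t]                           # linear state update driven only by spike events
            out[t] += C * h                                     # accumulate readout from both directions
    return heaviside_spike(out)                                 # re-spike so downstream blocks stay event-driven

# toy usage: 16 tokens, 8 channels
rng = np.random.default_rng(0)
tokens = (rng.random((16, 8)) < 0.2).astype(np.float32)
y = bidirectional_spiking_ssm(tokens, A=np.full(8, 0.9), B=np.ones(8), C=np.ones(8))
```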

What carries the argument

The Spike-Guided Spatio-Temporal Token Pruner (SST-TP), which scores token importance from spike activation strength and first-spike latency and removes the lowest-scoring tokens layer by layer before feeding the survivors into spiking bidirectional state-space recurrence.
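The abstract names the two signals but not the scoring rule, schedule, or normalization, so the sketch below is only one plausible form: Z-score each token's total spike count and (negated) first-spike time, combine them, and keep the top fraction. The Z-score normalization is suggested by Figures 5-6; the weighting alpha and the keep ratio are invented for illustration.

```python
import numpy as np

def zscore(x, eps=1e-6):
    return (x - x.mean()) / (x.std() + eps)

def sst_tp_prune(spike_trains, keep_ratio=0.7, alpha=0.5):
    """
    spike_trains: (T, N, D) binary spikes over T timesteps for N tokens of D channels.
    Returns sorted indices of kept tokens. Higher spike strength and earlier first
    spikes score higher; the paper's actual formula is not given in the abstract.
    """
    T, N, D = spike_trains.shape
    strength = spike_trains.sum(axis=(0, 2))                               # total spikes per token
    fired = spike_trains.any(axis=2)                                       # (T, N): did token fire at step t
    first_spike = np.where(fired.any(axis=0), fired.argmax(axis=0), T)     # latency; T if the token never fired
    score = alpha * zscore(strength) - (1 - alpha) * zscore(first_spike.astype(np.float32))
    n_keep = max(1, int(round(keep_ratio * N)))
    return np.sort(np.argsort(score)[::-1][:n_keep])                       # highest-scoring tokens survive
```

Applied layer by layer with a shrinking keep ratio, this is the kind of progressive token removal described above; the surviving tokens would then feed the bidirectional recurrence sketched earlier.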

If this is right

  • The architecture scales to higher-resolution inputs or longer temporal sequences because token count is reduced while computation per token stays linear.
  • Spiking state-space models can now exploit the same token-sparsity benefits previously limited to attention-based spiking transformers.
  • Energy estimates on both static-image and event-based benchmarks improve by at least 1.5 times relative to prior spiking attention and Spiking Mamba baselines at matched accuracy.
  • The same spike-guided pruning rule can be applied inside other recurrent or state-space spiking blocks without changing the underlying spike-driven dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pruning rule generalizes across datasets, it could be used to adaptively allocate compute in real-time neuromorphic vision pipelines where input sparsity varies frame to frame.
  • The linear-time recurrence plus early token removal may allow spiking models to run on low-power edge devices that currently cannot sustain full-resolution transformer attention.
  • Combining the reported energy numbers with known neuromorphic hardware characteristics would let one estimate end-to-end latency and power for specific chips.

Load-bearing premise

Spike activation strength and first-spike latency are sufficient to identify which tokens can be safely discarded without removing information needed for correct final classification on both static images and event streams.

What would settle it

An experiment on ImageNet-1K or CIFAR10-DVS in which the pruned SmolMamba model loses more accuracy than its unpruned counterpart, or in which the measured energy reduction relative to the strongest spiking Transformer baseline falls below 1.5x.

Figures

Figures reproduced from arXiv: 2604.25570 by Dewei Bai, Hong Qu, Hongxiang Peng, Yi Zhang, Yunyun Zeng, Ziyu Zhang.

Figure 1. The estimated energy–accuracy landscape of ANN and …
Figure 2. Overview of Vision SmolMamba. (a) The overall architecture: spike-form visual patches generated with SPS are …
Figure 3. Illustration of the proposed Spike-Guided Spatio…
Figure 4. Visualization of token pruning results in ImageNet-1K.
Figure 5. Train loss and kept tokens of Vision SmolMamba-2-256 on CIFAR-10. (a) Without Z-score normalization, pruning stays …
Figure 6. Train loss and kept tokens of Vision SmolMamba-8-512 on ImageNet-1K. (a) Without Z-score normalization, pruning …
original abstract

Spiking Transformers have shown strong potential for long-range visual modeling through spike-driven self-attention. However, their quadratic token interactions remain fundamentally misaligned with the sparse and event-driven nature of spiking neural computation. To address this limitation, we propose Vision SmolMamba, an energy-efficient spiking state-space architecture that integrates spike-driven dynamics with linear-time selective recurrence. The key idea is a Spike-Guided Spatio-Temporal Token Pruner (SST-TP), which estimates token importance using both spike activation strength and first-spike latency. This mechanism progressively removes redundant tokens while preserving salient spatio-temporal information, enabling efficient scaling with token sparsity. Based on this mechanism, the proposed SmolMamba block incorporates spike events directly into bidirectional state-space recurrence, forming a spiking state-space vision backbone for efficient long-range modeling. Extensive experiments on both static and event-based benchmarks, including ImageNet-1K, CIFAR10/100, CIFAR10-DVS, and DVS128 Gesture, demonstrate that Vision SmolMamba consistently achieves superior accuracy-efficiency trade-offs. In particular, it reduces the estimated energy cost by at least 1.5x compared with prior spiking Transformer baselines and a Spiking Mamba variant while maintaining competitive or improved accuracy. These results demonstrate that combining spike-guided token sparsity with state-space modeling offers a scalable and energy-efficient paradigm for spiking vision systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Vision SmolMamba, a spiking state-space vision architecture that integrates spike-driven dynamics with linear-time selective recurrence via a novel Spike-Guided Spatio-Temporal Token Pruner (SST-TP). SST-TP estimates token importance from spike activation strength and first-spike latency to progressively prune redundant tokens while preserving salient spatio-temporal features. The SmolMamba block embeds spike events into bidirectional state-space recurrence. Experiments on ImageNet-1K, CIFAR10/100, CIFAR10-DVS, and DVS128 Gesture show superior accuracy-efficiency trade-offs, including at least 1.5x lower estimated energy cost versus spiking Transformer baselines and a Spiking Mamba variant, with competitive or improved accuracy.

Significance. If the central claims hold, the work offers a scalable route to energy-efficient spiking vision models by replacing quadratic token interactions with state-space recurrence augmented by spike-guided sparsity. This addresses a key misalignment between spiking computation and Transformer-style attention, with empirical support across both static and event-based vision benchmarks. The approach could inform low-power neuromorphic deployments, though its impact depends on the robustness of the pruning mechanism and energy estimates.

major comments (2)
  1. [Abstract and Experiments] The reported energy reductions are labeled 'estimated' with no description of the estimation method, hardware model, power model parameters, or inclusion of error bars/statistical tests. This is load-bearing for the headline 1.5x efficiency claim and the accuracy-efficiency trade-off.
  2. [SST-TP description] Token pruning relies on spike activation strength and first-spike latency as proxies for importance, yet no independent verification (e.g., information-preservation metrics, reconstruction error, or targeted ablations on critical features) is provided to confirm that task-relevant spatio-temporal content is retained. This assumption is load-bearing because accuracy maintenance on event-based data (CIFAR10-DVS, DVS128 Gesture) depends on it; mis-pruning could explain the reported trade-off without true efficiency gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity and validation that we will address through revisions. Below we respond point-by-point to the major comments.

point-by-point responses
  1. Referee: [Abstract and Experiments] The reported energy reductions are labeled 'estimated' with no description of the estimation method, hardware model, power model parameters, or inclusion of error bars/statistical tests. This is load-bearing for the headline 1.5x efficiency claim and the accuracy-efficiency trade-off.

    Authors: We agree that explicit documentation of the energy estimation procedure is necessary to substantiate the efficiency claims. In the revised manuscript we will add a dedicated subsection in the Experiments section that fully describes the energy model. This will include: (i) the operation-counting methodology (spike events and state updates in the SmolMamba blocks), (ii) the assumed hardware platform and per-operation energy costs drawn from established neuromorphic literature, (iii) the precise power-model parameters, and (iv) error bars computed as standard deviations over multiple independent training runs. Where direct comparisons are presented we will also include statistical significance tests (paired t-tests) to quantify the reliability of the reported 1.5x gains. revision: yes
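For orientation, the energy-model convention most spiking-vision papers follow (and which the promised subsection would presumably resemble) counts synaptic operations and multiplies by per-operation energies from 45 nm CMOS measurements, roughly 4.6 pJ per MAC and 0.9 pJ per accumulate. Whether Vision SmolMamba uses exactly these constants or this op-counting rule is not stated in the abstract, so the sketch below is an assumption.

```python
# Common SNN energy-estimation convention (not confirmed as this paper's exact model):
# spike-driven layers cost one accumulate (AC) per synaptic operation, dense ANN
# layers cost one multiply-accumulate (MAC). Per-op energies are the widely cited
# 45 nm figures.
E_MAC_PJ = 4.6   # picojoules per multiply-accumulate
E_AC_PJ = 0.9    # picojoules per accumulate

def spiking_layer_energy_pj(flops_equivalent: float, firing_rate: float, timesteps: int) -> float:
    """Estimated energy: synaptic operations = firing_rate * timesteps * FLOPs of the dense equivalent."""
    sops = firing_rate * timesteps * flops_equivalent
    return sops * E_AC_PJ

def dense_layer_energy_pj(flops: float) -> float:
    return flops * E_MAC_PJ

# e.g. a layer with 1e8 dense FLOPs, 15% firing rate, 4 timesteps
print(spiking_layer_energy_pj(1e8, 0.15, 4) / 1e6, "microjoules (estimated)")
```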

  2. Referee: [SST-TP description] Token pruning relies on spike activation strength and first-spike latency as proxies for importance, yet no independent verification (e.g., information-preservation metrics, reconstruction error, or targeted ablations on critical features) is provided to confirm that task-relevant spatio-temporal content is retained. This assumption is load-bearing because accuracy maintenance on event-based data (CIFAR10-DVS, DVS128 Gesture) depends on it; mis-pruning could explain the reported trade-off without true efficiency gains.

    Authors: We acknowledge that additional direct validation of the SST-TP pruning criterion would strengthen the paper. Although the maintained accuracy on event-based benchmarks provides supporting evidence that salient features are retained, we will incorporate new targeted experiments in the revision. These will comprise: (i) ablations contrasting SST-TP against random pruning and against single-proxy variants (activation strength only or latency only), (ii) a quantitative information-preservation metric that measures the fraction of high-activation spikes retained after pruning, and (iii) reconstruction-error analysis on a held-out subset of CIFAR10-DVS and DVS128 Gesture using a lightweight decoder trained to reconstruct the original spike sequences from the pruned token set. The results will be reported alongside the existing accuracy-efficiency curves. revision: yes
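A minimal sketch of the kind of check the rebuttal promises, reusing the sst_tp_prune sketch from earlier on this page: prune with SST-TP and with a random baseline at the same keep ratio, then measure how many spikes from the most active tokens survive. The retention metric and the toy data are illustrative assumptions, not the authors' protocol.

```python
import numpy as np

def spike_retention(spike_trains, kept_idx, quantile=0.9):
    """Fraction of spikes belonging to the most active tokens that survive pruning."""
    per_token = spike_trains.sum(axis=(0, 2))                   # spike count per token
    hot = per_token >= np.quantile(per_token, quantile)         # "high-activation" tokens
    kept_mask = np.zeros(per_token.shape[0], dtype=bool)
    kept_mask[kept_idx] = True
    return float(per_token[hot & kept_mask].sum() / max(per_token[hot].sum(), 1.0))

rng = np.random.default_rng(0)
toy_spikes = (rng.random((4, 196, 64)) < 0.1).astype(np.float32)   # toy (T, N, D) spike tensor
kept_sst = sst_tp_prune(toy_spikes, keep_ratio=0.7)                # pruner sketched earlier
kept_rand = np.sort(rng.choice(196, size=len(kept_sst), replace=False))
print("SST-TP retention:", spike_retention(toy_spikes, kept_sst))
print("random retention:", spike_retention(toy_spikes, kept_rand))
```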

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical benchmarks

full rationale

The paper presents an architectural proposal (SST-TP token pruning guided by spike strength and latency, integrated into a spiking state-space block) whose performance claims are supported by direct experiments on external benchmarks (ImageNet-1K, CIFAR10/100, CIFAR10-DVS, DVS128 Gesture). No equations, fitted parameters, or self-citations make the reported accuracy-efficiency trade-off true by construction. The 1.5x energy reduction is an observed experimental outcome rather than a definitional or fitted prediction. The evidential chain therefore rests on external benchmarks rather than on constructions internal to the paper.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 2 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. The architecture introduces two new components whose internal parameters and neuron models are not enumerated here.

free parameters (2)
  • pruning threshold schedule
    Controls how aggressively tokens are removed at each stage; must be chosen or tuned (one plausible form is sketched just after this ledger).
  • state-space recurrence parameters
    Selective state update coefficients in the SmolMamba block.
axioms (1)
  • domain assumption · Spiking neurons follow standard integrate-and-fire or similar dynamics
    Implicit in all spiking vision work; not re-derived.
invented entities (2)
  • Spike-Guided Spatio-Temporal Token Pruner (SST-TP) · no independent evidence
    purpose: Estimates token importance from spike strength and latency to enable progressive pruning
    Core novel mechanism introduced to reconcile quadratic attention with sparse spikes.
  • SmolMamba block · no independent evidence
    purpose: Bidirectional state-space recurrence that ingests spike events directly
    New architectural unit that replaces self-attention in the spiking backbone.
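Because only the abstract is available, the pruning-threshold schedule listed above is unknown; the sketch below shows one plausible form, a keep ratio that decays linearly across stages. The endpoints and the linear shape are assumptions.

```python
def keep_ratio_schedule(num_stages: int, start: float = 1.0, end: float = 0.5):
    """One plausible pruning-threshold schedule: the fraction of tokens kept decays
    linearly across stages. The paper's actual schedule is not described in the abstract."""
    if num_stages == 1:
        return [end]
    step = (start - end) / (num_stages - 1)
    return [start - i * step for i in range(num_stages)]

# e.g. 4 stages -> [1.0, 0.833, 0.667, 0.5]: prune nothing at first, keep half the tokens by the last stage
print(keep_ratio_schedule(4))
```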

pith-pipeline@v0.9.0 · 5565 in / 1409 out tokens · 40572 ms · 2026-05-07T16:37:01.747110+00:00 · methodology

discussion (0)

