ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing

Cheng Zou; Chen Nie; Honglan Jiang; Kang You; Lee Jun Yan; Yu Feng; Zekai Xu; Zhezhi He; Ziling Wei

arxiv: 2605.20802 · v1 · pith:5KHLANZUnew · submitted 2026-05-20 · 💻 cs.AR · cs.AI

Kang You, Chen Nie, Lee Jun Yan, Ziling Wei, Cheng Zou, Zekai Xu, Yu Feng, Honglan Jiang, Zhezhi He

Pith reviewed 2026-05-21 02:18 UTC · model grok-4.3

classification 💻 cs.AR cs.AI

keywords spiking neural networks · elastic inference · neuromorphic accelerator · SNN hardware · energy efficiency · pipeline architecture · event-driven computation · neuromorphic computing

The pith

ELSA realizes true elastic inference in spiking neural networks by forwarding each spine or token immediately in a fine-grained pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Spiking neural networks possess an elastic inference property that lets outputs emerge progressively and respond to salient inputs before full evaluation completes. Existing accelerators cannot use this property because layer-by-layer execution waits for every layer and time-step pipelines synchronize all spines or tokens within each layer before any result moves forward. ELSA overcomes the barrier with a near-SRAM dataflow design that pipelines at the individual spine or token level so each result is sent onward as soon as it is produced. Additional hardware features lower network-on-chip traffic through a bundled address-event protocol and reduce memory traffic by applying a mini-batch spiking Gustavson product that exploits sparsity. The resulting system delivers concrete gains in speed and energy while preserving accuracy, showing that properly supported SNNs can surpass both quantized artificial networks and prior SNN accelerators.

Core claim

The paper claims that a near-SRAM dataflow architecture with a fine-grained spine/token-wise pipeline realizes true elastic inference: each spine or token is forwarded immediately upon production, forming a continuous streaming pipeline that cuts latency to the first response. Bundled address-event representation and mini-batch spiking Gustavson-product optimizations further reduce communication and memory costs. For a 4-bit ResNet-50 at unchanged accuracy, the paper reports 3.4× speedup and 13.6× higher energy efficiency over the SOTA QANN accelerator ANT, and 2.9× speedup and 22.1× higher energy efficiency over the SOTA SNN accelerator PAICORE.

What carries the argument

Fine-grained spine/token-wise pipeline inside a near-SRAM dataflow architecture that enables immediate forwarding of partial results to capture elastic inference.
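The difference between a layer-wise barrier and immediate forwarding can be illustrated with a toy timing model. This is a minimal sketch, not the paper's simulator: it assumes one processing element per layer, a fixed per-spine cost, and that spine i of a layer depends only on spine i of the previous layer (real convolutions depend on a receptive field of several spines). The function name and all parameters are hypothetical.

```python
def finish_times(num_layers, num_spines, cycles, streaming):
    """Return finish[l][i]: the cycle at which layer l finishes spine i.

    Toy assumptions (not from the paper): one PE per layer, a fixed
    per-spine cost, and a one-to-one dependence between spine i in
    consecutive layers.
    """
    finish = [[0] * num_spines for _ in range(num_layers)]
    for layer in range(num_layers):
        for i in range(num_spines):
            if layer == 0:
                ready = 0  # inputs are available from the start
            elif streaming:
                # Spine-wise pipeline: wait only for the matching spine.
                ready = finish[layer - 1][i]
            else:
                # Layer-wise barrier: wait for the whole previous layer.
                ready = finish[layer - 1][num_spines - 1]
            # The layer's single PE processes its spines in order.
            pe_free = finish[layer][i - 1] if i > 0 else 0
            finish[layer][i] = max(ready, pe_free) + cycles
    return finish


barrier = finish_times(num_layers=4, num_spines=8, cycles=1, streaming=False)
stream = finish_times(num_layers=4, num_spines=8, cycles=1, streaming=True)
print("first output, barrier:  ", barrier[-1][0])
print("first output, streaming:", stream[-1][0])
```

With 4 layers of 8 spines at 1 cycle each, this toy model emits the first final-layer spine at cycle 25 under the barrier schedule and at cycle 4 under the streaming schedule. The numbers say nothing about ELSA's measured latency; they only show why the first response moves earlier when the barrier is removed.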

If this is right

SNNs produce usable outputs at the earliest possible moment rather than only after every layer finishes.
Neuromorphic accelerators can exceed both quantized ANN accelerators and earlier SNN accelerators in latency and energy efficiency.
Event-driven computation becomes practical without accuracy loss when mapping and scheduling match the fine-grained pipeline.
Bundled AER and sparse Gustavson-product techniques cut NoC traffic and memory accesses while keeping the streaming flow intact.
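The traffic saving from bundling can be sketched with simple bit counting. This is an illustration under stated assumptions, not the paper's flit format: it assumes conventional AER attaches a full header to every spike, while the bundled form sends one header per spine/token group followed by the spike addresses. The field widths and function names are hypothetical.

```python
def aer_bits(num_spikes, header_bits, addr_bits):
    """Conventional AER (assumed): every spike carries its own header."""
    return num_spikes * (header_bits + addr_bits)


def bundled_aer_bits(spikes_per_bundle, header_bits, addr_bits):
    """Bundled AER (assumed): one shared header per non-empty bundle."""
    return sum(
        header_bits + n * addr_bits
        for n in spikes_per_bundle
        if n > 0  # an empty spine/token sends nothing in this sketch
    )


# Hypothetical widths: 16-bit header, 8-bit spike address.
bundles = [3, 0, 5]  # spikes produced by three spines/tokens
print(aer_bits(sum(bundles), 16, 8))       # per-spike headers
print(bundled_aer_bits(bundles, 16, 8))    # shared headers
```

In this example the per-spike scheme costs 192 bits and the bundled scheme 96 bits. The ratio depends entirely on the assumed widths and on how many spikes share a bundle.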

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-time neuromorphic systems could react to changing inputs at the moment the first reliable spikes appear rather than after fixed latency.
The same immediate-forwarding principle might be applied to other sparse, event-driven models to shorten decision latency in edge devices.
Dynamic depth adjustment becomes feasible if the pipeline naturally stops once confidence reaches a threshold.
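The last point can be made concrete with a small early-exit sketch. It is an editorial illustration, not a mechanism described in the paper: it assumes the output layer emits per-class spike counts at each time-step and that "confidence" is measured as the margin between the two largest accumulated counts. The function name and the `margin` parameter are hypothetical.

```python
def elastic_decision(spike_counts_per_step, margin):
    """Return (predicted_class, steps_used) under an assumed margin rule.

    spike_counts_per_step: list of per-class output spike counts,
    one inner list per time-step.
    """
    totals = None
    for step, counts in enumerate(spike_counts_per_step, start=1):
        if totals is None:
            totals = list(counts)
        else:
            totals = [t + c for t, c in zip(totals, counts)]
        ranked = sorted(totals, reverse=True)
        # Stop as soon as the leading class is far enough ahead.
        if len(ranked) > 1 and ranked[0] - ranked[1] >= margin:
            return totals.index(ranked[0]), step
    # No early exit: fall back to the full evaluation.
    if totals is None:
        return None, 0
    return totals.index(max(totals)), len(spike_counts_per_step)


steps = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]]
print(elastic_decision(steps, margin=2))   # exits before the last step
print(elastic_decision(steps, margin=10))  # runs all time-steps
```

A salient input that drives one class quickly would exit after few steps, while an ambiguous one would use the full budget. Whether such a rule preserves accuracy is exactly what would need to be measured.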

Load-bearing premise

A fine-grained spine or token-wise pipeline can be built in hardware with negligible synchronization and communication overhead while still preserving the elastic property and accuracy.

What would settle it

Hardware measurements that compare actual time-to-first-output and total energy of an ELSA-style spine-wise pipeline against a conventional layer-wise or coarse time-step pipeline on the same SNN workload.

Figures

Figures reproduced from arXiv: 2605.20802 by Cheng Zou, Chen Nie, Honglan Jiang, Kang You, Lee Jun Yan, Yu Feng, Zekai Xu, Zhezhi He, Ziling Wei.

**Figure 1.** Illustration of elastic inference. Bars denote first-correct-response (FCR) latency, dashed lines mark stable-state outputs, and stars show QANN execution on an A100 GPU.

**Figure 2.** Overall architecture and execution flow of ELSA.

**Figure 3.** Neural dynamics of (left) IF and (right) ST-BIF neuron.

**Figure 6.** Left: Communication comparison of QANN and SNN

**Figure 5.** Comparison of pipeline schemes. Colors denote different time-steps, and P1∼N denotes individual spines/tokens. The finer-grained pipeline enables substantially earlier first responses, thus better exploiting elastic inference.

**Figure 7.** Energy breakdown when applying different execution patterns to ELSA. The workload is ResNet-18.

**Figure 9.** ST-BIF neuron circuit, which consists of an adder tree, a fire component, and an update component.

**Figure 11.** ELSA Router Design. ELSA router contains five data paths, two paths (① and ②) to process spikes from local PEs and three paths (③, ④, and ⑤) to receive the flits from neural cores. SSoftmax & SLayerNorm Unit performs the ssoftmax and slayernorm summarized in Tab. I. m, n are the hop counts in flits (

**Figure 12.** (a) Traditional AER and (b) Bundled AER (a.k.a. BAER). “S./T.” denotes Spine/Token; “Dest.” is destination. “Type” is the flit position within a spine/token.

**Figure 13.** Details of fine-grained spine/token-wise pipelines. (a) Spine-wise pipeline in convolution layers. The data dependence of the 1st spine (S1) in layer-3 is highlighted in dark orange. (b) Token-wise pipeline in a multi-layer perceptron.

**Figure 14.** Mapping Procedure in ELSA. ELSA maps SNN through three stages: partition, mapping, and routing.

**Figure 15.** Energy breakdown of ELSA on the benchmark W1-7 (Tab. II). Fire Comp. is short for fire component. The Pipeline Register Energy is consumed by FIFO Queue.

**Figure 16.** Energy and latency comparison of SNN accelerators. Statistics are normalized w.r.t. Eyeriss [21].

**Figure 18.** Mismatch rate (%) and latency (ms) with different

**Figure 19.** Latency v.s. Significance (area ratio of bounding

**Figure 21.** Total inference cycles together with the cycle reduc

**Figure 25.** NoC Traffic and Latency Across Various Flit Sizes.

**Figure 27.** Flit distribution across ELSA NoC links. Violin width

**Figure 28.** Scaling study of ELSA in ResNet18, ResNet34,

Original abstract

Spiking neural networks (SNNs) exploit event-driven and addition-only computation to substantially improve efficiency for intelligent computation. A key temporal property of SNNs, elastic inference, allows outputs to emerge progressively, enabling responses to salient inputs much earlier than full evaluation. However, existing SNN-specific accelerators cannot capitalize on this property. Layer-by-layer designs emit outputs only after all layers are complete, while time-step-by-time-step designs rely on coarse-grained, layer-wise pipelines that require synchronizing all spines/tokens within a layer. This barrier prevents results from being forwarded immediately, delaying the earliest possible response and forfeiting the benefits of elastic inference. To address these challenges, we propose ELSA, a near-SRAM dataflow architecture that realizes true elastic inference through a fine-grained spine/token-wise pipeline and hardware optimizations tailored to SNNs. ELSA forwards each spine/token immediately upon production, forming a continuous streaming pipeline that substantially reduces the latency to the first response. To enhance this lightweight execution, ELSA introduces a bundled address event representation protocol to lower communication traffic of network-on-chip (NoC), and leverages mini-batch spiking Gustavson-product to cut memory access and exploit inherent sparsity. Combined with mapping and scheduling optimizations, ELSA achieves efficient, event-driven computation without compromising accuracy. Experiments show that SNNs can outperform quantized artificial neural networks (QANNs) while maintaining on-par accuracy. For a 4-bit ResNet-50, ELSA achieves 3.4× speedup and 13.6× higher energy efficiency over the SOTA QANN accelerator (ANT), and 2.9× speedup and 22.1× energy efficiency gains over the SOTA SNN accelerator (PAICORE).
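The row-wise (Gustavson-style) product mentioned in the abstract can be sketched for binary spikes as follows. This is a minimal sketch under assumptions, not the paper's processing-element design: it assumes spikes are 0/1, that a "mini-batch" is a small group of spine/token rows processed together, and that one weight-row read can serve every row in the group that spiked on that input channel. The function name and the fetch counter are hypothetical.

```python
def spiking_gustavson(spike_rows, weights):
    """Accumulate weight rows selected by spikes.

    spike_rows: mini-batch of 0/1 rows, one per spine/token (length K each).
    weights:    K x N weight matrix as a list of rows.
    Returns (outputs, weight_row_fetches).
    """
    n_out = len(weights[0])
    out = [[0] * n_out for _ in spike_rows]
    fetches = 0
    for k, w_row in enumerate(weights):
        # Rows of the mini-batch that spiked on input channel k.
        active = [r for r, row in enumerate(spike_rows) if row[k]]
        if not active:
            continue  # sparsity: this weight row is never read
        fetches += 1  # one read is shared by all active rows (assumption)
        for r in active:
            for j, w in enumerate(w_row):
                out[r][j] += w  # spikes are 0/1, so only additions occur
    return out, fetches


spikes = [[1, 0, 1],
          [0, 0, 1]]
weights = [[1, 2],
           [3, 4],
           [5, 6]]
print(spiking_gustavson(spikes, weights))
```

Here the two rows together trigger three weight-row uses but only two weight-row reads, because the last weight row is shared. Processing each spike row on its own would read it twice.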

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ELSA's fine-grained spine-wise pipeline targets a real gap in SNN accelerators for elastic inference, but the headline speedups need cycle-level evidence to show they survive NoC and sync costs.

The main takeaway is that this paper builds a near-SRAM dataflow with immediate per-spine forwarding to turn the temporal property of SNNs into lower latency to first output. That is the concrete step beyond layer-by-layer or coarse time-step pipelines that the abstract describes. The bundled AER protocol and mini-batch spiking Gustavson product are presented as the practical fixes for traffic and memory under that schedule, and the mapping optimizations are meant to keep accuracy intact while cutting access energy. If the implementation works as drawn, the 2.9× and 3.4× speedups over PAICORE and ANT, respectively, would be useful numbers for neuromorphic hardware work.

The paper earns credit for naming the exact barrier in prior designs and then tailoring the dataflow and protocol to remove it rather than adding another coarse pipeline stage. The claims rest on reported experimental outcomes rather than circular definitions, which keeps the burden low.

The soft spot is exactly the one in the stress-test note. A fine-grained token-wise pipeline only delivers the elastic advantage if synchronization stalls, reordering buffers, and NoC hop costs stay negligible when spines or tokens increase. The abstract gives no cycle-accurate breakdown or error bars, so it is not yet clear whether those costs eat the reported latency reduction. Methodology details on simulation, dataset handling, and verification are also missing, which leaves the central performance numbers thin.

This work is aimed at hardware designers who already care about SNN accelerators and event-driven pipelines. A reader who wants to see how elastic inference can be wired into real silicon would find the architecture description and the specific optimizations worth reading. The paper is coherent on its own terms and shows honest engagement with the cited prior accelerators, so it deserves a serious referee who can ask for the missing timing and energy breakdowns. I would send it to review rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The paper proposes ELSA, an ELastic SNN Inference Architecture that uses a near-SRAM dataflow with a fine-grained spine/token-wise pipeline to enable immediate result forwarding for elastic inference in spiking neural networks. It introduces a bundled AER protocol to reduce NoC traffic and a mini-batch spiking Gustavson-product to optimize memory access and exploit sparsity. The central experimental claim is that for a 4-bit ResNet-50, ELSA provides 3.4× speedup and 13.6× energy efficiency improvement over the QANN accelerator ANT, and 2.9× speedup and 22.1× energy efficiency over the SNN accelerator PAICORE.

Significance. If the performance numbers are validated with detailed hardware modeling, this work could be significant for neuromorphic computing by demonstrating how to exploit elastic inference in hardware, potentially allowing SNNs to outperform QANNs in efficiency while maintaining accuracy. The approach addresses a key limitation in existing accelerators.

major comments (2)

[Abstract and Experimental Results] The abstract reports specific speedup and energy efficiency numbers (3.4× and 13.6× over ANT; 2.9× and 22.1× over PAICORE) for 4-bit ResNet-50, but the manuscript provides no details on simulation methodology, error bars, dataset splits, or verification steps. This weakens the support for the central performance claims and the assertion that the fine-grained pipeline delivers these gains without hidden synchronization costs.
[Architecture Design] The fine-grained spine/token-wise pipeline is presented as enabling immediate forwarding with negligible overhead, yet there is no cycle-accurate breakdown of inter-spine synchronization stalls, token reordering buffers, or NoC hop latency under this schedule. If these costs scale, the latency to first response and thus the elastic-inference advantage would be reduced, directly impacting the headline comparisons.
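The second major comment can be phrased as a break-even question: how much per-layer forwarding overhead can a streaming pipeline absorb before its first response is no earlier than a barrier pipeline's? The sketch below is a toy model, not data from the paper. It assumes one processing element per layer, a fixed per-spine cost, a fixed overhead added at every layer hop in the streaming case, and no overhead in the barrier case. All names are hypothetical.

```python
def first_response(num_layers, num_spines, cycles, overhead, streaming):
    """Cycle of the first final-layer output in a toy model."""
    if streaming:
        # The first spine pays compute plus forwarding overhead per layer.
        return num_layers * (cycles + overhead)
    # Barrier: earlier layers must each finish all of their spines.
    return (num_layers - 1) * num_spines * cycles + cycles


def break_even_overhead(num_layers, num_spines, cycles):
    """Per-layer overhead at which both schedules respond at the same time."""
    return (num_layers - 1) * (num_spines - 1) * cycles / num_layers


limit = break_even_overhead(num_layers=4, num_spines=8, cycles=1)
print("break-even overhead per layer:", limit)
print(first_response(4, 8, 1, limit, streaming=True))
print(first_response(4, 8, 1, 0, streaming=False))
```

In this toy setting the streaming schedule keeps its advantage while the per-layer overhead stays below 5.25 cycles. A cycle-accurate breakdown of the kind the referee requests would replace these assumed constants with measured stall and hop costs.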

minor comments (1)

[Abstract] Consider adding a short statement on the accuracy maintenance or datasets used to support the 'without compromising accuracy' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript to provide the requested details and analysis.

Referee: [Abstract and Experimental Results] The abstract reports specific speedup and energy efficiency numbers (3.4× and 13.6× over ANT; 2.9× and 22.1× over PAICORE) for 4-bit ResNet-50, but the manuscript provides no details on simulation methodology, error bars, dataset splits, or verification steps. This weakens the support for the central performance claims and the assertion that the fine-grained pipeline delivers these gains without hidden synchronization costs.

Authors: We agree that the manuscript would benefit from expanded details on the experimental methodology to better support the reported performance numbers. In the revised version, we will add a dedicated subsection describing the cycle-accurate simulation framework (derived from our RTL implementation), the ImageNet dataset splits and preprocessing used for ResNet-50, verification steps including cross-validation against software models, and error bars from repeated runs. We will also include additional analysis quantifying synchronization overheads in the fine-grained pipeline to confirm that they do not materially affect the elastic-inference latency gains. revision: yes
Referee: [Architecture Design] The fine-grained spine/token-wise pipeline is presented as enabling immediate forwarding with negligible overhead, yet there is no cycle-accurate breakdown of inter-spine synchronization stalls, token reordering buffers, or NoC hop latency under this schedule. If these costs scale, the latency to first response and thus the elastic-inference advantage would be reduced, directly impacting the headline comparisons.

Authors: We acknowledge the value of a more detailed cycle-accurate breakdown to substantiate the negligible-overhead claim. We will revise the architecture section to incorporate simulation results that break down inter-spine synchronization stalls, token reordering buffer occupancy and latency, and per-hop NoC costs under the spine/token-wise schedule. Our existing modeling indicates these components remain small relative to the overall pipeline benefits thanks to the bundled AER protocol and immediate forwarding, but the added data will allow readers to assess scalability directly. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on proposed architecture and external benchmarks

The manuscript presents an architectural proposal for a near-SRAM dataflow SNN accelerator (ELSA) that enables fine-grained spine/token-wise pipelining to realize elastic inference. All headline performance numbers (3.4× speedup, 13.6× energy efficiency vs. ANT; 2.9× and 22.1× vs. PAICORE) are stated as outcomes of hardware mapping, scheduling, and experimental evaluation rather than any closed-form derivation or fitted prediction. No equations, uniqueness theorems, or self-citations appear in the provided text that would reduce a claimed result to its own inputs by construction. The work is therefore self-contained against external benchmarks and implementation measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the domain assumption that SNNs possess an exploitable elastic inference property and that fine-grained hardware pipelining can be implemented without accuracy or overhead penalties; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: SNNs possess an elastic inference property that allows outputs to emerge progressively before full evaluation
This property is invoked as the key motivation and the reason prior accelerators forfeit benefits.

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 2 internal anchors

[1]

Optimal ann- snn conversion for high-accuracy and ultra-low-latency spiking neural networks,

T. Bu, W. Fang, J. Ding, P. Dai, Z. Yu, and T. Huang, “Optimal ann- snn conversion for high-accuracy and ultra-low-latency spiking neural networks,”arXiv preprint arXiv:2303.04347, 2023

work page arXiv 2023
[2]

Fast-snn: Fast spiking neural network by converting quantized ann,

Y . Hu, Q. Zheng, X. Jiang, and G. Pan, “Fast-snn: Fast spiking neural network by converting quantized ann,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

work page 2023
[3]

Spikformer: When spiking neural network meets transformer

Z. Zhou, Y . Zhu, C. He, Y . Wang, S. Yan, Y . Tian, and L. Yuan, “Spikformer: When spiking neural network meets transformer,”arXiv preprint arXiv:2209.15425, 2022

work page arXiv 2022
[4]

Spikezip- tf: Conversion is all you need for transformer-based snn,

K. You, Z. Xu, C. Nie, Z. Deng, X. Wang, Q. Guo, and Z. He, “Spikezip- tf: Conversion is all you need for transformer-based snn,” inForty-first International Conference on Machine Learning (ICML), 2024

work page 2024
[5]

Towards spike-based machine intelligence with neuromorphic computing,

K. Roy, A. Jaiswal, and P. Panda, “Towards spike-based machine intelligence with neuromorphic computing,”Nature, vol. 575, no. 7784, pp. 607–617, 2019

work page 2019
[6]

An energy-efficient unstructured sparsity-aware deep snn accelerator with 3-d computation array,

C. Fang, Z. Shen, Z. Wang, C. Wang, S. Zhao, F. Tian, J. Yang, and M. Sawan, “An energy-efficient unstructured sparsity-aware deep snn accelerator with 3-d computation array,”IEEE Journal of Solid-State Circuits, 2024

work page 2024
[7]

C- dnn: An energy-efficient complementary deep-neural-network processor with heterogeneous cnn/snn core architecture,

S. Kim, S. Kim, S. Hong, S. Kim, D. Han, J. Choi, and H.-J. Yoo, “C- dnn: An energy-efficient complementary deep-neural-network processor with heterogeneous cnn/snn core architecture,”IEEE Journal of Solid- State Circuits, vol. 59, no. 1, pp. 157–172, 2024

work page 2024
[8]

Sato: spiking neural network acceleration via temporal- oriented dataflow and architecture,

F. Liu, W. Zhao, Z. Wang, Y . Chen, T. Yang, Z. He, X. Yang, and L. Jiang, “Sato: spiking neural network acceleration via temporal- oriented dataflow and architecture,” inProceedings of the 59th ACM/IEEE Design Automation Conference, 2022, pp. 1105–1110

work page 2022
[9]

Loas: Fully temporal- parallel dataflow for dual-sparse spiking neural networks,

R. Yin, Y . Kim, D. Wu, and P. Panda, “Loas: Fully temporal- parallel dataflow for dual-sparse spiking neural networks,” in2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024, pp. 1107–1121

work page 2024
[10]

Parallel time batching: Systolic- array acceleration of sparse spiking neural computation,

J.-J. Lee, W. Zhang, and P. Li, “Parallel time batching: Systolic- array acceleration of sparse spiking neural computation,” in2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2022, pp. 317–330

work page 2022
[11]

Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip,

F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y . Nakamura, P. Datta, and G.-J. Nam, “Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip,”IEEE transactions on computer-aided design of integrated circuits and systems, vol. 34, no. 10, pp. 1537–1557, 2015

work page 2015
[12]

Loihi: A neuromorphic manycore processor with on-chip learning,

M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y . Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, and S. Jain, “Loihi: A neuromorphic manycore processor with on-chip learning,”Ieee Micro, vol. 38, no. 1, pp. 82–99, 2018

work page 2018
[13]

Paicore: A 1.9-million-neuron 5.181-tsops/w digital neuromorphic processor with unified snn-ann and on-chip learning paradigm,

Y . Zhong, Y . Kuang, K. Liu, Z. Wang, S. Feng, G. Chen, Y . Yang, X. Cui, Q. Wang, J. Cao, S. Jia, Y . Liang, G. Sun, X. Cui, R. Huang, and Y . Wang, “Paicore: A 1.9-million-neuron 5.181-tsops/w digital neuromorphic processor with unified snn-ann and on-chip learning paradigm,”IEEE Journal of Solid-State Circuits, vol. 60, no. 2, pp. 651–671, 2025

work page 2025
[14]

Darwin3: A large-scale neuromorphic chip with a novel isa and on-chip learning,

D. Ma, X. Jin, S. Sun, Y . Li, X. Wu, Y . Hu, F. Yang, H. Tang, X. Zhu, P. Lin, and G. Pan, “Darwin3: A large-scale neuromorphic chip with a novel isa and on-chip learning,” 2023. [Online]. Available: https://arxiv.org/abs/2312.17582

work page arXiv 2023
[15]

Speed of processing in the human visual system,

S. Thorpe, D. Fize, and C. Marlot, “Speed of processing in the human visual system,”nature, vol. 381, no. 6582, pp. 520–522, 1996

work page 1996
[16]

3d object detection for autonomous driving: A survey,

J. Mao, S. Shi, X. Wang, and H. Li, “3d object detection for autonomous driving: A survey,”Pattern Recognition, vol. 130, p. 108796, 2022

work page 2022
[17]

Morphic: A 65-nm 738k- synapse/mm2 quad-core binary-weight digital neuromorphic processor with stochastic spike-driven online learning,

C. Frenkel, J.-D. Legat, and D. Bol, “Morphic: A 65-nm 738k- synapse/mm2 quad-core binary-weight digital neuromorphic processor with stochastic spike-driven online learning,”IEEE Transactions on Biomedical Circuits and Systems, vol. 13, no. 5, pp. 999–1010, 2019

work page 2019
[18]

Ant: Exploiting adaptive numerical data type for low-bit deep neural network quantization,

C. Guo, C. Zhang, J. Leng, Z. Liu, F. Yang, Y . Liu, M. Guo, and Y . Zhu, “Ant: Exploiting adaptive numerical data type for low-bit deep neural network quantization,” in2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022, pp. 1414–1433

work page 2022
[19]

Stellar: Energy- efficient and low-latency snn algorithm and hardware co-design with spatiotemporal computation,

R. Mao, L. Tang, X. Yuan, Y . Liu, and J. Zhou, “Stellar: Energy- efficient and low-latency snn algorithm and hardware co-design with spatiotemporal computation,” in2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2024, pp. 172–185

work page 2024
[20]

Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network,

B. Han, G. Srinivasan, and K. Roy, “Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 13 558–13 567

work page 2020
[21]

Eyeriss: An energy- efficient reconfigurable accelerator for deep convolutional neural net- works,

Y .-H. Chen, T. Krishna, J. S. Emer, and V . Sze, “Eyeriss: An energy- efficient reconfigurable accelerator for deep convolutional neural net- works,”IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127– 138, 2017

work page 2017
[22]

In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Virtual, USA) (ASPLOS ’21)

G. Zhang, N. Attaluri, J. S. Emer, and D. Sanchez, “Gamma: leveraging gustavson’s algorithm to accelerate sparse matrix multiplication,” inProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’21. New York, NY , USA: Association for Computing Machinery, 2021, p. 687–701....

work page doi:10.1145/3445814.3446702 2021
[23]

Simulation and analysis of network on chip architectures: ring, spidergon and 2d mesh,

L. Bononi and N. Concer, “Simulation and analysis of network on chip architectures: ring, spidergon and 2d mesh,” inProceedings of the Design Automation & Test in Europe Conference, vol. 2. IEEE, 2006, pp. 6–pp

work page 2006
[24]

Swifttron: An efficient hardware accelerator for quan- tized transformers,

A. Marchisio, D. Dura, M. Capra, M. Martina, G. Masera, and M. Shafique, “Swifttron: An efficient hardware accelerator for quan- tized transformers,” in2023 International Joint Conference on Neural Networks (IJCNN). IEEE, 2023, pp. 1–9

work page 2023
[25]

Modified hilbert curve for rectangles and cuboids and its application in entropy coding for image and video compression,

Y . Rong, X. Zhang, and J. Lin, “Modified hilbert curve for rectangles and cuboids and its application in entropy coding for image and video compression,”Entropy, vol. 23, no. 7, 2021. [Online]. Available: https://www.mdpi.com/1099-4300/23/7/836

work page 2021
[26]

Mapping very large scale spiking neuron network to neuromorphic hardware,

O. Jin, Q. Xing, Y . Li, S. Deng, S. He, and G. Pan, “Mapping very large scale spiking neuron network to neuromorphic hardware,” inProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, ser. ASPLOS 2023. New York, NY , USA: Association for Computing Machinery, 2023, p. 419–4...

doi:10.1145/3582016.3582038

[27] S. Li, Z. Yang, D. Reddy, A. Srivastava, and B. Jacob, “Dramsim3: A cycle-accurate, thermal-capable dram simulator,” IEEE Computer Architecture Letters, vol. 19, no. 2, pp. 106–109, 2020.

[28] M. O’Connor, “Highlights of the high-bandwidth memory (hbm) standard,” in Memory forum workshop, vol. 3, 2014.

[29] C. Sun, C.-H. O. Chen, G. Kurian, L. Wei, J. Miller, A. Agarwal, L.-S. Peh, and V. Stojanovic, “Dsent - a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling,” in 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, 2012, pp. 201–210.

[30] S. Narayanan, K. Taht, R. Balasubramonian, E. Giacomin, and P.-E. Gaillardon, “Spinalflow: An architecture and dataflow tailored for spiking neural networks,” in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2020, pp. 349–362.

[31] C. Wei, C. Guo, F. Cheng, S. Li, H. F. Yang, H. H. Li, and Y. Chen, “Prosperity: Accelerating spiking neural networks via product sparsity,” [Online]. Available: https://arxiv.org/abs/2503.03379

[33] C. Fang, Z. Shen, S. Zhao, C. Wang, F. Tian, J. Yang, and M. Sawan, “A 0.078 pj/sop unstructured sparsity-aware spiking attention/convolution processor with 3d compute array,” in 2024 IEEE Custom Integrated Circuits Conference (CICC), 2024, pp. 1–2.

[34] C. Wei, B. Duan, C. Guo, J. Zhang, Q. Song, H. H. Li, and Y. Chen, “Phi: Leveraging pattern-based hierarchical sparsity for high-efficiency spiking neural networks,” 2025. [Online]. Available: https://arxiv.org/abs/2505.10909

[35] Y.-H. Chen, T.-J. Yang, J. Emer, and V. Sze, “Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292–308, 2019.

[36] W. Sun, X. Feng, C. Tang, S. Fan, Y. Yang, J. Yue, H. Yang, and Y. Liu, “A 28nm 2d/3d unified sparse convolution accelerator with block-wise neighbor searcher for large-scaled voxel-based point cloud network,” in 2023 IEEE International Solid-State Circuits Conference (ISSCC), 2023, pp. 328–330.

[37] C.-C. Lin, W. Lu, P.-T. Huang, and H.-M. Chen, “A 28nm 343.5fps/w vision transformer accelerator with integer-only quantized attention block,” in 2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS), 2024, pp. 80–84.

[38] L. Lu, Y. Jin, H. Bi, Z. Luo, P. Li, T. Wang, and Y. Liang, “Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture,” in MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 977–991. [Online]. Available: https:/... doi:10.1145/3466752.3480125

[39] J. Dass, S. Wu, H. Shi, C. Li, Z. Ye, Z. Wang, and Y. Lin, “Vitality: Unifying low-rank and sparse approximation for vision transformer acceleration with a linear taylor attention,” in 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 415–428.

[40] J.-W. Su, Y.-C. Chou, R. Liu, T.-W. Liu, P.-J. Lu, P.-C. Wu, Y.-L. Chung, L.-Y. Hung, J.-S. Ren, T. Pan, S.-H. Li, S.-C. Chang, S.-S. Sheu, W.-C. Lo, C.-I. Wu, X. Si, C.-C. Lo, R.-S. Liu, C.-C. Hsieh, K.-T. Tang, and M.-F. Chang, “16.3 a 28nm 384kb 6t-sram computation-in-memory macro with 8b precision for ai edge chips,” in 2021 IEEE International Soli...

[41] A. Guo, X. Chen, F. Dong, J. Chen, Z. Yuan, X. Hu, Y. Zhang, J. Zhang, Y. Tang, Z. Zhang, G. Chen, D. Yang, Z. Zhang, L. Ren, T. Xiong, B. Wang, B. Liu, W. Shan, X. Liu, H. Cai, G. Sun, J. Yang, and X. Si, “34.3 a 22nm 64kb lightning-like hybrid computing-in-memory macro with a compressed adder tree and analog-storage quantizers for transformer and cnns... 2024.

[42] J.-J. Lee and P. Li, “Reconfigurable dataflow optimization for spatiotemporal spiking neural computation on systolic array accelerators,” in 2020 IEEE 38th International Conference on Computer Design (ICCD). IEEE, 2020, pp. 57–64.

[43] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014. [Online]. Available: https://api.semanticscholar.org/CorpusID:14124313

[44] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.

[45] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenRevie...

[46] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech. Rep., 2009.

[47] H. Li, H. Liu, X. Ji, G. Li, and L. Shi, “Cifar10-dvs: An event-stream dataset for object classification,” Frontiers in Neuroscience, vol. 11, 2017.

[48] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255.

[49] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European conference on computer vision. Springer, 2014, pp. 740–755.

[50] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International journal of computer vision, vol. 88, no. 2, pp. 303–338, 2010.

[51] Nvidia. Nvidia jetson agx orin 64 gb. 2021, Nov 09. [Online]. Available: https://www.techpowerup.com/gpu-specs/jetson-agx-orin-64-gb.c4085

[52] NVIDIA. Nvidia a100. 2020, May 04. [Online]. Available: https://www.nvidia.cn/content/dam/en-zz/Solutions/Data-Center/a100/pdf/ampere-a100-datasheet-a4-nvidia-1293124-r10-web-zhCN.pdf

[53] N. Jouppi, G. Kurian, S. Li, P. Ma, R. Nagarajan, L. Nai, N. Patil, S. Subramanian, A. Swing, B. Towles, C. Young, X. Zhou, Z. Zhou, and D. A. Patterson, “Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,” in Proceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA... doi:10.1145/3579371.3589350

[54] Groq. Groqcard accelerator. 2022. [Online]. Available: https://groq.com/wp-content/uploads/2024/02

[55] Y. Li, T. Geller, Y. Kim, and P. Panda, “Seenn: Towards temporal spiking early-exit neural networks,” 2023. [Online]. Available: https://arxiv.org/abs/2304.01230

[56] D. Wu, G. Jin, H. Yu, X. Yi, and X. Huang, “Optimizing event-driven spiking neural network with regularization and cutoff,” Frontiers in Neuroscience, vol. 19, Feb. 2025. [Online]. Available: http://dx.doi.org/10.3389/fnins.2025.1522788

[57] J. Chen, S. Park, and O. Simeone, “Knowing when to stop: Delay-adaptive spiking neural network classifiers with reliability guarantees,” [Online]. Available: https://arxiv.org/abs/2305.11322

[59] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.

[60] R. E. Matick and S. E. Schuster, “Logic-based edram: Origins and rationale for use,” IBM Journal of Research and Development, vol. 49, no. 1, pp. 145–165, 2005.

[61] S. Mittal, J. S. Vetter, and D. Li, “A survey of architectural approaches for managing embedded dram and non-volatile on-chip caches,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 6, pp. 1524–1537, 2014.

[62] K. Huang, Y. Ting, C. Chang, K. Tu, K. Tzeng, H. Chu, C. Pai, A. Katoch, W. Kuo, K. Chen et al., “A high-performance, high-density 28nm edram technology with high-k/metal-gate,” in 2011 International Electron Devices Meeting. IEEE, 2011, pp. 24–7.

[63] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, J. K. Kim, V. Chandra, and H. Esmaeilzadeh, “Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018, pp. 764–775.

[64] S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, “The spinnaker project,” Proceedings of the IEEE, vol. 102, no. 5, pp. 652–665, 2014.

[65] N. Srivastava, H. Jin, J. Liu, D. Albonesi, and Z. Zhang, “Matraptor: A sparse-sparse matrix multiplication accelerator based on row-wise product,” in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 766–780.

[66] S. Lie, “Cerebras architecture deep dive: First look inside the hardware/software co-design for deep learning,” IEEE Micro, vol. 43, no. 3, pp. 18–30, 2023.

[67] C. Zou, Z. Wei, J. Y. Lee, C. Nie, K. You, and Z. He, “Polymorpic: Embedding polymorphic processing-in-cache in risc-v based processor for full-stack efficient ai inference,” in 2025 58th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025.

[68] C. Nie, C. Tang, J. Lin, H. Hu, C. Lv, T. Cao, W. Zhang, L. Jiang, X. Liang, W. Qian, Y. Sun, and Z. He, “Vspim: Sram processing-in-memory dnn acceleration via vector-scalar operations,” IEEE Transactions on Computers, vol. 73, no. 10, pp. 2378–2390, 2024.

[69] R. Fan, Y. Cui, Q. Chen, M. Wang, Y. Zhang, W. Zheng, and Z. Li, “Maicc: A lightweight many-core architecture with in-cache computing for multi-dnn parallel inference,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’23. New York, NY, USA: Association for Computing Machinery, 2023, p. 411–423. [Online... doi:10.1145/3613424

[70] Q. Liu, B. Gao, P. Yao, D. Wu, J. Chen, Y. Pang, W. Zhang, Y. Liao, C.-X. Xue, W.-H. Chen et al., “33.2 a fully integrated analog reram based 78.4 tops/w compute-in-memory chip with fully parallel mac computing,” in 2020 IEEE International Solid-State Circuits Conference-(ISSCC). IEEE, 2020, pp. 500–502.

[71] M. E. Fouda, S. Lee, J. Lee, G. H. Kim, F. Kurdahi, and A. M. Eltawi, “Ir-qnn framework: An ir drop-aware offline training of quantized crossbar arrays,” IEEE Access, vol. 8, pp. 228 392–228 408, 2020.

[72] S. Jain, A. Sengupta, K. Roy, and A. Raghunathan, “Rxnn: A framework for evaluating deep neural networks on resistive crossbars,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 2, pp. 326–338, 2020.

[73] H. A. Gonzalez, J. Huang, F. Kelber, K. K. Nazeer, T. Langer, C. Liu, M. Lohrmann, A. Rostami, M. Schöne, B. Vogginger et al., “Spinnaker2: A large-scale neuromorphic system for event-based and asynchronous machine learning,” arXiv preprint arXiv:2401.04491, 2024.

[74] Intel Newsroom, “Intel builds world’s largest neuromorphic system to enable more sustainable ai,” https://newsroom.intel.com/artificial-intelligence, 2024, accessed: 2026-04-26.

[75] S. Hwang, D. Lee, J. Koo, and J. Kung, “Gustavsnn: Unleashing the power of gustavson’s algorithm on snn acceleration with column-parallel tick-batch dataflow,” in 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2026, pp. 1–14.

[76] A. Dosovitskiy, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.


[27] [27]

Dramsim3: A cycle-accurate, thermal-capable dram simulator,

S. Li, Z. Yang, D. Reddy, A. Srivastava, and B. Jacob, “Dramsim3: A cycle-accurate, thermal-capable dram simulator,”IEEE Computer Architecture Letters, vol. 19, no. 2, pp. 106–109, 2020

work page 2020

[28] [28]

Highlights of the high-bandwidth memory (hbm) stan- dard,

M. O’Connor, “Highlights of the high-bandwidth memory (hbm) stan- dard,” inMemory forum workshop, vol. 3, 2014

work page 2014

[29] [29]

Dsent - a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling,

C. Sun, C.-H. O. Chen, G. Kurian, L. Wei, J. Miller, A. Agarwal, L.-S. Peh, and V . Stojanovic, “Dsent - a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling,” in2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, 2012, pp. 201–210

work page 2012

[30] [30]

Spinalflow: An architecture and dataflow tailored for spiking neural networks,

S. Narayanan, K. Taht, R. Balasubramonian, E. Giacomin, and P.- E. Gaillardon, “Spinalflow: An architecture and dataflow tailored for spiking neural networks,” in2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2020, pp. 349– 362

work page 2020

[31] [31]

Prosperity: Accelerating spiking neural networks via product sparsity,

C. Wei, C. Guo, F. Cheng, S. Li, H. F. Yang, H. H. Li, and Y . Chen, “Prosperity: Accelerating spiking neural networks via product sparsity,”

work page

[32] [32]

Available: https://arxiv.org/abs/2503.03379

[Online]. Available: https://arxiv.org/abs/2503.03379

work page arXiv

[33] [33]

A 0.078 pj/sop unstructured sparsity-aware spiking attention/convolution processor with 3d compute array,

C. Fang, Z. Shen, S. Zhao, C. Wang, F. Tian, J. Yang, and M. Sawan, “A 0.078 pj/sop unstructured sparsity-aware spiking attention/convolution processor with 3d compute array,” in2024 IEEE Custom Integrated Circuits Conference (CICC), 2024, pp. 1–2

work page 2024

[34] [34]

Phi: Leveraging pattern-based hierarchical sparsity for high-efficiency spiking neural networks,

C. Wei, B. Duan, C. Guo, J. Zhang, Q. Song, H. H. Li, and Y . Chen, “Phi: Leveraging pattern-based hierarchical sparsity for high-efficiency spiking neural networks,” 2025. [Online]. Available: https://arxiv.org/abs/2505.10909

work page arXiv 2025

[35] [35]

Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,

Y .-H. Chen, T.-J. Yang, J. Emer, and V . Sze, “Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,”IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292–308, 2019

work page 2019

[36] [36]

A 28nm 2d/3d unified sparse convolution accelerator with block-wise neighbor searcher for large-scaled voxel-based point cloud network,

W. Sun, X. Feng, C. Tang, S. Fan, Y . Yang, J. Yue, H. Yang, and Y . Liu, “A 28nm 2d/3d unified sparse convolution accelerator with block-wise neighbor searcher for large-scaled voxel-based point cloud network,” in 2023 IEEE International Solid-State Circuits Conference (ISSCC), 2023, pp. 328–330

work page 2023

[37] [37]

A 28nm 343.5fps/w vision transformer accelerator with integer-only quantized attention block,

C.-C. Lin, W. Lu, P.-T. Huang, and H.-M. Chen, “A 28nm 343.5fps/w vision transformer accelerator with integer-only quantized attention block,” in2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS), 2024, pp. 80–84. 15

work page 2024

[38] [38]

Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture,

L. Lu, Y . Jin, H. Bi, Z. Luo, P. Li, T. Wang, and Y . Liang, “Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture,” inMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’21. New York, NY , USA: Association for Computing Machinery, 2021, p. 977–991. [Online]. Available: https:/...

work page doi:10.1145/3466752.3480125 2021

[39] [39]

Vitality: Unifying low-rank and sparse approximation for vision transformer acceleration with a linear taylor attention,

J. Dass, S. Wu, H. Shi, C. Li, Z. Ye, Z. Wang, and Y . Lin, “Vitality: Unifying low-rank and sparse approximation for vision transformer acceleration with a linear taylor attention,” in2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 415–428

work page 2023

[40] [40]

16.3 a 28nm 384kb 6t-sram computation-in-memory macro with 8b precision for ai edge chips,

J.-W. Su, Y .-C. Chou, R. Liu, T.-W. Liu, P.-J. Lu, P.-C. Wu, Y .-L. Chung, L.-Y . Hung, J.-S. Ren, T. Pan, S.-H. Li, S.-C. Chang, S.-S. Sheu, W.- C. Lo, C.-I. Wu, X. Si, C.-C. Lo, R.-S. Liu, C.-C. Hsieh, K.-T. Tang, and M.-F. Chang, “16.3 a 28nm 384kb 6t-sram computation-in-memory macro with 8b precision for ai edge chips,” in2021 IEEE International Soli...

work page 2021

[41] [41]

34.3 a 22nm 64kb lightning-like hybrid computing-in-memory macro with a compressed adder tree and analog-storage quantizers for transformer and cnns,

A. Guo, X. Chen, F. Dong, J. Chen, Z. Yuan, X. Hu, Y . Zhang, J. Zhang, Y . Tang, Z. Zhang, G. Chen, D. Yang, Z. Zhang, L. Ren, T. Xiong, B. Wang, B. Liu, W. Shan, X. Liu, H. Cai, G. Sun, J. Yang, and X. Si, “34.3 a 22nm 64kb lightning-like hybrid computing-in-memory macro with a compressed adder tree and analog-storage quantizers for transformer and cnns...

work page 2024

[42] [42]

Reconfigurable dataflow optimization for spatiotem- poral spiking neural computation on systolic array accelerators,

J.-J. Lee and P. Li, “Reconfigurable dataflow optimization for spatiotem- poral spiking neural computation on systolic array accelerators,” in 2020 IEEE 38th International Conference on Computer Design (ICCD). IEEE, 2020, pp. 57–64

work page 2020

[43] [43]

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,”CoRR, vol. abs/1409.1556, 2014. [Online]. Available: https://api.semanticscholar.org/CorpusID:14124313

work page internal anchor Pith review Pith/arXiv arXiv 2014

[44] [44]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778

work page 2016

[45] [45]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenRevie...

work page 2021

[46] [46]

Learning multiple layers of features from tiny images,

A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech. Rep., 2009

work page 2009

[47] [47]

Cifar10-dvs: An event-stream dataset for object classification,

H. Li, H. Liu, X. Ji, G. Li, and L. Shi, “Cifar10-dvs: An event-stream dataset for object classification,”Frontiers in Neuroscience, vol. 11, 2017

work page 2017

[48] [48]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255

work page 2009

[49] [49]

Microsoft coco: Common objects in context,

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick, “Microsoft coco: Common objects in context,” inEuropean conference on computer vision. Springer, 2014, pp. 740–755

work page 2014

[50] [50]

The pascal visual object classes (voc) challenge,

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisser- man, “The pascal visual object classes (voc) challenge,”International journal of computer vision, vol. 88, no. 2, pp. 303–338, 2010

work page 2010

[51] [51]

Nvidia jetson agx orin 64 gb

Nvidia. Nvidia jetson agx orin 64 gb. 2021, Nov 09. [Online]. Available: https://www.techpowerup.com/gpu-specs/jetson-agx-orin-64-gb.c4085

work page 2021

[52] [52]

Nvidia a100

NVIDIA. Nvidia a100. 2020, May 04. [Online]. Available: https://www.nvidia.cn/content/dam/en-zz/Solutions/Data-Center/a100/pdf/ampere-a100-datasheet-a4-nvidia-1293124-r10-web-zhCN.pdf

work page 2020

[53] [53]

Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,

N. Jouppi, G. Kurian, S. Li, P. Ma, R. Nagarajan, L. Nai, N. Patil, S. Subramanian, A. Swing, B. Towles, C. Young, X. Zhou, Z. Zhou, and D. A. Patterson, “Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,” in Proceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA...

work page doi:10.1145/3579371.3589350 2023

[54] [54]

Groqcard accelerator

Groq. Groqcard accelerator. 2022. [Online]. Available: https://groq.com/wp-content/uploads/2024/02

work page 2022

[55] [55]

Seenn: Towards temporal spiking early-exit neural networks,

Y. Li, T. Geller, Y. Kim, and P. Panda, “Seenn: Towards temporal spiking early-exit neural networks,” 2023. [Online]. Available: https://arxiv.org/abs/2304.01230

work page arXiv 2023

[56] [56]

Optimizing event-driven spiking neural network with regularization and cutoff,

D. Wu, G. Jin, H. Yu, X. Yi, and X. Huang, “Optimizing event-driven spiking neural network with regularization and cutoff,” Frontiers in Neuroscience, vol. 19, Feb. 2025. [Online]. Available: http://dx.doi.org/10.3389/fnins.2025.1522788

work page doi:10.3389/fnins.2025.1522788 2025

[57] [57]

Knowing when to stop: Delay-adaptive spiking neural network classifiers with reliability guarantees,

J. Chen, S. Park, and O. Simeone, “Knowing when to stop: Delay-adaptive spiking neural network classifiers with reliability guarantees,” [Online]. Available: https://arxiv.org/abs/2305.11322

work page arXiv

[59] [59]

You only look once: Unified, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788

work page 2016

[60] [60]

Logic-based edram: Origins and rationale for use,

R. E. Matick and S. E. Schuster, “Logic-based edram: Origins and rationale for use,” IBM Journal of Research and Development, vol. 49, no. 1, pp. 145–165, 2005

work page 2005

[61] [61]

A survey of architectural approaches for managing embedded dram and non-volatile on-chip caches,

S. Mittal, J. S. Vetter, and D. Li, “A survey of architectural approaches for managing embedded dram and non-volatile on-chip caches,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 6, pp. 1524–1537, 2014

work page 2014

[62] [62]

A high-performance, high-density 28nm edram technology with high-k/metal-gate,

K. Huang, Y. Ting, C. Chang, K. Tu, K. Tzeng, H. Chu, C. Pai, A. Katoch, W. Kuo, K. Chen et al., “A high-performance, high-density 28nm edram technology with high-k/metal-gate,” in 2011 International Electron Devices Meeting. IEEE, 2011, pp. 24–7

work page 2011

[63] [63]

Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network,

H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, J. K. Kim, V. Chandra, and H. Esmaeilzadeh, “Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018, pp. 764–775

work page 2018

[64] [64]

The spinnaker project,

S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, “The spinnaker project,” Proceedings of the IEEE, vol. 102, no. 5, pp. 652–665, 2014

work page 2014

[65] [65]

Matraptor: A sparse-sparse matrix multiplication accelerator based on row-wise product,

N. Srivastava, H. Jin, J. Liu, D. Albonesi, and Z. Zhang, “Matraptor: A sparse-sparse matrix multiplication accelerator based on row-wise product,” in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 766–780

work page 2020

[66] [66]

Cerebras architecture deep dive: First look inside the hardware/software co-design for deep learning,

S. Lie, “Cerebras architecture deep dive: First look inside the hardware/software co-design for deep learning,” IEEE Micro, vol. 43, no. 3, pp. 18–30, 2023

work page 2023

[67] [67]

Polymorpic: Embedding polymorphic processing-in-cache in risc-v based processor for full-stack efficient ai inference,

C. Zou, Z. Wei, J. Y. Lee, C. Nie, K. You, and Z. He, “Polymorpic: Embedding polymorphic processing-in-cache in risc-v based processor for full-stack efficient ai inference,” in 2025 58th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025

work page 2025

[68] [68]

Vspim: Sram processing-in-memory dnn acceleration via vector-scalar operations,

C. Nie, C. Tang, J. Lin, H. Hu, C. Lv, T. Cao, W. Zhang, L. Jiang, X. Liang, W. Qian, Y. Sun, and Z. He, “Vspim: Sram processing-in-memory dnn acceleration via vector-scalar operations,” IEEE Transactions on Computers, vol. 73, no. 10, pp. 2378–2390, 2024

work page 2024

[69] [69]

Maicc: A lightweight many-core architecture with in-cache computing for multi-dnn parallel inference,

R. Fan, Y. Cui, Q. Chen, M. Wang, Y. Zhang, W. Zheng, and Z. Li, “Maicc: A lightweight many-core architecture with in-cache computing for multi-dnn parallel inference,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’23. New York, NY, USA: Association for Computing Machinery, 2023, p. 411–423. [Online...

work page doi:10.1145/3613424 2023

[70] [70]

33.2 a fully integrated analog reram based 78.4 tops/w compute-in-memory chip with fully parallel mac computing,

Q. Liu, B. Gao, P. Yao, D. Wu, J. Chen, Y. Pang, W. Zhang, Y. Liao, C.-X. Xue, W.-H. Chen et al., “33.2 a fully integrated analog reram based 78.4 tops/w compute-in-memory chip with fully parallel mac computing,” in 2020 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2020, pp. 500–502

work page 2020

[71] [71]

Ir-qnn framework: An ir drop-aware offline training of quantized crossbar arrays,

M. E. Fouda, S. Lee, J. Lee, G. H. Kim, F. Kurdahi, and A. M. Eltawi, “Ir-qnn framework: An ir drop-aware offline training of quantized crossbar arrays,” IEEE Access, vol. 8, pp. 228 392–228 408, 2020

work page 2020

[72] [72]

Rxnn: A framework for evaluating deep neural networks on resistive crossbars,

S. Jain, A. Sengupta, K. Roy, and A. Raghunathan, “Rxnn: A framework for evaluating deep neural networks on resistive crossbars,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 2, pp. 326–338, 2020

work page 2020

[73] [73]

Spinnaker2: A large-scale neuromorphic system for event-based and asynchronous machine learning,

H. A. Gonzalez, J. Huang, F. Kelber, K. K. Nazeer, T. Langer, C. Liu, M. Lohrmann, A. Rostami, M. Schöne, B. Vogginger et al., “Spinnaker2: A large-scale neuromorphic system for event-based and asynchronous machine learning,” arXiv preprint arXiv:2401.04491, 2024

work page arXiv 2024

[74] [74]

Intel builds world’s largest neuromorphic system to enable more sustainable ai,

Intel Newsroom, “Intel builds world’s largest neuromorphic system to enable more sustainable ai,” https://newsroom.intel.com/artificial-intelligence, 2024, accessed: 2026-04-26

work page 2024

[75] [75]

Gustavsnn: Unleashing the power of gustavson’s algorithm on snn acceleration with column-parallel tick-batch dataflow,

S. Hwang, D. Lee, J. Koo, and J. Kung, “Gustavsnn: Unleashing the power of gustavson’s algorithm on snn acceleration with column-parallel tick-batch dataflow,” in 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2026, pp. 1–14.

work page 2026

[76] [76]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.

work page internal anchor Pith review Pith/arXiv arXiv 2010