pith. sign in

arxiv: 2606.30039 · v1 · pith:XPEFWVB3new · submitted 2026-06-29 · 💻 cs.AR

Mega: A 22 nm Convolutional Spiking Neural Network Accelerator Achieving 0.375 pJ/SOP for Efficient Edge Vision

Pith reviewed 2026-06-30 04:17 UTC · model grok-4.3

classification 💻 cs.AR
keywords convolutional spiking neural networksSNN acceleratorenergy efficiency22 nm FDSOIedge visionspike detectionparallel convolutionunified memory
0
0 comments X

The pith

Mega delivers 0.375 pJ per synaptic operation in a 22 nm convolutional SNN accelerator, four times better than prior designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Mega, a fabricated digital chip for convolutional spiking neural networks aimed at low-power edge vision. It identifies that current accelerators miss the parallelism available in 3x3 convolutions and cannot easily adapt memory use to changing sparsity and data needs across layers. The design counters this with parallel convolution units, one shared memory holding spikes, states and weights, and lightweight spike detection logic. When built in GlobalFoundries 22 nm FDSOI, the chip reaches the stated efficiency figure. A reader interested in hardware for sparse event-driven networks would therefore expect concrete gains in energy per operation for vision workloads.

Core claim

Mega is a digital architecture for convolutional SNNs that uses highly parallel acceleration of 3×3 convolutions, a unified data memory for spikes, neuron states, and weights, and efficient spike map processing with low-overhead spike detection. Fabricated in GlobalFoundries 22 nm FDSOI technology, it achieves an energy efficiency of 0.375 pJ/SOP, improving the state of the art by 4×.

What carries the argument

The combination of parallel 3×3 convolution engines, a single unified memory array for spikes/neuron states/weights, and low-overhead spike detection logic that together support varying layer sparsity and memory footprints.

If this is right

  • Convolutional layers can be processed with higher throughput per watt than in earlier SNN accelerators.
  • A single memory structure suffices for spikes, states, and weights without separate banks for each.
  • Spike detection overhead stays low enough to preserve the efficiency benefit of sparse event-driven computation.
  • The measured 0.375 pJ/SOP figure holds across typical convolutional SNN vision workloads.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same memory-sharing and spike-handling ideas could be tested on networks that use larger or non-square kernels.
  • End-to-end system power for a full camera-plus-classifier pipeline might drop further once the accelerator is paired with an event-based sensor.
  • Designers of future edge chips could adopt the unified-memory pattern to reduce area overhead when supporting multiple data types.

Load-bearing premise

Existing SNN accelerators underutilize convolutional parallelism and cannot flexibly handle different memory demands and input sparsity across layers.

What would settle it

Fabrication and measurement of Mega or an equivalent design in the same 22 nm process showing energy above 0.375 pJ/SOP or no 4× gain over the cited prior accelerators.

Figures

Figures reproduced from arXiv: 2606.30039 by Manil Dev Gomony, Rick Luiken, Sander Stuijk.

Figure 1
Figure 1. Figure 1: Challenges and Solutions. The left side illustrates common challenges encountered when accelerating convolutional SNNs, while the right side presents our corresponding solutions. II. HARDWARE ARCHITECTURE [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Mega architecture diagram. Mega is built around nine compute clusters that compute 3 × 3 convolutions in parallel. Each cluster updates 32 neuron states in parallel using 32 convolution units (CUs). Spikes, weights, and neuron states are fetched from the integrated SNAX cluster [2]. The spike streamer converts dense spike maps into sparse spike addresses, which are forwarded to the compute clusters. 1) Spi… view at source ↗
Figure 3
Figure 3. Figure 3: Neuron state memory organization. Neuron states are distributed across nine SRAM banks. The (x, y) coordinate of a state is transformed into (x ′ , y′ , bx, by), as shown on the left. The bank indices (bx, by) select one of the nine banks, while (x ′ , y′ ) defines the word address within the bank. Each memory word stores 32 neuron states corresponding to 32 output channels. corresponding to 32 output chan… view at source ↗
Figure 4
Figure 4. Figure 4: Chip micrograph and measurement setup TABLE I CHIP SUMMARY Technology 22 nm FDSOI CMOS Area 1.12 mm2 Supply voltage 0.55 V - 1.1 V Frequency 155 MHz - 600 MHz Network type Convolutional SNN Weight/State precision INT4/INT8 0.55 V @ 155 MHz 1.1 V @ 600 MHz Power consumption 14.4 mW 144.7 mW Energy efficiency 0.375 pJ/SOP 0.978 pJ/SOP Throughput 38.4 GSOP/s 148.7 GSOP/s to compute x ′ and bx. Therefore, we e… view at source ↗
Figure 7
Figure 7. Figure 7: SNN architecture used for the DVS gestures task. [PITH_FULL_IMAGE:figures/full_fig_p004_7.png] view at source ↗
Figure 6
Figure 6. Figure 6: Energy efficiency and performance tradeoff. [PITH_FULL_IMAGE:figures/full_fig_p004_6.png] view at source ↗
read the original abstract

Convolutional Spiking Neural Networks (SNN) offer the potential for highly energy-efficient vision processing by exploiting sparse, event-driven computation. However, existing SNN accelerators underutilize the inherent parallelism of convolutional layers and lack the flexibility to accommodate varying memory demands and input sparsity across layers. This paper presents Mega, a digital architecture for convolutional SNNs that addresses these limitations through three key contributions: (1) highly parallel acceleration of $3 \times 3$ convolutions, (2) a unified data memory for spikes, neuron states, and weights, and (3) efficient spike map processing with low-overhead spike detection. Fabricated in GlobalFoundries 22 nm FDSOI technology, Mega achieves an energy efficiency of 0.375 pJ/SOP, improving the state of the art by $4\times$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents Mega, a digital convolutional spiking neural network accelerator fabricated in GlobalFoundries 22 nm FDSOI technology. It identifies limitations in existing SNN accelerators regarding parallelism in convolutional layers and flexibility for varying memory and sparsity demands, then proposes three contributions: highly parallel acceleration of 3×3 convolutions, a unified data memory for spikes/neuron states/weights, and efficient spike map processing with low-overhead spike detection. The central result is a measured energy efficiency of 0.375 pJ/SOP, stated to improve the state of the art by 4×.

Significance. A well-documented post-fabrication measurement at this efficiency level, with transparent test conditions and comparisons, would represent a meaningful advance for energy-efficient edge vision hardware using convolutional SNNs, particularly by addressing underutilized parallelism and layer-specific flexibility.

major comments (1)
  1. [Abstract] Abstract: The central claim of 0.375 pJ/SOP energy efficiency (with 4× SOTA improvement) is presented without any details on test conditions, workload, baseline comparisons, error bars, or data exclusion criteria. This information is load-bearing for verifying the measured result on fabricated silicon.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of our post-fabrication measurements. We respond point-by-point to the major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim of 0.375 pJ/SOP energy efficiency (with 4× SOTA improvement) is presented without any details on test conditions, workload, baseline comparisons, error bars, or data exclusion criteria. This information is load-bearing for verifying the measured result on fabricated silicon.

    Authors: We agree that the abstract would be strengthened by including concise context on the measurement conditions. The manuscript body (Sections 4 and 5) details the test setup, workload (specific convolutional SNN models and datasets for edge vision), baseline comparisons, and measurement methodology. To address this concern directly, we will revise the abstract to incorporate a brief statement on the workload and test conditions used to obtain the 0.375 pJ/SOP figure. revision: yes

Circularity Check

0 steps flagged

No significant circularity; central claim is post-fabrication measurement

full rationale

The paper reports a fabricated 22 nm FDSOI chip and its measured energy efficiency (0.375 pJ/SOP, 4× SOTA). The abstract and description list three architectural contributions at a high level but supply no equations, fitted parameters, derivations, or self-referential definitions. No load-bearing self-citations, ansatzes, or uniqueness theorems appear. The result is an empirical outcome from silicon, not a chain that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, parameters, or explicit assumptions beyond standard CMOS fabrication and measurement practices; no free parameters, axioms, or invented entities are visible.

pith-pipeline@v0.9.1-grok · 5684 in / 1110 out tokens · 48625 ms · 2026-06-30T04:17:42.440330+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

10 extracted references

  1. [1]

    A 128×128 120 db 15µs latency asynchronous temporal contrast vision sensor,

    P. Lichtsteineret al., “A 128×128 120 db 15µs latency asynchronous temporal contrast vision sensor,” JSSC, 2008

  2. [2]

    An open-source HW-SW Co-development frame- work enabling efficient multi-accelerator systems,

    R. A. Antonioet al., “An open-source HW-SW Co-development frame- work enabling efficient multi-accelerator systems,” ISLPED, 2025

  3. [3]

    Neuromorphic LIF row-by-row multicon- volution processor for FPGA,

    R. Tapiador-Moraleset al., “Neuromorphic LIF row-by-row multicon- volution processor for FPGA,” TBioCAS, 2019

  4. [4]

    Efficient Hardware Acceleration of Sparsely Active Convolutional Spiking Neural Networks,

    J. Sommeret al., “Efficient Hardware Acceleration of Sparsely Active Convolutional Spiking Neural Networks,” TCADICS, 2022

  5. [5]

    ReckOn: A 28nm Sub-mm 2 Task-Agnostic Spiking Recurrent Neural Network Processor Enabling On-Chip Learning over Second-Long Timescales,

    C. Frenkelet al., “ReckOn: A 28nm Sub-mm 2 Task-Agnostic Spiking Recurrent Neural Network Processor Enabling On-Chip Learning over Second-Long Timescales,” ISSCC, 2022

  6. [6]

    ANP-I: A 28nm 1.5pJ/SOP Asynchronous Spiking Neu- ral Network Processor Enabling Sub-0.1µJ/Sample On-Chip Learning for Edge-AI Applications,

    J. Zhanget al., “ANP-I: A 28nm 1.5pJ/SOP Asynchronous Spiking Neu- ral Network Processor Enabling Sub-0.1µJ/Sample On-Chip Learning for Edge-AI Applications,” ISSCC, 2023

  7. [7]

    A 22nm 0.26nW/Synapse Spike-Driven Spiking Neural Network Processing Unit Using Time-Step-First Dataflow and Sparsity- Adaptive In-Memory Computing,

    Y . Liuet al., “A 22nm 0.26nW/Synapse Spike-Driven Spiking Neural Network Processing Unit Using Time-Step-First Dataflow and Sparsity- Adaptive In-Memory Computing,” ISSCC, 2024

  8. [8]

    A 65nm 5 TOPS/W digital CIM accelerator with reconfigurable precision and temporal pipelining for spiking neural networks,

    D. Sharmaet al., “A 65nm 5 TOPS/W digital CIM accelerator with reconfigurable precision and temporal pipelining for spiking neural networks,” ESSERC, 2025

  9. [9]

    SpikeRAM: A 48.1pW/Synapse/Bit Event-Driven Spiking Compute-Near/In-Memory Processor with Neuromorphic Sensor En- abling Life-Long On-Chip Learning,

    H. Fuet al., “SpikeRAM: A 48.1pW/Synapse/Bit Event-Driven Spiking Compute-Near/In-Memory Processor with Neuromorphic Sensor En- abling Life-Long On-Chip Learning,” ISSCC, 2026

  10. [10]

    A low power, fully event-based gesture recognition system,

    A. Amiret al., “A low power, fully event-based gesture recognition system,” CVPR, 2017