
arxiv: 2605.30814 · v1 · pith:RWYZAFLSnew · submitted 2026-05-29 · 💻 cs.AR

A Reconfigurable Computing In-Memory Macro with Charge-sharing-based Weighted Accumulator

Pith reviewed 2026-06-28 20:42 UTC · model grok-4.3

classification 💻 cs.AR
keywords in-memory computing · SRAM · analog computing · reconfigurable array · charge-sharing accumulator · dual-8T bitcell · ADC overhead reduction · neural network accelerator

The pith

A reconfigurable 256x128 SRAM in-memory array integrates a low-overhead ADC, a charge-sharing accumulator, and dual-8T bitcells to support 1-7b inputs, 2-4b weights, and 1-7b outputs while addressing ADC area, latency, and voltage swing limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to build a multi-bit SRAM analog computing-in-memory macro that overcomes three barriers: large ADC area cost, slow handling of multi-bit inputs, and restricted read bitline voltage range. It does so with three specific additions inside a 256x128 array that can be configured for 1-7b inputs, 2-4b weights, and 1-7b outputs. An in-memory ADC (IMADC) is reported to occupy only 3 percent of the area, a 9x improvement over earlier IMADCs. A charge-sharing weighted accumulator (BSCHA) shortens multi-bit accumulation, cutting latency by 1.9x relative to PWM and 6.6x relative to bit-slicing. A dual-8T bitcell with a decoupled read path and an under-driven cascode read wordline improves discharge-current linearity and widens the usable bitline voltage. If these elements work together, the macro offers a concrete route to more area-efficient and faster analog in-memory computation for variable-precision workloads.

Core claim

The proposed macro achieves its reconfigurability and performance through an IMADC that occupies only 3 percent area with a 9x improvement over prior IMADCs, a BSCHA that reduces latency by 1.9x relative to PWM and 6.6x relative to bit-slicing, and a dual-8T bitcell that stores ternary weights via a decoupled read path combined with read wordline under-driven cascode, delivering 7x better unit discharge current linearity and 3.5x greater usable read bitline voltage.
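To make the latency comparison concrete, here is a toy cycle-count model of the three input-handling modes. It is a sketch under stated assumptions, not the paper's timing model: the function names, the per-step timing parameters, and the assumption that an analog accumulator needs only one ADC conversion per multi-bit input are all hypothetical, and the numbers it produces are not meant to reproduce the reported 1.9x and 6.6x factors.

```python
def pwm_latency(input_bits, t_unit, t_adc):
    """PWM mode: the input value sets the pulse width, so the worst-case
    pulse spans (2**input_bits - 1) unit slots, followed by one conversion."""
    return (2 ** input_bits - 1) * t_unit + t_adc


def bit_slicing_latency(input_bits, t_mac, t_adc):
    """Bit-slicing mode: one MAC cycle and one ADC conversion per input bit
    (the digital shift-and-add is assumed to be hidden in the pipeline)."""
    return input_bits * (t_mac + t_adc)


def analog_accumulate_latency(input_bits, t_mac, t_share, t_adc):
    """Analog accumulation: one MAC cycle and one charge-sharing step per
    input bit, then a single ADC conversion of the accumulated voltage."""
    return input_bits * (t_mac + t_share) + t_adc


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, chosen only for illustration.
    bits, t_unit, t_mac, t_share, t_adc = 4, 1, 1, 1, 8
    print("PWM:        ", pwm_latency(bits, t_unit, t_adc))
    print("bit-slicing:", bit_slicing_latency(bits, t_mac, t_adc))
    print("accumulator:", analog_accumulate_latency(bits, t_mac, t_share, t_adc))
```

The model only illustrates why the trade-off exists: PWM latency grows exponentially with input precision, bit-slicing pays for a conversion on every bit, and an analog accumulator amortizes a single conversion over all bits.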

What carries the argument

The charge-sharing-based weighted accumulator (BSCHA) paired with the in-memory ADC (IMADC) and the dual-8T bitcell that uses a decoupled read path plus under-driven cascode read wordline to control discharge current.
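One generic way a charge-sharing accumulator can apply binary weights is by repeatedly sharing charge between two equal capacitors, which halves the running total at every step. The sketch below models that idea only; it assumes ideal capacitors, LSB-first processing, and hypothetical function names, and the summary does not confirm that the BSCHA circuit works exactly this way.

```python
def charge_share(v_a, c_a, v_b, c_b):
    """Voltage after shorting two ideal capacitors together
    (charge conservation: total charge divided by total capacitance)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def weighted_accumulate(bit_voltages_lsb_first, c_acc=1.0, c_sample=1.0):
    """Accumulate per-bit partial MAC voltages, LSB first.

    With c_acc == c_sample, each sharing step halves everything accumulated
    so far, so bit k of n ends up weighted by 2**k / 2**n.
    """
    v_acc = 0.0
    for v_bit in bit_voltages_lsb_first:
        v_acc = charge_share(v_acc, c_acc, v_bit, c_sample)
    return v_acc


if __name__ == "__main__":
    # Partial results for bit planes b0=1.0, b1=0.0, b2=1.0 (arbitrary volts).
    print(weighted_accumulate([1.0, 0.0, 1.0]))  # (1*1 + 0*2 + 1*4) / 8
```

In this idealized model the binary weighting comes for free from the capacitor ratio, so no per-bit digitization or digital shift-and-add is needed before the final conversion.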

If this is right

  • The array can be programmed for different input, weight, and output precisions without redesigning the core hardware.
  • Multi-bit input processing completes in less time than either PWM or bit-slicing approaches.
  • The small IMADC leaves most of the die area available for the memory array itself.
  • Ternary weights become practical inside a standard SRAM-style cell layout.
  • Higher usable read bitline voltage allows more reliable summation of discharge currents across the column.
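
The ternary-weight point above can be illustrated with a small behavioral model. It is a minimal sketch that assumes each ternary weight is stored as a (P, N) bit pair and that the positive and negative partial sums are formed separately and then subtracted. The encoding and function names are hypothetical; the summary does not spell out how the dual-8T cell maps its states.

```python
def encode_ternary(weight):
    """Map a ternary weight to an assumed (P, N) bit pair:
    +1 -> (1, 0), -1 -> (0, 1), 0 -> (0, 0)."""
    if weight not in (-1, 0, 1):
        raise ValueError("ternary weight must be -1, 0, or +1")
    return (1 if weight == 1 else 0, 1 if weight == -1 else 0)


def ternary_mac(inputs, weights):
    """Return (mac_p, mac_n, mac_p - mac_n) for one column.

    mac_p sums inputs on cells storing +1, mac_n sums inputs on cells
    storing -1; their difference is the signed dot product.
    """
    mac_p = 0
    mac_n = 0
    for x, w in zip(inputs, weights):
        p, n = encode_ternary(w)
        mac_p += x * p
        mac_n += x * n
    return mac_p, mac_n, mac_p - mac_n


if __name__ == "__main__":
    print(ternary_mac([3, 1, 2, 5], [1, -1, 0, 1]))
```

Splitting the sum into two non-negative halves is what lets a discharge-only read path represent signed weights, since each half only ever pulls its own bitline down.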

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reconfigurability may reduce the need for separate accelerators tuned to each precision level in a single system.
  • The cascode read technique could be examined for compatibility with other bitcell topologies that also suffer from nonlinear discharge.
  • If the area and latency numbers scale with array size, larger macros might still keep the same relative overhead.
  • The design leaves open the question of how the same techniques perform when the array is embedded inside a larger digital SoC with shared power rails.

Load-bearing premise

The measured area, latency, linearity, and voltage gains remain valid when process variation, temperature, and supply noise are present and without large unstated penalties in power or yield.

What would settle it

Silicon measurements on a fabricated chip that show the IMADC area rising above 3 percent of the macro or the BSCHA latency reduction dropping below the stated 1.9x and 6.6x factors under standard operating conditions.

Figures

Figures reproduced from arXiv: 2605.30814 by Arindam Basu, Hongyang Shang, Junyi Yang, Shuai Dong, Zhengnan Fu.

Figure 1: System latency and ADC area overhead: (a) System latency (Our ...
Figure 2: (a) Hardware structure of the proposed dual 8T SRAM CIM and ...
Figure 3: (a) Detailed circuit and timing diagram of MAC columns and IMADC. (b) Internal signals of SA. (c) IMADC principle.
Figure 4: (a) Schematic of RWLUDC. (b) Voltage swings and current variations ...
Figure 5: Self-biased voltage buffer (a) Circuit Diagram. (b) Monte Carlo ...
Figure 6: Method for multi-bit weight with two cases (3-bit and 4-bit weights).
Figure 7: (a) Effective circuit model for generation of differential MAC voltage ...
Figure 8: (a) Monte Carlo simulation for buffer noise. (b) Monte Carlo ...
Figure 9: ADC error distribution (ADC error= (simulated ADC output ...
Figure 10: (a) SA schematic. (b) Monte Carlo simulation for SA noise. (c) ...
Figure 11: Post layout simulation versus theoretical 4-bit IMADC output across temperature variations (0°C, 70°C, 27°C) and different process corners (TT, ...
Figure 12: The inference accuracy results with hardware-simulated ADC noise ...
Figure 13: Weight sparsity result across each layer for VGG-8 on CIFAR-10 ...
Figure 14: Throughput under different input, output, and weight resolutions.
Figure 15: The partial MACP/N distribution of (a) Proposed method and (b) PWM mode based on 3-bit input for MNIST. Their corresponding voltages on ...
Figure 16: (a) Energy breakdown (4-bit input/output and 2-bit weight) based ...
Figure 18: System throughput and energy efficiency comparison with previous ...

Original abstract

SRAM-based analog computing-in-memory demonstrates outstanding efficiency. However, it faces three critical challenges: significant ADC overhead, high latency for multi-bit inputs, and limited read bitline voltage. To address these issues, this work proposes a multi-bit highly reconfigurable 256x128 in-memory computing array supporting 1-7b input, 2-4b weight, and 1-7b output. Three key innovations are introduced: 1) The IMADC occupies only 3% area overhead, achieving a 9x improvement compared to previous IMADC; 2) The BSCHA reduces latency by 1.9x and 6.6x compared to traditional pulse-width modulation (PWM) and bit-slicing modes, respectively; 3) A dual-8T bitcell enabling ternary weight storage through a decoupled read path, integrated with a read wordline under-driven cascode technique, improves linearity of unit discharge current by 7x and increases the usable read bitline voltage by 3.5x.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript presents a reconfigurable 256x128 SRAM-based analog computing-in-memory array supporting 1-7b inputs, 2-4b weights, and 1-7b outputs. It introduces three main contributions: an IMADC with 3% area overhead achieving 9x improvement over prior designs, a BSCHA reducing latency by 1.9x versus PWM and 6.6x versus bit-slicing modes, and a dual-8T bitcell with decoupled read path plus read-wordline under-driven cascode that improves unit discharge current linearity by 7x and usable RBL voltage range by 3.5x.

Significance. If the reported gains hold, the work meaningfully advances CiM hardware by directly tackling ADC area, multi-bit latency, and RBL voltage limitations through concrete circuit techniques. The explicit quantitative comparisons to established baselines (PWM, bit-slicing, prior IMADCs) and the parameter choices that produce the stated improvements provide clear, falsifiable benchmarks for the community.

minor comments (1)
  1. [Abstract] The quantitative claims would be easier to assess if the abstract briefly indicated the simulation conditions or baseline references used for the 9x, 1.9x, 6.6x, 7x, and 3.5x figures.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work on the reconfigurable 256x128 CIM array, the IMADC, BSCHA accumulator, and dual-8T bitcell, and for recommending minor revision. The referee's description accurately captures the claimed improvements in ADC area, latency, and RBL linearity/voltage range.

Circularity Check

0 steps flagged

No significant circularity; hardware claims rest on explicit design comparisons

The manuscript describes a reconfigurable SRAM-based in-memory computing macro with three circuit-level innovations (IMADC, BSCHA, dual-8T bitcell with under-driven cascode). All quantitative claims (3% area, 9x improvement, 1.9x/6.6x latency reductions, 7x linearity, 3.5x voltage range) are presented as outcomes of concrete implementation choices and direct comparisons to prior PWM/bit-slicing baselines. No equations, fitted parameters, predictions, or first-principles derivations appear; the work contains no self-definitional loops, fitted-input-as-prediction steps, or load-bearing self-citations that reduce the central claims to their own inputs. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Hardware circuit design paper; the abstract contains no free parameters, mathematical axioms, or newly postulated physical entities.

pith-pipeline@v0.9.1-grok · 5723 in / 1332 out tokens · 32982 ms · 2026-06-28T20:42:38.354805+00:00 · methodology

