Maximizing Memory-Level Parallelism via Integrated Stochastic Logic-in-Memory Architectures
Pith reviewed 2026-05-08 06:55 UTC · model grok-4.3
The pith
An architecture integrates stochastic computing directly into MTJ memory arrays to enable fully parallel bit-stream generation and arithmetic without external random number generators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that leveraging the inherent stochasticity and write-read characteristics of MTJ devices enables a fully parallel and deterministic conversion of binary operands into probabilistic bit-streams within the memory arrays, eliminating external random number generation circuitry. These bit-streams are then processed by integrated parallel stochastic arithmetic units for efficient computation of arithmetic and transcendental functions, with outputs reusable or convertible back to binary form using parallel accumulation mechanisms.
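The claimed pipeline (binary operand, probabilistic bit-stream, stochastic arithmetic, parallel accumulation back to binary) can be illustrated with a generic stochastic-computing sketch. This is not the paper's MTJ circuit: the unary encoding, the hold/rotate pairing for exact deterministic multiplication, and the stream length `n` are standard SC techniques assumed here for illustration.

```python
def to_stream(value, length):
    # Unary encoding: 'value' ones then zeros, so P(bit = 1) = value / length.
    return [1] * value + [0] * (length - value)

def sc_multiply(a, b, n=16):
    A, B = to_stream(a, n), to_stream(b, n)
    # Deterministic pairing: hold A while rotating B so every bit of A
    # meets every bit of B exactly once; a single AND gate then multiplies.
    out = [A[i // n] & B[i % n] for i in range(n * n)]
    # Parallel accumulation (popcount) converts the stream back to binary.
    return sum(out) / len(out)

assert sc_multiply(8, 4) == (8 / 16) * (4 / 16)  # exact: 0.125
```

The exactness of the product comes from the deterministic exhaustive pairing; random bit-streams of the same length would only approximate it, which is why eliminating RNGs while preserving determinism matters for the claim.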
What carries the argument
The MTJ-based memory augmented with logic-in-memory capabilities that performs parallel stochastic bit-stream generation and arithmetic directly in the storage fabric.
If this is right
- Core arithmetic and transcendental functions can be implemented with minimal hardware complexity and inherent noise tolerance.
- Stochastic outputs can be reused as inputs for further processing or converted back to binary using parallel accumulation and stored in MTJ memory.
- Memory-level parallelism is maximized by performing generation, computation, and storage in a unified fabric.
- Data movement overhead is substantially reduced compared to conventional von Neumann architectures.
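One concrete way the "reuse as input" point plays out in standard stochastic computing is the MUX-based scaled adder: its output is itself a bit-stream that can feed the next stochastic unit or be accumulated back to binary. A minimal sketch, assuming unary streams and a deterministic alternating select stream (the paper does not specify its select mechanism):

```python
def unary(value, length):
    # Unary encoding: P(bit = 1) = value / length.
    return [1] * value + [0] * (length - value)

def sc_scaled_add(a, b, n=16):
    A, B = unary(a, n), unary(b, n)
    sel = [i % 2 for i in range(n)]  # deterministic 0.5-probability select
    # MUX: pick A when select is 1, B otherwise -> stream encodes (a/n + b/n)/2.
    return [A[i] if sel[i] else B[i] for i in range(n)]

stream = sc_scaled_add(8, 4)
assert sum(stream) / len(stream) == (8/16 + 4/16) / 2  # exact here: 0.375
```

The result is exact for this operand choice (an odd operand sum incurs a half-bit rounding); the key point is that `stream` is a valid stochastic operand for further in-memory processing, matching the reuse-or-accumulate claim.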
Where Pith is reading between the lines
- Similar in-memory stochastic techniques could extend to other emerging memory technologies if the MTJ integration proves viable.
- The design may suit approximate computing tasks such as neural network training where noise tolerance is already accepted.
- Net energy savings would require system-level simulations comparing against GPU-based stochastic implementations.
Load-bearing premise
That MTJ memory arrays can be practically augmented with integrated logic-in-memory capabilities to support parallel stochastic arithmetic units while preserving storage functionality.
What would settle it
A hardware prototype or detailed simulation testing whether the energy and area cost of adding logic-in-memory to MTJ arrays exceeds the savings from eliminating external RNG circuitry, or prevents full parallelism.
read the original abstract
Today's high-performance architectures are increasingly constrained by data movement latency and energy overhead, as the slowdown of single-core performance scaling coincides with the rise of highly data-intensive workloads. In-memory architectures have emerged as a complementary solution to conventional von Neumann systems by alleviating memory bandwidth bottlenecks, exploiting massive concurrency, and mitigating excessive data movement between memory and processing units. This study proposes a parallel in-memory stochastic computing (SC) architecture that implements an end-to-end computation pipeline within Magnetic Tunnel Junction (MTJ)-based memory augmented with logic-in-memory (LIM) capabilities. By leveraging the inherent stochasticity and write-read characteristics of MTJ devices, the proposed architecture enables a fully parallel and deterministic conversion of binary operands into probabilistic bit-streams, eliminating the need for energy-intensive external random number generation circuitry. These bit-streams are processed by parallel stochastic arithmetic units integrated directly within the memory arrays to efficiently implement core arithmetic and transcendental functions with minimal hardware complexity and inherent noise tolerance. The resulting stochastic outputs can be either reused as an input of future stochastic processing or converted back to binary form using parallel accumulation mechanisms and stored in the MTJ memory. By tightly integrating data storage, bit-stream generation, and computation within a unified in-memory fabric, the proposed design maximizes memory-level parallelism while substantially minimizing data movement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a parallel in-memory stochastic computing architecture using MTJ-based memory augmented with logic-in-memory (LIM) capabilities. It claims that leveraging the inherent stochasticity and write-read characteristics of MTJ devices enables fully parallel and deterministic conversion of binary operands into probabilistic bit-streams without external RNG circuitry; these streams are processed by integrated parallel stochastic arithmetic units for core arithmetic and transcendental functions with minimal hardware and noise tolerance; outputs can be reused or converted back to binary via parallel accumulation and stored in MTJ memory, thereby maximizing memory-level parallelism and minimizing data movement.
Significance. If realized, the architecture could substantially advance in-memory computing for data-intensive workloads by integrating storage, stochastic bit-stream generation, and computation in a unified fabric, reducing the energy and latency costs of data movement and external RNG. The conceptual leverage of documented MTJ physical properties for deterministic parallel conversion is a strength, avoiding reliance on fitted parameters or unproven new mechanisms. However, as a high-level proposal without quantitative validation or circuit-level detail, its demonstrated impact remains prospective rather than established.
major comments (1)
- Abstract: the claims of 'fully parallel and deterministic conversion' eliminating external RNG and 'substantially minimizing data movement' are load-bearing for the central efficiency and parallelism assertions, yet the manuscript supplies no quantitative simulations, error analysis, hardware measurements, or baseline comparisons to support them.
minor comments (2)
- The manuscript would benefit from a block diagram or timing illustration of the end-to-end pipeline (bit-stream generation, stochastic units, accumulation) to clarify integration of LIM while preserving storage functionality.
- Notation for stochastic bit-stream representation and accumulation mechanisms could be defined more explicitly to aid reproducibility of the conceptual flow.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation for major revision. We address the single major comment below, clarifying the basis for the claims while acknowledging the absence of quantitative data in the current high-level proposal. We commit to targeted additions in the revised manuscript to strengthen the presentation.
read point-by-point responses
- Referee: Abstract: the claims of 'fully parallel and deterministic conversion' eliminating external RNG and 'substantially minimizing data movement' are load-bearing for the central efficiency and parallelism assertions, yet the manuscript supplies no quantitative simulations, error analysis, hardware measurements, or baseline comparisons to support them.
Authors: We agree that the manuscript is a conceptual architectural proposal and does not contain new quantitative simulations, error analysis, or hardware measurements. The claims rest on the integration of documented MTJ device physics: stochastic switching probability can be deterministically controlled via write-pulse duration to generate bit-streams in parallel across the array without external RNG, as supported by the cited MTJ literature. In-memory integration similarly eliminates data movement by performing generation, arithmetic, and storage locally. We accept that these points would benefit from explicit support. In the revised manuscript we will add an analytical evaluation section that derives first-order energy, latency, and parallelism estimates from standard MTJ parameters, includes bit-stream error bounds, and provides comparisons against conventional stochastic-computing baselines that use external RNGs. Hardware measurements from a fabricated prototype remain outside the scope of this work.
revision: partial
- Hardware measurements from a fabricated prototype: the work is a high-level architectural study based on device models; physical silicon implementation and measurements are not feasible within the current paper.
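The rebuttal's mechanism, write-pulse duration setting the switching probability, can be sketched with a simple thermal-activation model. The exponential form and the time constant `tau_ns` are illustrative assumptions standing in for device-calibrated parameters, and Python's `random` stands in for the device's physical stochasticity:

```python
import math
import random

def switch_probability(pulse_ns, tau_ns=10.0):
    # Thermal-activation-style switching model; tau_ns is an assumed,
    # illustrative relaxation constant, not a measured device value.
    return 1.0 - math.exp(-pulse_ns / tau_ns)

def mtj_bitstream(target_p, length, tau_ns=10.0, seed=0):
    # Invert the model: choose the pulse duration whose switching
    # probability equals the operand value in [0, 1).
    pulse_ns = -tau_ns * math.log(1.0 - target_p)
    rng = random.Random(seed)  # stands in for per-cell device stochasticity
    # One write pulse per cell, applied in parallel across the array,
    # yields one stream bit per cell with no external RNG circuitry.
    return [1 if rng.random() < switch_probability(pulse_ns, tau_ns) else 0
            for _ in range(length)]

stream = mtj_bitstream(0.25, 1024)
# the empirical fraction of ones converges to target_p as the array widens
```

The pulse-duration inversion is what makes the conversion "deterministic" in the paper's sense: the operand-to-probability mapping is fixed, even though each individual bit is a physical coin flip.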
Circularity Check
No significant circularity; claims rest on external MTJ device physics
full rationale
The paper proposes an in-memory stochastic computing architecture leveraging documented physical properties of MTJ devices (stochasticity and write-read behavior) for parallel bit-stream generation without external RNG. No derivation chain, equations, or fitted parameters are presented that reduce to self-defined inputs or prior self-citations. The architecture description is conceptual and high-level, with no quantitative predictions or uniqueness theorems invoked from the authors' own prior work. All load-bearing elements trace to independent device characteristics rather than internal re-derivation or renaming of results.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: MTJ devices exhibit inherent stochasticity and write-read characteristics that can be harnessed for deterministic probabilistic bit-stream generation
- domain assumption: Logic-in-memory capabilities can be added to MTJ arrays without prohibitive overhead while supporting parallel stochastic arithmetic
invented entities (1)
- Integrated stochastic logic-in-memory architecture (no independent evidence)
Reference graph
Works this paper leans on
- [1] A. Alaghi and J. P. Hayes. 2014. Fast and accurate computation using stochastic circuits. In DATE '14. 1–4. doi:10.7873/DATE.2014.089
- [2] Armin Alaghi and John P. Hayes. 2013. Survey of Stochastic Computing. ACM Trans. Embed. Comput. Syst. 12, 2s, Article 92 (2013), 19 pages.
- [3] Armin Alaghi, Weikang Qian, and John P. Hayes. 2018. The Promise and Challenge of Stochastic Computing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 8 (2018), 1515–1531. doi:10.1109/TCAD.2017.2778107
- [4] Mohsen Riahi Alam, M. Hassan Najafi, and Nima TaheriNejad. 2020. Exact in-memory multiplication based on deterministic stochastic computing. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.
- [5] Mohsen Riahi Alam, M. Hassan Najafi, and Nima TaheriNejad. 2021. Exact Stochastic Computing Multiplication in Memristive Memory. IEEE Design & Test 38, 6 (2021), 36–43. doi:10.1109/MDAT.2021.3051296
- [6] Sina Asadi, M. Hassan Najafi, and Mohsen Imani. 2021. A Low-Cost FSM-based Bit-Stream Generator for Low-Discrepancy Stochastic Computing. In 2021 DATE. 908–913. doi:10.23919/DATE51398.2021.9474143
- [7] Rui Chen, Lei Hei, and Yi Lai. 2023. Object detection in optical imaging of the Internet of Things based on deep learning. PeerJ Comp. Sc. 9 (2023), e1718.
- [8] Shao-I Chu, Chi-Long Wu, Tu N. Nguyen, and Bing-Hong Liu. 2022. Polynomial Computation Using Unipolar Stochastic Logic and Correlation Technique. IEEE TC 71, 6 (2022), 1358–1373.
- [9] Kaushik Datta, Shoaib Kamil, Samuel Williams, Leonid Oliker, John Shalf, and Katherine Yelick. 2009. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors. SIAM Rev. 51, 1 (Feb. 2009), 129–159. doi:10.1137/070693199
- [10] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107–113. doi:10.1145/1327452.1327492
- [11] Frédo Durand and Julie Dorsey. 2002. Fast bilateral filtering for the display of high-dynamic-range images. ACM Trans. Graph. 21, 3 (July 2002), 257–266. doi:10.1145/566654.566574
- [12] Hadi Esmaeilzadeh, Emily Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In 2011 38th Annual International Symposium on Computer Architecture (ISCA). 365–376.
- [13] B. R. Gaines. 1967. Stochastic computing. In Proceedings of the April 18–20, 1967, Spring Joint Computer Conference (AFIPS '67 (Spring)). 149–156.
- [14] Michael Gschwind. 2006. Chip multiprocessing and the cell broadband engine. In Proceedings of the 3rd Conference on Computing Frontiers (Ischia, Italy) (CF '06). Association for Computing Machinery, New York, NY, USA, 1–8. doi:10.1145/1128022.1128023
- [15] Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, and Shahar Kvatinsky. 2018. Efficient algorithms for in-memory fixed point multiplication using MAGIC. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.
- [16] Jie Han, Hao Chen, Jinghang Liang, Peican Zhu, Zhixi Yang, and Fabrizio Lombardi. 2014. A Stochastic Computational Approach for Accurate and Efficient Reliability Evaluation. IEEE Trans. Comput. 63, 6 (2014), 1336–1350. doi:10.1109/TC.2012.276
- [17] Mohsen Imani, Saransh Gupta, and Tajana Rosing. 2017. Ultra-efficient processing in-memory for data intensive applications. In Proceedings of the 54th Annual Design Automation Conference 2017. 1–6.
- [18] Maurus Item, Geraldo F. Oliveira, Juan Gómez-Luna, Mohammad Sadrosadati, Yuxin Guo, and Onur Mutlu. 2023. TransPimLib: Efficient Transcendental Functions for Processing-in-Memory Systems. In 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 235–247. doi:10.1109/ISPASS57527.2023.00031
- [19] Seungchul Jung, Hyungwoo Lee, Sungmeen Myung, Hyunsoo Kim, Seung Keun Yoon, Soon-Wan Kwon, Yongmin Ju, Minje Kim, Wooseok Yi, Shinhee Han, Baeseong Kwon, Boyoung Seo, Kilho Lee, Gwan-Hyeob Koh, Kangho Lee, Yoonjong Song, Changkyu Choi, Donhee Ham, and Sang Joon Kim. 2022. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 60...
- [20] Wang Kang, Zhaohao Wang, Youguang Zhang, Jacques-Olivier Klein, Weifeng Lv, and Weisheng Zhao. 2016. Spintronic logic design methodology based on spin Hall effect–driven magnetic tunnel junctions. Journal of Physics D: Applied Physics 49, 6 (2016), 065008.
- [21] Dong Eun Kim, Tanvi Sharma, Anushka Mukherjee, Mainakh Mukherjee, and Kaushik Roy. 2025. MemRaptor: Magnetoresistive Array as Matrix Vector Multiplication and Transcendental Function Operator for NLP Applications. In 2025 ISLPED. 1–7.
- [22] Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter. 2010. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior. In 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture. 65–76. doi:10.1109/MICRO.2010.51
- [23] Vijaya Lakshmi, Vikramkumar Pudi, and John Reuben. 2022. Inner product computation in-memory using distributed arithmetic. IEEE Transactions on Circuits and Systems I: Regular Papers 69, 11 (2022), 4546–4557.
- [24] V. T. Lee, A. Alaghi, and L. Ceze. 2018. Correlation manipulating circuits for stochastic computing. In DATE '18. 1417–1422. doi:10.23919/DATE.2018.8342234
- [25] Luqiao Liu, Chi-Feng Pai, Yi Li, H. W. Tseng, D. C. Ralph, and R. A. Buhrman. 2012. Spin-torque switching with the giant spin Hall effect of tantalum. Science 336, 6081 (4 May 2012), 555–558. doi:10.1126/science.1218197
- [26] Siting Liu and Jie Han. 2018. Toward Energy-Efficient Stochastic Circuits Using Parallel Sobol Sequences. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26, 7 (2018), 1326–1339. doi:10.1109/TVLSI.2018.2812214
- [27] Yang Lv, Brandon R. Zink, Robert P. Bloom, Hüsrev Cılasun, Pravin Khanal, Salonik Resch, Zamshed Chowdhury, Ali Habiboglu, Weigang Wang, Sachin S. Sapatnekar, Ulya Karpuzcu, and Jian-Ping Wang. 2024. Experimental demonstration of magnetic tunnel junction-based computational random-access memory. npj Unconventional Computing 1, 1 (25 July 2024), 3. doi:10.10...
- [28] Pramod K. Meher, Javier Valls, Tso-Bing Juang, K. Sridharan, and Koushik Maharatna. 2009. 50 Years of CORDIC: Algorithms, Architectures, and Applications. IEEE Transactions on Circuits and Systems I: Regular Papers 56, 9 (2009), 1893–1907. doi:10.1109/TCSI.2009.2025803
- [29] Mehran Shoushtari Moghadam, Sercan Aygun, Mohsen Riahi Alam, and M. Hassan Najafi. 2024. P2LSG: Powers-of-2 Low-Discrepancy Sequence Generator for Stochastic Computing. In 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). 38–45. doi:10.1109/ASP-DAC58780.2024.10473928
- [30] Mehran Shoushtari Moghadam, Sercan Aygun, Sina Asadi, and M. Hassan Najafi. 2024. Low-Cost and Highly-Efficient Bit-Stream Generator for Stochastic Computing Division. IEEE Transactions on Nanotechnology 23 (2024), 195–202. doi:10.1109/TNANO.2024.3358395
- [31] M. Hassan Najafi, Devon Jenson, David J. Lilja, and Marc D. Riedel. 2019. Performing Stochastic Computation Deterministically. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27, 12 (2019). doi:10.1109/TVLSI.2019.2929354
- [32] M. Hassan Najafi, David J. Lilja, and Marc Riedel. 2018. Deterministic methods for stochastic computing using low-discrepancy sequences. In Proceedings of the International Conference on Computer-Aided Design (San Diego, California) (ICCAD '18). Association for Computing Machinery, New York, NY, USA, Article 51, 8 pages. doi:10.1145/3240765.3240797
- [33] Keshab K. Parhi and Yin Liu. 2019. Computing Arithmetic Functions Using Stochastic Logic by Series Expansion. IEEE TETC 7, 1 (2019), 44–59.
- [34] Farzad Razi, Mohammad Hossein Moaiyeri, and Siamak Mohammadi. 2022. Toward efficient logic-in-memory computing with magnetic reconfigurable logic circuits. IEEE Magn. Let. 13 (2022), 1–5.
- [35] Farzad Razi, Mehran Shoushtari Moghadam, M. Hassan Najafi, Sercan Aygun, and Marc Riedel. 2025. In-Memory Arithmetic: Enabling Division with Stochastic Logic. In 2025 62nd ACM/IEEE Design Automation Conference (DAC). 1–2. doi:10.1109/DAC63849.2025.11132099
- [36] Abu Sebastian, Manuel Le Gallo, Riduan Khaddam-Aljameh, and Evangelos Eleftheriou. 2020. Memory devices and applications for in-memory computing. Nature Nanotechnology 15, 7 (1 July 2020), 529–544. doi:10.1038/s41565-020-0655-z
- [37] Gian Singh, Ayushi Dube, and Sarma Vrudhula. 2024. Energy-Efficient and Low-Latency Computation of Transcendental Functions in a Precision-Tunable PIM Architecture. In 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 186–191.
- [38] Sean C. Smithson, Naoya Onizawa, Brett H. Meyer, Warren J. Gross, and Takahiro Hanyu. 2019. Efficient CMOS Invertible Logic Using Stochastic Computing. IEEE Transactions on Circuits and Systems I: Regular Papers 66, 6 (2019), 2263–2274. doi:10.1109/TCSI.2018.2889732
- [39] Costin-Emanuel Vasile, Andrei-Alexandru Ulmămei, and Călin Bîră. 2024. Image Processing Hardware Acceleration: A Review of Operations Involved and Current Hardware Approaches. Journal of Imaging 10, 12 (21 Nov. 2024), 298. doi:10.3390/jimaging10120298
- [40] Jingcheng Wang, Xiaowei Wang, Charles Eckert, Arun Subramaniyan, Reetuparna Das, David Blaauw, and Dennis Sylvester. 2019. A 28-nm compute SRAM with bit-serial logic/arithmetic operations for programmable in-memory vector computing. IEEE Journal of Solid-State Circuits 55, 1 (2019), 76–86.
- [41] Zhendong Wang, Zhenyu Xu, Daojing He, and Sammy Chan. 2021. Deep logarithmic neural network for Internet intrusion detection. Soft Computing 25, 15 (01 Aug 2021), 10129–10152.
- [42] Siqiu Xu, Xi Li, Chenchen Xie, Houpeng Chen, Cheng Chen, and Zhitang Song. 2021. A high-precision implementation of the sigmoid activation function for computing-in-memory architecture. Micromachines 12, 10 (2021), 1183.
- [43] Yawen Zhang, Runsheng Wang, Xinyue Zhang, Zherui Zhang, Jiahao Song, Zuodong Zhang, Yuan Wang, and Ru Huang. 2019. A Parallel Bitstream Generator for Stochastic Computing. In 2019 Silicon Nanoelectronics Workshop (SNW). 1–2. doi:10.23919/SNW.2019.8782977
- [44] Brandon R. Zink, Yang Lv, Masoud Zabihi, Husrev Cilasun, Sachin S. Sapatnekar, Ulya R. Karpuzcu, Marc D. Riedel, and Jian-Ping Wang. 2023. A stochastic computing scheme of embedding random bit generation and processing in computational random access memory (SC-CRAM). IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 9, 1 (2023), 29–37.