Maximizing Memory-Level Parallelism via Integrated Stochastic Logic-in-Memory Architectures
Pith reviewed 2026-05-08 06:55 UTC · model grok-4.3
The pith
An architecture integrates stochastic computing directly into MTJ memory arrays to enable fully parallel bit-stream generation and arithmetic without external random number generators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that leveraging the inherent stochasticity and write-read characteristics of MTJ devices enables a fully parallel and deterministic conversion of binary operands into probabilistic bit-streams within the memory arrays, eliminating external random number generation circuitry. These bit-streams are then processed by integrated parallel stochastic arithmetic units for efficient computation of arithmetic and transcendental functions, with outputs reusable or convertible back to binary form using parallel accumulation mechanisms.
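The claimed pipeline (binary operand, probabilistic bit-stream, stochastic arithmetic, parallel accumulation back to binary) can be illustrated with a generic stochastic-computing sketch. This is not the paper's MTJ circuit: the unary encoding, the hold/rotate pairing for exact deterministic multiplication, and the stream length `n` are standard SC techniques assumed here for illustration.

```python
def to_stream(value, length):
    # Unary encoding: 'value' ones then zeros, so P(bit = 1) = value / length.
    return [1] * value + [0] * (length - value)

def sc_multiply(a, b, n=16):
    A, B = to_stream(a, n), to_stream(b, n)
    # Deterministic pairing: hold A while rotating B so every bit of A
    # meets every bit of B exactly once; a single AND gate then multiplies.
    out = [A[i // n] & B[i % n] for i in range(n * n)]
    # Parallel accumulation (popcount) converts the stream back to binary.
    return sum(out) / len(out)

assert sc_multiply(8, 4) == (8 / 16) * (4 / 16)  # exact: 0.125
```

The exactness of the product comes from the deterministic exhaustive pairing; random bit-streams of the same length would only approximate it, which is why eliminating RNGs while preserving determinism matters for the claim.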
What carries the argument
The MTJ-based memory augmented with logic-in-memory capabilities that performs parallel stochastic bit-stream generation and arithmetic directly in the storage fabric.
If this is right
- Core arithmetic and transcendental functions can be implemented with minimal hardware complexity and inherent noise tolerance.
- Stochastic outputs can be reused as inputs for further processing or converted back to binary using parallel accumulation and stored in MTJ memory.
- Memory-level parallelism is maximized by performing generation, computation, and storage in a unified fabric.
- Data movement overhead is substantially reduced compared to conventional von Neumann architectures.
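One concrete way the "reuse as input" point plays out in standard stochastic computing is the MUX-based scaled adder: its output is itself a bit-stream that can feed the next stochastic unit or be accumulated back to binary. A minimal sketch, assuming unary streams and a deterministic alternating select stream (the paper does not specify its select mechanism):

```python
def unary(value, length):
    # Unary encoding: P(bit = 1) = value / length.
    return [1] * value + [0] * (length - value)

def sc_scaled_add(a, b, n=16):
    A, B = unary(a, n), unary(b, n)
    sel = [i % 2 for i in range(n)]  # deterministic 0.5-probability select
    # MUX: pick A when select is 1, B otherwise -> stream encodes (a/n + b/n)/2.
    return [A[i] if sel[i] else B[i] for i in range(n)]

stream = sc_scaled_add(8, 4)
assert sum(stream) / len(stream) == (8/16 + 4/16) / 2  # exact here: 0.375
```

The result is exact for this operand choice (an odd operand sum incurs a half-bit rounding); the key point is that `stream` is a valid stochastic operand for further in-memory processing, matching the reuse-or-accumulate claim.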
Where Pith is reading between the lines
- Similar in-memory stochastic techniques could extend to other emerging memory technologies if the MTJ integration proves viable.
- The design may suit approximate computing tasks such as neural network training where noise tolerance is already accepted.
- Net energy savings would require system-level simulations comparing against GPU-based stochastic implementations.
Load-bearing premise
That MTJ memory arrays can be practically augmented with integrated logic-in-memory capabilities to support parallel stochastic arithmetic units while preserving storage functionality.
What would settle it
A hardware prototype or detailed simulation testing whether the energy and area cost of adding logic-in-memory to MTJ arrays exceeds the savings from eliminating external RNG circuitry, or prevents full parallelism.
read the original abstract
Today's high-performance architectures are increasingly constrained by data movement latency and energy overhead, as the slowdown of single-core performance scaling coincides with the rise of highly data-intensive workloads. In-memory architectures have emerged as a complementary solution to conventional von Neumann systems by alleviating memory bandwidth bottlenecks, exploiting massive concurrency, and mitigating excessive data movement between memory and processing units. This study proposes a parallel in-memory stochastic computing (SC) architecture that implements an end-to-end computation pipeline within Magnetic Tunnel Junction (MTJ)-based memory augmented with logic-in-memory (LIM) capabilities. By leveraging the inherent stochasticity and write-read characteristics of MTJ devices, the proposed architecture enables a fully parallel and deterministic conversion of binary operands into probabilistic bit-streams, eliminating the need for energy-intensive external random number generation circuitry. These bit-streams are processed by parallel stochastic arithmetic units integrated directly within the memory arrays to efficiently implement core arithmetic and transcendental functions with minimal hardware complexity and inherent noise tolerance. The resulting stochastic outputs can be either reused as an input of future stochastic processing or converted back to binary form using parallel accumulation mechanisms and stored in the MTJ memory. By tightly integrating data storage, bit-stream generation, and computation within a unified in-memory fabric, the proposed design maximizes memory-level parallelism while substantially minimizing data movement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a parallel in-memory stochastic computing architecture using MTJ-based memory augmented with logic-in-memory (LIM) capabilities. It claims that leveraging the inherent stochasticity and write-read characteristics of MTJ devices enables fully parallel and deterministic conversion of binary operands into probabilistic bit-streams without external RNG circuitry; these streams are processed by integrated parallel stochastic arithmetic units for core arithmetic and transcendental functions with minimal hardware and noise tolerance; outputs can be reused or converted back to binary via parallel accumulation and stored in MTJ memory, thereby maximizing memory-level parallelism and minimizing data movement.
Significance. If realized, the architecture could substantially advance in-memory computing for data-intensive workloads by integrating storage, stochastic bit-stream generation, and computation in a unified fabric, reducing the energy and latency costs of data movement and external RNG. The conceptual leverage of documented MTJ physical properties for deterministic parallel conversion is a strength, avoiding reliance on fitted parameters or unproven new mechanisms. However, as a high-level proposal without quantitative validation or circuit-level detail, its demonstrated impact remains prospective rather than established.
major comments (1)
- Abstract: the claims of 'fully parallel and deterministic conversion' eliminating external RNG and 'substantially minimizing data movement' are load-bearing for the central efficiency and parallelism assertions, yet the manuscript supplies no quantitative simulations, error analysis, hardware measurements, or baseline comparisons to support them.
minor comments (2)
- The manuscript would benefit from a block diagram or timing illustration of the end-to-end pipeline (bit-stream generation, stochastic units, accumulation) to clarify integration of LIM while preserving storage functionality.
- Notation for stochastic bit-stream representation and accumulation mechanisms could be defined more explicitly to aid reproducibility of the conceptual flow.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation for major revision. We address the single major comment below, clarifying the basis for the claims while acknowledging the absence of quantitative data in the current high-level proposal. We commit to targeted additions in the revised manuscript to strengthen the presentation.
read point-by-point responses
- Referee: Abstract: the claims of 'fully parallel and deterministic conversion' eliminating external RNG and 'substantially minimizing data movement' are load-bearing for the central efficiency and parallelism assertions, yet the manuscript supplies no quantitative simulations, error analysis, hardware measurements, or baseline comparisons to support them.
Authors: We agree that the manuscript is a conceptual architectural proposal and does not contain new quantitative simulations, error analysis, or hardware measurements. The claims rest on the integration of documented MTJ device physics: stochastic switching probability can be deterministically controlled via write-pulse duration to generate bit-streams in parallel across the array without external RNG, as supported by the cited MTJ literature. In-memory integration similarly eliminates data movement by performing generation, arithmetic, and storage locally. We accept that these points would benefit from explicit support. In the revised manuscript we will add an analytical evaluation section that derives first-order energy, latency, and parallelism estimates from standard MTJ parameters, includes bit-stream error bounds, and provides comparisons against conventional stochastic-computing baselines that use external RNGs. Hardware measurements from a fabricated prototype remain outside the scope of this work.
revision: partial
- Hardware measurements from a fabricated prototype: the work is a high-level architectural study based on device models; physical silicon implementation and measurements are not feasible within the current paper.
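The rebuttal's mechanism, write-pulse duration setting the switching probability, can be sketched with a simple thermal-activation model. The exponential form and the time constant `tau_ns` are illustrative assumptions standing in for device-calibrated parameters, and Python's `random` stands in for the device's physical stochasticity:

```python
import math
import random

def switch_probability(pulse_ns, tau_ns=10.0):
    # Thermal-activation-style switching model; tau_ns is an assumed,
    # illustrative relaxation constant, not a measured device value.
    return 1.0 - math.exp(-pulse_ns / tau_ns)

def mtj_bitstream(target_p, length, tau_ns=10.0, seed=0):
    # Invert the model: choose the pulse duration whose switching
    # probability equals the operand value in [0, 1).
    pulse_ns = -tau_ns * math.log(1.0 - target_p)
    rng = random.Random(seed)  # stands in for per-cell device stochasticity
    # One write pulse per cell, applied in parallel across the array,
    # yields one stream bit per cell with no external RNG circuitry.
    return [1 if rng.random() < switch_probability(pulse_ns, tau_ns) else 0
            for _ in range(length)]

stream = mtj_bitstream(0.25, 1024)
# the empirical fraction of ones converges to target_p as the array widens
```

The pulse-duration inversion is what makes the conversion "deterministic" in the paper's sense: the operand-to-probability mapping is fixed, even though each individual bit is a physical coin flip.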
Circularity Check
No significant circularity; claims rest on external MTJ device physics
full rationale
The paper proposes an in-memory stochastic computing architecture leveraging documented physical properties of MTJ devices (stochasticity and write-read behavior) for parallel bit-stream generation without external RNG. No derivation chain, equations, or fitted parameters are presented that reduce to self-defined inputs or prior self-citations. The architecture description is conceptual and high-level, with no quantitative predictions or uniqueness theorems invoked from the authors' own prior work. All load-bearing elements trace to independent device characteristics rather than internal re-derivation or renaming of results.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: MTJ devices exhibit inherent stochasticity and write-read characteristics that can be harnessed for deterministic probabilistic bit-stream generation
- domain assumption: Logic-in-memory capabilities can be added to MTJ arrays without prohibitive overhead while supporting parallel stochastic arithmetic
invented entities (1)
- Integrated stochastic logic-in-memory architecture (no independent evidence)
Reference graph
Works this paper leans on
- [1] A. Alaghi and J. P. Hayes. 2014. Fast and accurate computation using stochastic circuits. In DATE '14. 1–4. doi:10.7873/DATE.2014.089
- [2] Armin Alaghi and John P. Hayes. 2013. Survey of Stochastic Computing. ACM Trans. Embed. Comput. Syst. 12, 2s, Article 92 (2013), 19 pages.
- [3] Armin Alaghi, Weikang Qian, and John P. Hayes. 2018. The Promise and Challenge of Stochastic Computing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 8 (2018), 1515–1531. doi:10.1109/TCAD.2017.2778107
- [4] Mohsen Riahi Alam, M. Hassan Najafi, and Nima TaheriNejad. 2020. Exact in-memory multiplication based on deterministic stochastic computing. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.
- [5] Mohsen Riahi Alam, M. Hassan Najafi, and Nima TaheriNejad. 2021. Exact Stochastic Computing Multiplication in Memristive Memory. IEEE Design & Test 38, 6 (2021), 36–43. doi:10.1109/MDAT.2021.3051296
- [6] Sina Asadi, M. Hassan Najafi, and Mohsen Imani. 2021. A Low-Cost FSM-based Bit-Stream Generator for Low-Discrepancy Stochastic Computing. In 2021 DATE. 908–913. doi:10.23919/DATE51398.2021.9474143
- [7] Rui Chen, Lei Hei, and Yi Lai. 2023. Object detection in optical imaging of the Internet of Things based on deep learning. PeerJ Comp. Sc. 9 (2023), e1718.
- [8] Shao-I Chu, Chi-Long Wu, Tu N. Nguyen, and Bing-Hong Liu. 2022. Polynomial Computation Using Unipolar Stochastic Logic and Correlation Technique. IEEE TC 71, 6 (2022), 1358–1373.
- [9] Kaushik Datta, Shoaib Kamil, Samuel Williams, Leonid Oliker, John Shalf, and Katherine Yelick. 2009. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors. SIAM Rev. 51, 1 (Feb. 2009), 129–159. doi:10.1137/070693199
- [10] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107–113. doi:10.1145/1327452.1327492
- [11] Frédo Durand and Julie Dorsey. 2002. Fast bilateral filtering for the display of high-dynamic-range images. ACM Trans. Graph. 21, 3 (July 2002), 257–266. doi:10.1145/566654.566574
- [12] Hadi Esmaeilzadeh, Emily Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In 2011 38th Annual International Symposium on Computer Architecture (ISCA). 365–376.
- [13] B. R. Gaines. 1967. Stochastic computing. In Proceedings of the April 18–20, 1967, Spring Joint Computer Conference (AFIPS '67 (Spring)). 149–156.
- [14] Michael Gschwind. 2006. Chip multiprocessing and the cell broadband engine. In Proceedings of the 3rd Conference on Computing Frontiers (Ischia, Italy) (CF '06). Association for Computing Machinery, New York, NY, USA, 1–8. doi:10.1145/1128022.1128023
- [15] Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, and Shahar Kvatinsky. 2018. Efficient algorithms for in-memory fixed point multiplication using MAGIC. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.
- [16] Jie Han, Hao Chen, Jinghang Liang, Peican Zhu, Zhixi Yang, and Fabrizio Lombardi. 2014. A Stochastic Computational Approach for Accurate and Efficient Reliability Evaluation. IEEE Trans. Comput. 63, 6 (2014), 1336–1350. doi:10.1109/TC.2012.276
- [17] Mohsen Imani, Saransh Gupta, and Tajana Rosing. 2017. Ultra-efficient processing in-memory for data intensive applications. In Proceedings of the 54th Annual Design Automation Conference 2017. 1–6.
- [18] Maurus Item, Geraldo F. Oliveira, Juan Gómez-Luna, Mohammad Sadrosadati, Yuxin Guo, and Onur Mutlu. 2023. TransPimLib: Efficient Transcendental Functions for Processing-in-Memory Systems. In 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 235–247. doi:10.1109/ISPASS57527.2023.00031
- [19] Seungchul Jung, Hyungwoo Lee, Sungmeen Myung, Hyunsoo Kim, Seung Keun Yoon, Soon-Wan Kwon, Yongmin Ju, Minje Kim, Wooseok Yi, Shinhee Han, Baeseong Kwon, Boyoung Seo, Kilho Lee, Gwan-Hyeob Koh, Kangho Lee, Yoonjong Song, Changkyu Choi, Donhee Ham, and Sang Joon Kim. 2022. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 60...
- [20] Wang Kang, Zhaohao Wang, Youguang Zhang, Jacques-Olivier Klein, Weifeng Lv, and Weisheng Zhao. 2016. Spintronic logic design methodology based on spin Hall effect–driven magnetic tunnel junctions. Journal of Physics D: Applied Physics 49, 6 (2016), 065008.
- [21] Dong Eun Kim, Tanvi Sharma, Anushka Mukherjee, Mainakh Mukherjee, and Kaushik Roy. 2025. MemRaptor: Magnetoresistive Array as Matrix Vector Multiplication and Transcendental Function Operator for NLP Applications. In 2025 ISLPED. 1–7.
- [22] Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter. 2010. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior. In 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture. 65–76. doi:10.1109/MICRO.2010.51
- [23] Vijaya Lakshmi, Vikramkumar Pudi, and John Reuben. 2022. Inner product computation in-memory using distributed arithmetic. IEEE Transactions on Circuits and Systems I: Regular Papers 69, 11 (2022), 4546–4557.
- [24] V. T. Lee, A. Alaghi, and L. Ceze. 2018. Correlation manipulating circuits for stochastic computing. In DATE '18. 1417–1422. doi:10.23919/DATE.2018.8342234
- [25] Luqiao Liu, Chi-Feng Pai, Yi Li, H. W. Tseng, D. C. Ralph, and R. A. Buhrman. 2012. Spin-torque switching with the giant spin Hall effect of tantalum. Science 336, 6081 (4 May 2012), 555–558. doi:10.1126/science.1218197
- [26] Siting Liu and Jie Han. 2018. Toward Energy-Efficient Stochastic Circuits Using Parallel Sobol Sequences. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26, 7 (2018), 1326–1339. doi:10.1109/TVLSI.2018.2812214
- [27] Yang Lv, Brandon R. Zink, Robert P. Bloom, Hüsrev Cılasun, Pravin Khanal, Salonik Resch, Zamshed Chowdhury, Ali Habiboglu, Weigang Wang, Sachin S. Sapatnekar, Ulya Karpuzcu, and Jian-Ping Wang. 2024. Experimental demonstration of magnetic tunnel junction-based computational random-access memory. npj Unconventional Computing 1, 1 (25 July 2024), 3. doi:10.10...
- [28] Pramod K. Meher, Javier Valls, Tso-Bing Juang, K. Sridharan, and Koushik Maharatna. 2009. 50 Years of CORDIC: Algorithms, Architectures, and Applications. IEEE Transactions on Circuits and Systems I: Regular Papers 56, 9 (2009), 1893–1907. doi:10.1109/TCSI.2009.2025803
- [29] Mehran Shoushtari Moghadam, Sercan Aygun, Mohsen Riahi Alam, and M. Hassan Najafi. 2024. P2LSG: Powers-of-2 Low-Discrepancy Sequence Generator for Stochastic Computing. In 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). 38–45. doi:10.1109/ASP-DAC58780.2024.10473928
- [30] Mehran Shoushtari Moghadam, Sercan Aygun, Sina Asadi, and M. Hassan Najafi. 2024. Low-Cost and Highly-Efficient Bit-Stream Generator for Stochastic Computing Division. IEEE Transactions on Nanotechnology 23 (2024), 195–202. doi:10.1109/TNANO.2024.3358395
- [31] M. Hassan Najafi, Devon Jenson, David J. Lilja, and Marc D. Riedel. 2019. Performing Stochastic Computation Deterministically. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27, 12 (2019). doi:10.1109/TVLSI.2019.2929354
- [32] M. Hassan Najafi, David J. Lilja, and Marc Riedel. 2018. Deterministic methods for stochastic computing using low-discrepancy sequences. In Proceedings of the International Conference on Computer-Aided Design (San Diego, California) (ICCAD '18). Association for Computing Machinery, New York, NY, USA, Article 51, 8 pages. doi:10.1145/3240765.3240797
- [33] Keshab K. Parhi and Yin Liu. 2019. Computing Arithmetic Functions Using Stochastic Logic by Series Expansion. IEEE TETC 7, 1 (2019), 44–59.
- [34] Farzad Razi, Mohammad Hossein Moaiyeri, and Siamak Mohammadi. 2022. Toward efficient logic-in-memory computing with magnetic reconfigurable logic circuits. IEEE Magn. Let. 13 (2022), 1–5.
- [35] Farzad Razi, Mehran Shoushtari Moghadam, M. Hassan Najafi, Sercan Aygun, and Marc Riedel. 2025. In-Memory Arithmetic: Enabling Division with Stochastic Logic. In 2025 62nd ACM/IEEE Design Automation Conference (DAC). 1–2. doi:10.1109/DAC63849.2025.11132099
- [36] Abu Sebastian, Manuel Le Gallo, Riduan Khaddam-Aljameh, and Evangelos Eleftheriou. 2020. Memory devices and applications for in-memory computing. Nature Nanotechnology 15, 7 (1 July 2020), 529–544. doi:10.1038/s41565-020-0655-z
- [37] Gian Singh, Ayushi Dube, and Sarma Vrudhula. 2024. Energy-Efficient and Low-Latency Computation of Transcendental Functions in a Precision-Tunable PIM Architecture. In 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 186–191.
- [38] Sean C. Smithson, Naoya Onizawa, Brett H. Meyer, Warren J. Gross, and Takahiro Hanyu. 2019. Efficient CMOS Invertible Logic Using Stochastic Computing. IEEE Transactions on Circuits and Systems I: Regular Papers 66, 6 (2019), 2263–2274. doi:10.1109/TCSI.2018.2889732
- [39] Costin-Emanuel Vasile, Andrei-Alexandru Ulmămei, and Călin Bîră. 2024. Image Processing Hardware Acceleration: A Review of Operations Involved and Current Hardware Approaches. Journal of Imaging 10, 12 (21 Nov. 2024), 298. doi:10.3390/jimaging10120298
- [40] Jingcheng Wang, Xiaowei Wang, Charles Eckert, Arun Subramaniyan, Reetuparna Das, David Blaauw, and Dennis Sylvester. 2019. A 28-nm compute SRAM with bit-serial logic/arithmetic operations for programmable in-memory vector computing. IEEE Journal of Solid-State Circuits 55, 1 (2019), 76–86.
- [41] Zhendong Wang, Zhenyu Xu, Daojing He, and Sammy Chan. 2021. Deep logarithmic neural network for Internet intrusion detection. Soft Computing 25, 15 (01 Aug 2021), 10129–10152.
- [42] Siqiu Xu, Xi Li, Chenchen Xie, Houpeng Chen, Cheng Chen, and Zhitang Song. 2021. A high-precision implementation of the sigmoid activation function for computing-in-memory architecture. Micromachines 12, 10 (2021), 1183.
- [43] Yawen Zhang, Runsheng Wang, Xinyue Zhang, Zherui Zhang, Jiahao Song, Zuodong Zhang, Yuan Wang, and Ru Huang. 2019. A Parallel Bitstream Generator for Stochastic Computing. In 2019 Silicon Nanoelectronics Workshop (SNW). 1–2. doi:10.23919/SNW.2019.8782977
- [44] Brandon R. Zink, Yang Lv, Masoud Zabihi, Husrev Cilasun, Sachin S. Sapatnekar, Ulya R. Karpuzcu, Marc D. Riedel, and Jian-Ping Wang. 2023. A stochastic computing scheme of embedding random bit generation and processing in computational random access memory (SC-CRAM). IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 9, 1 (2023), 29–37.