The Processing Using Memory Paradigm: In-DRAM Bulk Copy, Initialization, Bitwise AND and OR
In existing systems, the off-chip memory interface allows the memory controller to perform only read or write operations. Therefore, to perform any operation, the processor must first read the source data from memory, perform the operation, and then write the result back. For operations that work on a large amount of data, this approach incurs high latency and consumes substantial memory bandwidth and energy. Several works have proposed techniques to process data near memory by adding a small amount of compute logic closer to the main memory chips. In this article, we describe two techniques proposed by recent works that take this processing-in-memory approach further by exploiting the underlying operation of the main memory technology to perform more complex tasks. First, we describe RowClone, a mechanism that exploits DRAM technology to perform bulk copy and initialization operations completely inside main memory. We then describe a complementary work that uses DRAM to perform bulk bitwise AND and OR operations inside main memory. These two techniques significantly improve the performance and energy efficiency of the respective operations.
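The contrast the abstract draws, between moving data through the processor and operating on it inside memory, can be pictured with a small toy model. This is a hedged sketch, not the mechanism of either work: the class `ToyDram`, the row width `ROW_BITS`, and all method names are hypothetical, rows are modeled as Python integers, and the only thing counted is the data that crosses the off-chip interface. The in-memory AND and OR are modeled as a bitwise majority of three rows with a constant control row, which is an assumption for illustration that the abstract itself does not state.

```python
ROW_BITS = 64                      # hypothetical row width, tiny for illustration
ROW_MASK = (1 << ROW_BITS) - 1     # a row of all ones


class ToyDram:
    """Toy model: each row is an integer used as a bit vector."""

    def __init__(self):
        self.rows = {}
        self.channel_bits = 0      # data moved over the off-chip interface

    # Conventional interface: every read or write crosses the channel.
    def read(self, row):
        self.channel_bits += ROW_BITS
        return self.rows[row]

    def write(self, row, value):
        self.channel_bits += ROW_BITS
        self.rows[row] = value & ROW_MASK

    # Hypothetical in-memory operations: no data crosses the channel.
    def copy_in_memory(self, src, dst):
        self.rows[dst] = self.rows[src]

    def majority_in_memory(self, a, b, c, dst):
        x, y, z = self.rows[a], self.rows[b], self.rows[c]
        self.rows[dst] = (x & y) | (y & z) | (x & z)


def cpu_copy(mem, src, dst):
    # Processor-centric copy: read the source, then write it back.
    mem.write(dst, mem.read(src))


def in_memory_and(mem, a, b, dst):
    # Majority with an all-zeros control row reduces to AND.
    mem.majority_in_memory(a, b, "ZEROS", dst)


def in_memory_or(mem, a, b, dst):
    # Majority with an all-ones control row reduces to OR.
    mem.majority_in_memory(a, b, "ONES", dst)


def make_memory(a_bits, b_bits):
    # Rows are placed directly so that setup does not count as channel traffic.
    mem = ToyDram()
    mem.rows.update({"A": a_bits & ROW_MASK, "B": b_bits & ROW_MASK,
                     "ZEROS": 0, "ONES": ROW_MASK})
    return mem


baseline = make_memory(0b1100, 0b1010)
cpu_copy(baseline, "A", "COPY")            # one row read plus one row written

in_mem = make_memory(0b1100, 0b1010)
in_mem.copy_in_memory("A", "COPY")         # stays inside the toy DRAM
in_memory_and(in_mem, "A", "B", "AND")
in_memory_or(in_mem, "A", "B", "OR")

print("channel bits, processor copy:", baseline.channel_bits)
print("channel bits, in-memory ops: ", in_mem.channel_bits)
```

In this model the processor-centric copy moves every bit of the row across the channel twice, while the in-memory operations move none. Real latency and energy depend on DRAM timing and command details that the sketch deliberately leaves out.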
Forward citations
Cited by 4 Pith papers
- Clutch: High Performance Vector-Scalar Comparison using DRAM via Chunked Temporal Coding
  Clutch accelerates vector-scalar comparisons in PuD systems via chunked temporal coding, delivering 2.9x throughput and 3.0x energy gains over prior bit-serial PuD while also mapping decision tree inference to PuD for...
- PuDGhost: Experimental Analysis of Computation Result Corruption in Processing-using-DRAM Operations on Real DRAM Chips and Implications for Future Systems
  PuDGhost causes up to 48% error in SiMRA-based PuD computations due to row and column interference, quantified on 96 real DDR4 chips with proposed mitigations like column screening and row layout changes.
- HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System
  Characterization of HE kernels on commercial UPMEM PIM identifies modular multiplication and per-bank capacity as dominant bottlenecks and concludes PIM becomes competitive with CPU/GPU once those are addressed.
- Memory-Centric Computing: Security Benefits and Challenges of Processing-in-DRAM
  Processing-in-DRAM enables new DRAM-based TRNGs (16.05 Gb/s) and PUFs (5.75% lower latency) on real chips but increases read disturbance by 158x and creates 14.8 Mb/s timing channels.