arxiv: 1610.09603 · v1 · pith:7J443C7Tnew · submitted 2016-10-30 · 💻 cs.AR

The Processing Using Memory Paradigm: In-DRAM Bulk Copy, Initialization, Bitwise AND and OR

classification 💻 cs.AR
keywords memory, operations, perform, main, bulk, data, describe, operation

In existing systems, the off-chip memory interface allows the memory controller to perform only read or write operations. Therefore, to perform any operation, the processor must first read the source data, perform the operation, and then write the result back to memory. For operations that work on a large amount of data, this approach incurs high latency and consumes significant bandwidth and energy. Several works have proposed techniques to process data near memory by adding a small amount of compute logic closer to the main memory chips. In this article, we describe two techniques proposed by recent works that take processing in memory a step further by exploiting the underlying operation of the main memory technology to perform more complex tasks. First, we describe RowClone, a mechanism that exploits DRAM technology to perform bulk copy and initialization operations completely inside main memory. We then describe a complementary work that uses DRAM to perform bulk bitwise AND and OR operations inside main memory. These two techniques significantly improve the performance and energy efficiency of the respective operations.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Clutch: High Performance Vector-Scalar Comparison using DRAM via Chunked Temporal Coding

    cs.AR 2026-06 unverdicted novelty 7.0

    Clutch accelerates vector-scalar comparisons in PuD systems via chunked temporal coding, delivering 2.9x throughput and 3.0x energy gains over prior bit-serial PuD while also mapping decision tree inference to PuD for...

  2. PuDGhost: Experimental Analysis of Computation Result Corruption in Processing-using-DRAM Operations on Real DRAM Chips and Implications for Future Systems

    cs.AR 2026-06 unverdicted novelty 7.0

    PuDGhost causes up to 48% error in SiMRA-based PuD computations due to row and column interference, quantified on 96 real DDR4 chips with proposed mitigations like column screening and row layout changes.

  3. HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

    cs.CR 2026-05 accept novelty 7.0

    Characterization of HE kernels on commercial UPMEM PIM identifies modular multiplication and per-bank capacity as dominant bottlenecks and concludes PIM becomes competitive with CPU/GPU once those are addressed.

  4. Memory-Centric Computing: Security Benefits and Challenges of Processing-in-DRAM

    cs.CR 2026-06 unverdicted novelty 5.0

    Processing-in-DRAM enables new DRAM-based TRNGs (16.05 Gb/s) and PUFs (5.75% lower latency) on real chips but increases read disturbance by 158x and creates 14.8 Mb/s timing channels.