RowClone: Accelerating Data Movement and Initialization Using DRAM

Chris Fallin; Donghyuk Lee; Gennady Pekhimenko; Michael A. Kozuch; Onur Mutlu; Phillip B. Gibbons; Rachata Ausavarungnirun; Todd C. Mowry; Vivek Seshadri; Yixin Luo

arxiv: 1805.03502 · v1 · pith:R7QCUEXCnew · submitted 2018-05-07 · 💻 cs.AR

Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry

classification 💻 cs.AR

keywords data, dram, copy, initialization, operation, rowclone, bulk, cache

In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination row. The second mechanism, Pipelined Serial Mode, transfers cache lines between two banks using the shared internal bus. RowClone significantly reduces the raw latency and energy consumption of bulk data copy and initialization. This reduction directly translates to improvement in performance and energy efficiency of systems running copy- or initialization-intensive workloads.
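
The two mechanisms named in the abstract can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the paper's implementation: the `Subarray` class, its `activate`/`precharge` methods, and both copy functions are hypothetical names, timing and energy are not modeled, and the 64-byte cache-line size is assumed. It treats a second activate issued without an intervening precharge as writing the latched row-buffer contents into the newly opened row.

```python
# Toy model only: all names here are hypothetical, not taken from the paper.
class Subarray:
    def __init__(self, num_rows, row_bytes):
        self.rows = [bytearray(row_bytes) for _ in range(num_rows)]
        self.row_buffer = None  # None means precharged (no row latched)

    def activate(self, row):
        if self.row_buffer is None:
            # First ACTIVATE: latch the row's contents into the row buffer.
            self.row_buffer = bytearray(self.rows[row])
        else:
            # Back-to-back ACTIVATE with no PRECHARGE in between: the
            # already-latched buffer contents overwrite the opened row.
            self.rows[row][:] = self.row_buffer

    def precharge(self):
        self.row_buffer = None


def fast_parallel_copy(subarray, src, dst):
    """Whole-row copy within one subarray via two consecutive activates."""
    subarray.activate(src)
    subarray.activate(dst)
    subarray.precharge()


def pipelined_serial_copy(src_bank, src_row, dst_bank, dst_row, line_bytes=64):
    """Copy a row between banks one cache line at a time.

    The slice assignment stands in for a transfer over the shared internal
    bus. Returns the number of cache-line transfers performed.
    """
    src = src_bank.rows[src_row]
    transfers = 0
    for off in range(0, len(src), line_bytes):
        dst_bank.rows[dst_row][off:off + line_bytes] = src[off:off + line_bytes]
        transfers += 1
    return transfers
```

In this model the intra-subarray copy takes two commands regardless of row size, while the inter-bank copy needs one transfer per cache line, which mirrors the distinction the abstract draws between the two modes.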

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Clutch: High Performance Vector-Scalar Comparison using DRAM via Chunked Temporal Coding
cs.AR 2026-06 unverdicted novelty 7.0

Clutch accelerates vector-scalar comparisons in PuD systems via chunked temporal coding, delivering 2.9x throughput and 3.0x energy gains over prior bit-serial PuD while also mapping decision tree inference to PuD for...

DejaVu: Why You Should Write to Your DRAM Rows Twice, Carefully
cs.AR 2026-06 unverdicted novelty 7.0

DejaVu is a newly characterized data-pattern effect in commercial DDR4 DRAM where double-writing rows alters read-disturbance bitflip thresholds, shown across 112 chips and with implications for PUD operations.