MCFlash: Bulk Bitwise Processing in 3D NAND with Dynamic Sensing and Multi-level Encoding
Pith reviewed 2026-05-08 16:00 UTC · model grok-4.3
The pith
MCFlash performs bulk bitwise operations directly inside commercial 3D NAND flash chips using only standard user-mode instructions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MCFlash is a technique that executes bulk bitwise operations directly within commercial off-the-shelf 3D NAND flash chips. It relies solely on standard user-mode instructions, combining Multi-Level Cell data encodings with dynamically tuned read reference voltages to execute in-place bitwise operations. Evaluations across diverse NAND chips, both floating-gate and charge-trap, demonstrate error-free operation sustaining over one billion operations on fresh blocks and bit-error rates below 0.015 percent even after 10,000 program/erase cycles.
What carries the argument
The pairing of multi-level cell charge encodings with on-the-fly adjustment of read reference voltages, which turns standard sense operations into bitwise logic gates performed inside the NAND array itself.
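This mechanism can be illustrated with a toy simulation. The threshold-voltage levels, noise width, and the mapping from operand bits to charge states below are illustrative assumptions, not values from the paper; the point is only that where the read reference voltage falls between MLC levels selects which Boolean function a single sense operation computes:

```python
import random

random.seed(0)

# Illustrative MLC model (not the paper's values): a cell's threshold
# voltage grows with the number of programmed operand bits.
LEVELS = {0: 1.0, 1: 2.5, 2: 4.0}  # mean Vth in volts per count of set operands

def program(a, b, sigma=0.1):
    """Store operands a and b in one cell; return its (noisy) threshold voltage."""
    return random.gauss(LEVELS[a + b], sigma)

def sense(vth, vref):
    """A standard NAND read: the cell conducts (reads 1) when Vth < Vref."""
    return 1 if vth < vref else 0

def in_array_op(a_bits, b_bits, vref):
    """One read over a whole page computes the same gate on every bit pair."""
    return [sense(program(a, b), vref) for a, b in zip(a_bits, b_bits)]

A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
# Vref between levels 0 and 1: the cell conducts only when a = b = 0 -> NOR.
nor_ab = in_array_op(A, B, vref=1.75)
# Vref between levels 1 and 2: the cell conducts unless both bits are set -> NAND.
nand_ab = in_array_op(A, B, vref=3.25)
print("NOR :", nor_ab)   # [1, 0, 0, 0]
print("NAND:", nand_ab)  # [1, 1, 1, 0]
```

Reading twice with the two references and inverting off-chip would then yield OR and AND; the paper's actual encodings and voltage settings may differ.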
If this is right
- Bulk bitwise operations become possible without moving data between the NAND array and an external processor.
- Energy and latency costs for scans and reductions over large bit vectors drop because computation stays inside the memory.
- The same approach works across floating-gate and charge-trap cell technologies from different generations.
- Reliability holds for more than a billion operations on fresh blocks and continues after heavy wear.
Where Pith is reading between the lines
- Storage arrays could act as simple compute units for database filters or neural-network bit packing without extra hardware.
- Chaining several bitwise steps inside the array might support wider arithmetic without leaving the chip.
- Power savings in large-scale analytics systems would grow if the method scales to entire planes or dies running in parallel.
Load-bearing premise
That any commercial 3D NAND chip will accept and correctly respond to the same user-mode read commands and voltage settings without hidden internal behaviors that would break the bitwise results.
What would settle it
Running the described bitwise sequences on a new commercial 3D NAND chip model, applying the same dynamic voltage adjustments through ordinary commands, and checking whether the output bit-error rate stays below 0.015 percent after 10,000 program/erase (P/E) cycles; failure on either fresh or aged blocks would disprove the central claim.
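The decision rule in that test reduces to a small check, with thresholds taken from the abstract (error-free on fresh blocks; BER below 0.015 percent, i.e. 1.5e-4, after 10,000 P/E cycles). This is a minimal sketch of the acceptance criterion, not the paper's measurement procedure:

```python
def bit_error_rate(expected, observed):
    """Fraction of mismatched bits between a known pattern and its read-back."""
    assert len(expected) == len(observed)
    errors = sum(e != o for e, o in zip(expected, observed))
    return errors / len(expected)

def claim_holds(ber_fresh, ber_aged, threshold=1.5e-4):
    """Abstract's thresholds: error-free on fresh blocks, and BER below
    0.015% (1.5e-4) after 10,000 P/E cycles. Failing either falsifies it."""
    return ber_fresh == 0.0 and ber_aged < threshold

print(bit_error_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.25
print(claim_holds(0.0, 1.2e-4))  # True
print(claim_holds(0.0, 2.0e-4))  # False
```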
Original abstract
This paper presents MCFlash, a practical and immediately deployable technique for executing bulk bitwise operations directly within commercial off-the-shelf (COTS) 3D NAND flash chips. MCFlash relies solely on standard user-mode instructions, combining Multi-Level Cell (MLC) data encodings with dynamically tuned read reference voltages to execute in-place bitwise operations. We evaluate MCFlash across diverse NAND flash chips, both floating-gate and charge-trap variants, from different generations. Our results represent the first demonstration of error-free, on-chip bitwise operations, sustaining over one billion operations on fresh blocks and maintaining bit-error rates below 0.015% even after 10,000 program/erase (P/E) cycles.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents MCFlash, a technique for performing bulk bitwise operations directly inside commercial off-the-shelf 3D NAND flash chips. It combines multi-level cell encodings with dynamically tuned read reference voltages applied via standard user-mode instructions, and reports error-free execution of over one billion operations on fresh blocks together with bit-error rates below 0.015% after 10,000 P/E cycles across both floating-gate and charge-trap devices from multiple generations.
Significance. If the central claims are substantiated, the work would constitute a meaningful advance in in-memory computing for NAND-based systems by demonstrating a practical, hardware-agnostic method for on-chip bitwise processing that requires no custom silicon and exhibits strong endurance. The cross-generation, cross-technology evaluation is a positive feature that supports broader applicability.
Major comments (2)
- [Methods] Methods section (voltage tuning procedure): the claim that bitwise operations are performed using only standard user-mode instructions plus dynamically tuned read voltages is load-bearing for both the 'immediately deployable on COTS' assertion and the 'first demonstration' status, yet the manuscript provides insufficient detail on the exact command sequences, calibration steps, and safeguards against FTL intervention or vendor-specific restrictions. Without this, it is impossible to confirm that the tuning does not rely on non-standard access unavailable in normal operation.
- [Evaluation] Evaluation / Results section: the abstract states concrete metrics (error-free operation for >1 billion cycles, BER < 0.015% after 10k P/E), but the manuscript lacks accompanying data tables, statistical summaries, or raw measurement logs that would allow independent verification of these numbers under the stated conditions.
Minor comments (1)
- [Abstract] The abstract would benefit from explicitly naming the bitwise operations (AND, OR, XOR, etc.) demonstrated so that the scope of the claimed error-free execution is immediately clear.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential significance of MCFlash as a practical in-memory computing approach on COTS 3D NAND. We address each major comment point by point below and outline the revisions we will make to improve clarity and verifiability.
Point-by-point responses
Referee: [Methods] Methods section (voltage tuning procedure): the claim that bitwise operations are performed using only standard user-mode instructions plus dynamically tuned read voltages is load-bearing for both the 'immediately deployable on COTS' assertion and the 'first demonstration' status, yet the manuscript provides insufficient detail on the exact command sequences, calibration steps, and safeguards against FTL intervention or vendor-specific restrictions. Without this, it is impossible to confirm that the tuning does not rely on non-standard access unavailable in normal operation.
Authors: We agree that the current Methods section would benefit from greater specificity to allow readers to reproduce the exact procedure. In the revised manuscript we will add a dedicated subsection that enumerates the precise sequence of standard user-mode NAND commands (including the read, program, and erase opcodes issued through the standard interface), the iterative calibration algorithm used to select the dynamic read reference voltages for each multi-level encoding, and the explicit steps taken to operate on raw blocks while disabling or bypassing file-system and FTL layers (e.g., by using direct block-level access on unmounted devices). These additions will be supported by pseudocode and timing diagrams so that the claim of standard-instruction-only operation can be independently verified. revision: yes
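The iterative calibration the authors promise to document could plausibly take the shape of a read-retry-style sweep: read a known pattern at each candidate reference voltage and keep the setting that minimizes mismatches. The `read_page` interface, voltage grid, and mock error model below are hypothetical stand-ins, not the paper's actual command sequence:

```python
import random

def calibrate_vref(read_page, known_pattern, vref_grid):
    """Sweep candidate read reference voltages; keep the one that
    minimizes mismatches against a known test pattern."""
    best_vref, best_errs = None, float("inf")
    for vref in vref_grid:
        observed = read_page(vref)
        errs = sum(a != b for a, b in zip(observed, known_pattern))
        if errs < best_errs:
            best_vref, best_errs = vref, errs
    return best_vref, best_errs

def make_mock_reader(true_bits, ideal_vref=2.0):
    """Mock chip: read-back error probability grows as Vref drifts off-center."""
    def read_page(vref):
        flip_p = min(1.0, abs(vref - ideal_vref))
        return [b ^ (random.random() < flip_p) for b in true_bits]
    return read_page

random.seed(1)
pattern = [random.randint(0, 1) for _ in range(256)]
reader = make_mock_reader(pattern)
vref, errs = calibrate_vref(reader, pattern, [1.0, 1.5, 2.0, 2.5, 3.0])
print(vref, errs)  # 2.0 0
```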
Referee: [Evaluation] Evaluation / Results section: the abstract states concrete metrics (error-free operation for >1 billion cycles, BER < 0.015% after 10k P/E), but the manuscript lacks accompanying data tables, statistical summaries, or raw measurement logs that would allow independent verification of these numbers under the stated conditions.
Authors: The aggregate results are derived from repeated trials across multiple devices and P/E points, but we concur that tabular presentation and statistical detail would strengthen verifiability. The revised manuscript will include new tables that report, for each device generation and technology, the mean BER, standard deviation, number of trials, and total operations performed at 0, 1k, 5k, and 10k P/E cycles. A supplementary data file containing per-trial BER logs for a representative subset of experiments will also be provided. Full raw traces exceed practical appendix size; therefore we will make a curated subset available upon request while ensuring the tables allow direct confirmation of the reported thresholds. revision: partial
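The per-condition statistical tables the authors commit to could be produced by a small aggregation over per-trial BER logs. The dictionary layout and sample numbers here are a hypothetical sketch of that table's shape, not data from the paper:

```python
from statistics import mean, stdev

def summarize_ber(trials):
    """trials maps (device, pe_cycles) -> list of per-trial BER values;
    returns one summary row per condition, as the promised tables would."""
    rows = []
    for (device, pe), bers in sorted(trials.items()):
        rows.append({
            "device": device,
            "pe_cycles": pe,
            "n_trials": len(bers),
            "mean_ber": mean(bers),
            "std_ber": stdev(bers) if len(bers) > 1 else 0.0,
        })
    return rows

# Illustrative input only: zero errors when fresh, small BERs after wear.
trials = {
    ("chipA", 0): [0.0, 0.0, 0.0],
    ("chipA", 10_000): [1.1e-4, 1.3e-4, 1.2e-4],
}
for row in summarize_ber(trials):
    print(row)
```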
Circularity Check
No circularity: experimental hardware technique with no equations or fitted predictions
Full rationale
The paper describes an empirical hardware technique for in-place bitwise operations on COTS 3D NAND using standard user-mode commands and dynamic read-voltage tuning. No mathematical derivation chain, equations, parameters fitted to subsets of data, or self-citation load-bearing premises are present. Claims rest on direct measurements across multiple chip generations and types, with error rates reported from physical experiments rather than any constructed prediction that reduces to the inputs. This is the expected non-finding for a purely experimental systems paper.