arxiv: 2604.13743 · v1 · submitted 2026-04-15 · 💻 cs.DC · cs.DB

Recognition: unknown

OffloadFS: Leveraging Disaggregated Storage for Computation Offloading

Sungho Moon , Daegyu Han , Hera Koo , Sangeun Chae , Duck-Ho Bae , Euiseong Seo , Beomseok Nam

Authors on Pith no claims yet

Pith reviewed 2026-05-10 12:23 UTC · model grok-4.3

classification 💻 cs.DC cs.DB

keywords disaggregated storagecomputation offloadingnear-data processingRocksDBuser-level file systemcache managementNVMe over fabricsmachine learning preprocessing

0 comments

The pith

OffloadFS offloads IO-intensive tasks like database compaction to disaggregated storage nodes for near-data processing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Disaggregated storage separates storage hardware from compute nodes but leaves the storage nodes' own processors and memory largely idle beyond basic I/O handling. OffloadFS is a user-level file system that lets applications send entire IO-heavy operations to those storage nodes without requiring distributed lock management across the cluster. Cache management is adjusted so that threads handling different operations interfere less with one another. The approach is demonstrated on RocksDB, where MemTable flush and compaction are moved to the storage node, and on machine-learning image preprocessing pipelines. Measured results show up to 3.36 times faster RocksDB operation and 1.85 times faster preprocessing compared with a conventional clustered file system.

Core claim

OffloadFS provides a file-system interface that supports computation offloading of IO-intensive tasks to disaggregated storage nodes using NVMe over fabrics. By eliminating the need for distributed locks and by reducing thread interference through cache-management changes, the system enables OffloadDB to move RocksDB compaction and flush work and OffloadPrep to move ML preprocessing work, producing performance gains of up to 3.36x and 1.85x respectively over OCFS2.

What carries the argument

OffloadFS, a user-level file system whose cache-management layer reduces interference between threads that perform distinct I/O operations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The design suggests storage nodes can be treated as general-purpose compute resources rather than pure I/O devices.
Similar offloading could be applied to other data-intensive systems such as key-value stores or analytics engines.
Dynamic choice between storage-node and peer-node offloading may become feasible once the basic mechanism is in place.

Load-bearing premise

Disaggregated storage nodes have enough spare compute and memory capacity to run the offloaded tasks, and the cache optimizations eliminate thread interference without introducing new bottlenecks or correctness problems.

What would settle it

Running the same workloads on storage nodes that are already saturated with other work and observing whether the reported speedups vanish or turn into slowdowns.

Figures

Figures reproduced from arXiv: 2604.13743 by Beomseok Nam, Daegyu Han, Duck-Ho Bae, Euiseong Seo, Hera Koo, Sangeun Chae, Sungho Moon.

**Figure 2.** Figure 2: Performance with Various File Systems and SPDK [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Resource Under-utilization on Storage Node [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: OffloadFS Architecture tasks to remote CPU servers. In contrast, Pecan [25] and FusionFlow [32] propose to schedule data workers across both DNN training nodes with GPUs and remote CPU nodes. Unlike these previous studies, which use shared-disk file systems or distributed object stores and focus on leveraging idle resources of remote compute nodes, our study complements their work by showing the benefits … view at source ↗

**Figure 5.** Figure 5: OffloadDB Architecture A. Compaction Offloading In RocksDB, there are four types of IO operations: (1) logging for individual write requests, (2) creating L0 SSTables by flushing MemTables, (3) merge-sorting SSTables during compaction, and (4) updating the MANIFEST file to commit SSTables created and deleted during compaction. In OffloadDB, two background IO operations, i.e., flush and compaction, are o… view at source ↗

**Figure 6.** Figure 6: Log Recycling is considered garbage and the allocated blocks are reclaimed. Therefore, updating the MANIFEST file serves as the commit mark for the compaction operation, as in vanilla RocksDB. B. Log Recycling In vanilla RocksDB, when a client insert a key-value pair, it is indexed in the MemTable and also appended to the WAL file. When NVMeoF storage is used, the NVMe device that stores the WAL resides on… view at source ↗

**Figure 7.** Figure 7: Comparison with Shared-Disk File Systems [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: OffloadDB Scalability with YCSB A (a) AI Epoch Time (b) CPU Utilization on Target [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗

**Figure 9.** Figure 9: ML Pre-Processing Scalability primary compute node, where OffloadPrep is running. When more than 50% of tasks are offloaded, the pre-processing time is affected by the computational capability of offloadee nodes, i.e., the storage or peer nodes. Unlike the OffloadDB results, ML preprocessing performs better when offloaded to peer compute nodes rather than to storage nodes, even when using OCFS2 or GFS2. Th… view at source ↗

**Figure 10.** Figure 10: Quantifying Effect of OFfloadDB Designs and Performance Com [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗

**Figure 11.** Figure 11: Latency-Throughput Analysis vanilla RocksDB. This is because OffloadFS is not currently optimized for sequential scans. We leave improving the scan performance of OffloadDB as future work. 2) Comparative Performance: In [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗

**Figure 13.** Figure 13: Impact of Offloading on Cache Pollution memory of the compute node is used for foreground queries, and the cache hit rate increases accordingly. VII. CONCLUSION This study identifies that NVMeoF-based disaggregated storage systems introduce a new type of resource underutilization problem within storage nodes. To address this challenge, we develop OffloadFS, a lightweight user-level file system that allow… view at source ↗

read the original abstract

Disaggregated storage systems improve resource utilization and enable independent scaling of storage and compute resources by separating storage resources from computing resources in data centers. NVMe over fabrics (NVMeoF) is a key technology that underpins the functionality and benefits of disaggregated storage systems. While NVMeoF inherently possesses substantial computing and memory capacity, these resources are often underutilized for tasks beyond simple I/O delegation. This study proposes OffloadFS, a user-level file system that enables offloaded IO-intensive tasks primarily to a disaggregated storage node for near-data processing, with the option to offload to peer compute nodes as well, without the need for distributed lock management. OffloadFS optimizes cache management by reducing interference between threads performing distinct I/O operations. On top of OffloadFS, we develop OffloadDB, which enables RocksDB to offload MemTable flush and compaction operations, and OffloadPrep, which offloads image pre-processing tasks for machine learning to disaggregated storage nodes. Our evaluation shows that OffloadFS improves the performance of RocksDB and machine learning pre-processing tasks by up to 3.36x and 1.85x, respectively, compared to OCFS2.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OffloadFS adds a user-level file system to push IO tasks like RocksDB flushes and image prep onto disaggregated storage nodes without distributed locks, with reported gains versus OCFS2.

read the letter

The core of this paper is a user-level file system that offloads IO-heavy work to storage nodes in disaggregated setups. It skips distributed lock management entirely and adds cache tweaks to limit interference between different IO threads. They built two applications on top: OffloadDB for RocksDB MemTable flushes and compaction, plus OffloadPrep for machine learning image preprocessing. The abstract gives concrete numbers—up to 3.36x for RocksDB and 1.85x for the prep tasks compared to OCFS2—which come from running the system rather than simulation alone. That combination of no-locks design and targeted offloads looks like the actual new piece relative to prior NVMeoF work. The cache optimization is a practical detail that could matter in real deployments where threads compete for resources. The paper stays grounded in implementation and measurement, which is a plus for systems work. The evaluation summary is thin on methods, though. No error bars, workload details, or full baseline descriptions appear in the abstract, so it is hard to tell how sensitive the gains are to specific hardware or load conditions. The assumption that storage nodes have spare compute and memory is standard for this area but still needs checking in the full text to see if it holds without creating new bottlenecks. This is the kind of paper that fits a systems conference track on storage or data-center architecture. Readers who already work with disaggregated setups or near-data processing will get the most out of the concrete offload examples and numbers. It has a working prototype and measurable results, so it deserves a serious referee who can examine the implementation and test reproducibility.

Referee Report

2 major / 2 minor

Summary. The paper introduces OffloadFS, a user-level file system that offloads I/O-intensive tasks to disaggregated storage nodes (with optional peer compute nodes) for near-data processing without distributed locks. It includes cache-management optimizations to reduce thread interference. Applications built on it are OffloadDB (offloading RocksDB MemTable flush and compaction) and OffloadPrep (offloading ML image pre-processing). Evaluation reports speedups of up to 3.36x for RocksDB and 1.85x for ML pre-processing versus OCFS2.

Significance. If the results hold under rigorous verification, the work demonstrates a practical way to exploit underutilized compute and memory on NVMeoF storage nodes, improving efficiency for I/O-bound workloads in disaggregated data centers. The concrete speedups for database and ML tasks highlight potential impact on resource utilization without requiring changes to distributed locking.

major comments (2)

[§5] §5 (Evaluation): the central performance claims (3.36x RocksDB, 1.85x ML pre-processing) are presented without error bars, number of runs, or full baseline configuration details (e.g., OCFS2 setup, hardware specs, or workload parameters). This prevents verification of statistical significance and raises risk of unstated selection effects or post-hoc tuning.
[§4.1] §4.1 (System Assumptions): the weakest assumption—that storage nodes have sufficient unused compute/memory and that cache optimizations avoid new bottlenecks—is stated but not quantified with utilization measurements before/after offloading. This is load-bearing for the offloading benefit claim.

minor comments (2)

[Figures/Tables] Figure 3 and Table 2: axis labels and legend entries use inconsistent abbreviations (e.g., 'OffloadFS' vs 'OFS') that should be standardized for clarity.
[§3.2] §3.2: the description of cache-management optimizations would benefit from a small pseudocode snippet or state diagram to illustrate thread-interference reduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve verifiability and strengthen the presentation of assumptions.

read point-by-point responses

Referee: [§5] §5 (Evaluation): the central performance claims (3.36x RocksDB, 1.85x ML pre-processing) are presented without error bars, number of runs, or full baseline configuration details (e.g., OCFS2 setup, hardware specs, or workload parameters). This prevents verification of statistical significance and raises risk of unstated selection effects or post-hoc tuning.

Authors: We agree that the evaluation section would benefit from greater statistical rigor and transparency. Our experiments were run multiple times per configuration on a controlled testbed to mitigate variability, but these details (including run counts, standard deviations, and complete OCFS2/hardware/workload parameters) were not fully reported. In the revised manuscript, we will add error bars (standard deviation), explicitly state the number of runs (minimum of five per data point), and include a dedicated subsection or appendix with full baseline configurations, hardware specifications, and workload parameters to enable independent verification. revision: yes
Referee: [§4.1] §4.1 (System Assumptions): the weakest assumption—that storage nodes have sufficient unused compute/memory and that cache optimizations avoid new bottlenecks—is stated but not quantified with utilization measurements before/after offloading. This is load-bearing for the offloading benefit claim.

Authors: We acknowledge that the assumptions regarding available compute/memory on storage nodes and the non-interference of cache optimizations are central and would be stronger with quantitative support. The manuscript states these based on the typical underutilization in disaggregated NVMeoF deployments, but does not provide before/after utilization data. We will revise Section 4.1 to include CPU and memory utilization measurements from our testbed (with and without offloading enabled), plus analysis confirming that the cache optimizations do not create new bottlenecks. This will be presented via additional figures or tables. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is an implementation and evaluation paper for a user-level file system (OffloadFS) that offloads I/O tasks to disaggregated storage. The performance results (3.36x RocksDB, 1.85x ML pre-processing vs OCFS2) are obtained by running the implemented system and its extensions (OffloadDB, OffloadPrep) rather than any derivation, equations, fitted parameters, or self-citation chain. No load-bearing steps reduce to inputs by construction; the argument is self-contained empirical measurement.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.0 · 5536 in / 1118 out tokens · 51521 ms · 2026-05-10T12:23:52.837162+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

59 extracted references · 1 canonical work pages

[1]

https://hbase.apache.org/

Apache HBase. https://hbase.apache.org/
[2]

https://github.com/google/leveldb

LevelDB. https://github.com/google/leveldb
[3]

https: //lenovopress.com/lp0742.pdf, 2017

Intel Xeon Scalable Family Balanced Memory Configurations. https: //lenovopress.com/lp0742.pdf, 2017

2017
[4]

https://nvmexpress.org/wp-content/uploads /NVMe-over-Fabrics-1.1-2019.10.22-Ratified.pdf, 2019

NVM Express over Fabrics. https://nvmexpress.org/wp-content/uploads /NVMe-over-Fabrics-1.1-2019.10.22-Ratified.pdf, 2019

2019
[5]

https://review.spdk.io/download/performance-repor ts/SPDK rdma mlx perf report 2305.pdf, 2023

SPDK NVMe-oF RDMA (Target & Initiator) Performance Report Release 23.05. https://review.spdk.io/download/performance-repor ts/SPDK rdma mlx perf report 2305.pdf, 2023

2023
[6]

https://github.com/poseidonos/poseidonos, 2025

PoseidonOS. https://github.com/poseidonos/poseidonos, 2025

2025
[7]

https://rocksdb.org/, 2025

RocksDB. https://rocksdb.org/, 2025

2025
[8]

Compaction Manage- ment in Distributed Key-Value Datastores.Proceedings of the VLDB Endowment, 8(8):850–861, 2015

Muhammad Yousuf Ahmad and Bettina Kemme. Compaction Manage- ment in Distributed Key-Value Datastores.Proceedings of the VLDB Endowment, 8(8):850–861, 2015

2015
[9]

Thekkath

Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Ji ˇr´ı ˇSimˇsa, and Chandramohan A. Thekkath. tf.data service: A Case for Disaggregating ML Input Data Processing. InACM Symposium on Cloud Computing (SoCC), page 358–375, 2023

2023
[10]

TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores

Oana Balmau, Diego Didona, Rachid Guerraoui, Willy Zwaenepoel, Huapeng Yuan, Aashray Arora, Karan Gupta, and Pavan Konka. TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. InUSENIX Conference on Usenix Annual Technical Conference (USENIX ATC), pages 363–375, 2017

2017
[11]

SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores

Oana Balmau, Florin Dinu, Willy Zwaenepoel, Karan Gupta, Ravis- hankar Chandhiramoorthi, and Diego Didona. SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores. In2019 USENIX Annual Technical Conference (USENIX ATC), pages 753–766, July 2019

2019
[12]

FloDB: Unlocking Memory in Persistent Key-Value Stores

Oana Balmau, Rachid Guerraoui, Vasileios Trigonakis, and Igor Zablotchi. FloDB: Unlocking Memory in Persistent Key-Value Stores. In12th European Conference on Computer Systems (EuroSys), pages 80–94, 2017

2017
[13]

Hail- storm: Disaggregated Compute and Storage for Distributed LSM-based Databases

Laurent Bindschaedler, Ashvin Goel, and Willy Zwaenepoel. Hail- storm: Disaggregated Compute and Storage for Distributed LSM-based Databases. In25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 301–316, 2020

2020
[14]

TerarkDB

ByteDance. TerarkDB. https://github.com/bytedance/terarkdb, 2021

2021
[15]

Zhichao Cao, Huibing Dong, Yixun Wei, Shiyong Liu, and David H. C. Du. IS-HBase: An In-Storage Computing Optimized HBase with I/O Offloading and Self-Adaptive Caching in Compute-Storage Disaggregated Infrastructure.ACM Transactions on Storage, 18(2), apr 2022

2022
[16]

SpanDB: A Fast, Cost-Effective LSM-tree Based KV Store on Hybrid Storage

Hao Chen, Chaoyi Ruan, Cheng Li, Xiaosong Ma, and Yinlong Xu. SpanDB: A Fast, Cost-Effective LSM-tree Based KV Store on Hybrid Storage. In19th USENIX Conference on File and Storage Technologies (FAST), pages 17–32, February 2021

2021
[17]

Near-Data Processing for Dif- ferentiable Machine Learning Models

Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, and Sungroh Yoon. Near-Data Processing for Dif- ferentiable Machine Learning Models. https://arxiv.org/pdf/1610.02273, 2017

work page arXiv 2017
[18]

Benchmarking cloud serving systems with YCSB

Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In ACM Symposium on Cloud Computing (SoCC), pages 143–154, 2010

2010
[19]

D2Comp: Efficient Offload of LSM-tree Compaction with Data Processing Units on Disaggregated Storage.ACM Transactions on Architecture and Code Optimization, apr 2024

Chen Ding, Jian Zhou, Kai Lu, Sicen Li, Yiqin Xiong, Jiguang Wan, and Ling Zhan. D2Comp: Efficient Offload of LSM-tree Compaction with Data Processing Units on Disaggregated Storage.ACM Transactions on Architecture and Code Optimization, apr 2024. Just Accepted

2024
[20]

DComp: Efficient Offload of LSM-Tree Compaction with Data Process- ing Units

Chen Ding, Jian Zhou, Jiguang Wan, Yiqin Xiong, Sicen Li, Shuning Chen, Hanyang Liu, Liu Tang, Ling Zhan, Kai Lu, and Peng Xu. DComp: Efficient Offload of LSM-Tree Compaction with Data Process- ing Units. In52nd International Conference on Parallel Processing (ICPP), page 233–243, 2023

2023
[21]

Evolu- tion of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience

Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. Evolu- tion of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience. In19th USENIX Conference on File and Storage Technologies (FAST), pages 33–49, February 2021

2021
[22]

Disaggregating RocksDB: A Production Experience

Siying Dong, Shiva Shankar P, Satadru Pan, Anand Ananthab- hotla, Dhanabal Ekambaram, Abhinav Sharma, Shobhit Dayal, Nis- hant Vinaybhai Parikh, Yanqin Jin, Albert Kim, Sushil Patil, Jay Zhuang, Sam Dunster, Akanksha Mahajan, Anirudh Chelluri, Chaitanya Datye, Lucas Vasconcelos Santana, Nitin Garg, and Omkar Gawde. Disaggregating RocksDB: A Production Exp...

2023
[23]

Practical Near-Data Processing for In-Memory Analytics Frameworks

Mingyu Gao, Grant Ayers, and Christos Kozyrakis. Practical Near-Data Processing for In-Memory Analytics Frameworks. In2015 International Conference on Parallel Architecture and Compilation (PACT), pages 113–124, 2015

2015
[24]

Thekkath, and Ana Klimovic

Dan Graur, Damien Aymon, Dan Kluser, Tanguy Albrici, Chandramo- han A. Thekkath, and Ana Klimovic. Cachew: Machine Learning Input Data Processing as a Service. InUSENIX Annual Technical Conference (USENIX ATC), pages 689–706, July 2022

2022
[25]

Thekkath, and Ana Klimovic

Dan Graur, Oto Mraz, Muyu Li, Sepehr Pourghannad, Chandramohan A. Thekkath, and Ana Klimovic. Pecan: Cost-Efficient ML Data Prepro- cessing with Automatic Transformation Ordering and Hybrid Placement. 12 InUSENIX Annual Technical Conference (USENIX ATC), pages 649– 665, July 2024

2024
[26]

Per- formance characterization of nvme-over-fabrics storage disaggregation

Zvika Guz, Harry Li, Anahita Shayesteh, and Vijay Balakrishnan. Per- formance characterization of nvme-over-fabrics storage disaggregation. ACM Transactions on Storage, 14(4):1–18, 2018

2018
[27]

Performance Characterization of NVMe-over-Fabrics Storage Disaggregation.ACM Transactions on Storage, 14(4), dec 2018

Zvika Guz, Harry (Huan) Li, Anahita Shayesteh, and Vijay Balakr- ishnan. Performance Characterization of NVMe-over-Fabrics Storage Disaggregation.ACM Transactions on Storage, 14(4), dec 2018

2018
[28]

X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing

Gui Huang, Xuntao Cheng, Jianying Wang, Yujie Wang, Dengcheng He, Tieying Zhang, Feifei Li, Sheng Wang, Wei Cao, and Qiang Li. X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing. InACM SIGMOD International Conference on Management of Data (SIGMOD), page 651–665, 2019

2019
[29]

Noh, and Young ri Choi

Olzhas Kaiyrakhmet, Songyi Lee, Beomseok Nam, Sam H. Noh, and Young ri Choi. SLM-DB: Single-Level Key-Value Store with Persistent Memory. In17th USENIX Conference on File and Storage Technologies (FAST), 2019

2019
[30]

BoLT: Barrier-Optimized LSM-Tree

Dongui Kim, Chanyeol Park, Sang-Won Lee, and Beomseok Nam. BoLT: Barrier-Optimized LSM-Tree. In21st International Middleware Conference (Middleware), page 119–133, 2020

2020
[31]

LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism

Jongyul Kim, Insu Jang, Waleed Reda, Jaeseong Im, Marco Canini, Dejan Kosti ´c, Youngjin Kwon, Simon Peter, and Emmett Witchel. LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism. InACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP), pages 756–771, 2021

2021
[32]

FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation.Proceedings of the VLDB Endowment, 17(4):863–876, mar 2024

Taeyoon Kim, ChanHo Park, Mansur Mukimbekov, Heelim Hong, Min- seok Kim, Ze Jin, Changdae Kim, Ji-Yong Shin, and Myeongjae Jeon. FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation.Proceedings of the VLDB Endowment, 17(4):863–876, mar 2024

2024
[33]

ListDB: Union of Write- Ahead Logs and Persistent SkipLists for Incremental Checkpointing on Persistent Memory

Wonbae Kim, Chanyeol Park, Dongui Kim, Hyeongjun Park, Young ri Choi, Alan Sussman, and Beomseok Nam. ListDB: Union of Write- Ahead Logs and Persistent SkipLists for Incremental Checkpointing on Persistent Memory. In16th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 161–177, July 2022

2022
[34]

The Open Images Dataset V4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale.International Journal of Computer Vision, 128, 03 2020

Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, and Vittorio Ferrari. The Open Images Dataset V4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale.International Journal of Computer Vision, 128...

2020
[35]

Cassandra: A Decentralized Structured Storage System.ACM SIGOPS Operating Systems Review, 44(2):35–40, April 2010

Avinash Lakshman and Prashant Malik. Cassandra: A Decentralized Structured Storage System.ACM SIGOPS Operating Systems Review, 44(2):35–40, April 2010

2010
[36]

Understanding Rack-Scale Disaggregated Storage

Sergey Legtchenko, Hugh Williams, Kaveh Razavi, Austin Donnelly, Richard Black, Andrew Douglas, Nathanael Cheriere, Daniel Fryer, Kai Mast, Angela Demke Brown, Ana Klimovic, Andy Slowey, and Antony Rowstron. Understanding Rack-Scale Disaggregated Storage. In9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), July 2017

2017
[37]

KVell: The Design and Implementation of a Fast Persistent Key-Value Store

Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. KVell: The Design and Implementation of a Fast Persistent Key-Value Store. Inthe 27th ACM Symposium on Operating Systems Principles (SOSP), page 447–461, 2019

2019
[38]

Elastic and Stable Compaction for LSM-tree: A FaaS- based Approach on TerarkDB

Jianchuan Li, Peiquan Jin, Yuanjin Lin, Ming Zhao, Yi Wang, and Kuankuan Guo. Elastic and Stable Compaction for LSM-tree: A FaaS- based Approach on TerarkDB. In30th ACM International Conference on Information & Knowledge Management (CIKM), pages 3906–3915, 2021

2021
[39]

LSM-tree Compaction Acceleration Using In-storage Processing

Minje Lim, Jeeyoon Jung, and Dongkun Shin. LSM-tree Compaction Acceleration Using In-storage Processing. In2021 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pages 1–3, 2021

2021
[40]

Analyzing and Mitigating Data Stalls in DNN Training

Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, and Vijay Chidambaram. Analyzing and Mitigating Data Stalls in DNN Training. Proceedings of the VLDB Endowment, 14(5):771–784, jan 2021

2021
[41]

The Log-structured Merge-tree (LSM-tree).Acta Informatica, 33(4):351–385, June 1996

Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. The Log-structured Merge-tree (LSM-tree).Acta Informatica, 33(4):351–385, June 1996

1996
[42]

Lockify: Understanding Linux Distributed Lock Management Overheads in Shared Storage

Taeyoung Park, Yunjae Jo, Daegyu Han, Beomseok Nam, and Jaehyun Hwang. Lockify: Understanding Linux Distributed Lock Management Overheads in Shared Storage. In24th USENIX Conference on File and Storage Technologies (FAST), February 2026

2026
[43]

PebblesDB: Building Key-Value Stores Using Fragmented Log- Structured Merge Trees

Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abra- ham. PebblesDB: Building Key-Value Stores Using Fragmented Log- Structured Merge Trees. In26th Symposium on Operating Systems Principles (SOSP), pages 497–514, 2017

2017
[44]

Near-Data Processing-Enabled and Time-Aware Compaction Op- timization for LSM-tree-based Key-Value Stores

Hui Sun, Wei Liu, Jianzhong Huang, Song Fu, Zhi Qiao, and Weisong Shi. Near-Data Processing-Enabled and Time-Aware Compaction Op- timization for LSM-tree-based Key-Value Stores. In48th International Conference on Parallel Processing (ICPP), pages 1–11, 2019

2019
[45]

GLSM: Using GPGPU to Accelerate Compactions in LSM-Tree-Based Key-Value Stores.ACM Transactions on Storage, nov

Hui Sun, Jinfeng Xu, Xiangxiang Jiang, Guanzhong Chen, Yinliang Yue, and Xiao Qin. GLSM: Using GPGPU to Accelerate Compactions in LSM-Tree-Based Key-Value Stores.ACM Transactions on Storage, nov
[46]

FPGA- based Compaction Engine for Accelerating LSM-tree Key-Value Stores

Xuan Sun, Jinghuan Yu, Zimeng Zhou, and Chun Jason Xue. FPGA- based Compaction Engine for Accelerating LSM-tree Key-Value Stores. InIEEE 36th International Conference on Data Engineering (ICDE), pages 1261–1272. IEEE, 2020

2020
[47]

White Paper - Memory Population Rules for 3rd Generation Intel® Xeon® Scalable Processors on PowerEdge Servers, release 1.2

Dell Technologies. White Paper - Memory Population Rules for 3rd Generation Intel® Xeon® Scalable Processors on PowerEdge Servers, release 1.2. https://www.delltechnologies.com/asset/en-us/products/s ervers/industry-market/whitepaper-memory-population-rules-for-3rd -generation-intel-xeon-scalable-processors-on-poweredge-servers.pdf, 2021

2021
[48]

A Low-Cost Disk Solution Enabling LSM- Tree to Achieve High Performance for Mixed Read/Write Workloads

Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Yanfeng Zhang, Siyuan Ma, and Xiaodong Zhang. A Low-Cost Disk Solution Enabling LSM- Tree to Achieve High Performance for Mixed Read/Write Workloads. ACM Transactions on Storage, 14(2), apr 2018

2018
[49]

Ioflow: A Software-Defined Storage Architecture

Eno Thereska, Hitesh Ballani, Greg O’Shea, Thomas Karagiannis, Antony Rowstron, Tom Talpey, Richard Black, and Timothy Zhu. Ioflow: A Software-Defined Storage Architecture. In24th ACM Symposium on Operating Systems Principles (SOSP), pages 182–196, 2013

2013
[50]

Edge Cloud Offloading Algorithms: Issues, Methods, and Perspectives.ACM Computing Surveys, 52(1), feb 2019

Jianyu Wang, Jianli Pan, Flavio Esposito, Prasad Calyam, Zhicheng Yang, and Prasant Mohapatra. Edge Cloud Offloading Algorithms: Issues, Methods, and Perspectives.ACM Computing Surveys, 52(1), feb 2019

2019
[51]

A Se- lective Preprocessing Offloading Framework for Reducing Data Traffic in DL Training

Meng Wang, Gus Waldspurger, and Swaminathan Sundararaman. A Se- lective Preprocessing Offloading Framework for Reducing Data Traffic in DL Training. In16th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage), page 63–70, 2024

2024
[52]

Range Cache: An Efficient Cache Component for Accelerating Range Queries on LSM - Based Key-Value Stores

Xiaoliang Wang, Peiquan Jin, Yongping Luo, and Zhaole Chu. Range Cache: An Efficient Cache Component for Accelerating Range Queries on LSM - Based Key-Value Stores. InIEEE 40th International Conference on Data Engineering (ICDE), pages 488–500, 2024

2024
[53]

CFS: Scaling Metadata Service for Distributed File System via Pruned Scope of Critical Sections

Yiduo Wang, Yufei Wu, Cheng Li, Pengfei Zheng, Biao Cao, Yan Sun, Fei Zhou, Yinlong Xu, Yao Wang, and Guangjun Xie. CFS: Scaling Metadata Service for Distributed File System via Pruned Scope of Critical Sections. InProceedings of the Eighteenth European Conference on Computer Systems (EuroSys), page 331–346, 2023

2023
[54]

Fenggang Wu, Ming-Hong Yang, Baoquan Zhang, and David H.C. Du. AC-Key: Adaptive Caching for LSM-based Key-Value Stores. In USENIX Annual Technical Conference (USENIX ATC), pages 603–615, July 2020

2020
[55]

LightPool: A NVMe-oF-based High-performance and Lightweight Storage Pool Architecture for Cloud-Native Distributed Database

Jiexiong Xu, Yiquan Chen, Yijing Wang, Wenhui Shi, Guoju Fang, Yi Chen, Huasheng Liao, Yang Wang, Hai Lin, Zhen Jin, Qiang Liu, and Wenzhi Chen. LightPool: A NVMe-oF-based High-performance and Lightweight Storage Pool Architecture for Cloud-Native Distributed Database. InIEEE International Symposium on High-Performance Computer Architecture (HPCA), pages ...

2024
[56]

Leaper: a learned prefetcher for cache invalidation in LSM-tree based storage engines.Proceedings of the VLDB Endowment, 13(12):1976–1989, jul 2020

Lei Yang, Hong Wu, Tieying Zhang, Xuntao Cheng, Feifei Li, Lei Zou, Yujie Wang, Rongyao Chen, Jianying Wang, and Gui Huang. Leaper: a learned prefetcher for cache invalidation in LSM-tree based storage engines.Proceedings of the VLDB Endowment, 13(12):1976–1989, jul 2020

1976
[57]

MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with Matrix Container in NVM

Ting Yao, Yiwen Zhang, Jiguang Wan, Qiu Cui, Liu Tang, Hong Jiang, Changsheng Xie, and Xubin He. MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with Matrix Container in NVM. InUSENIX Annual Technical Conference (USENIX ATC), pages 17–31, July 2020

2020
[58]

FPGA- Accelerated Compactions for LSM-based Key-Value Store

Teng Zhang, Jianying Wang, Xuntao Cheng, Hao Xu, Nanlong Yu, Gui Huang, Tieying Zhang, Dengcheng He, Feifei Li, Wei Cao, et al. FPGA- Accelerated Compactions for LSM-based Key-Value Store. In18th USENIX Conference on File and Storage Technologies (FAST), pages 225–237, 2020

2020
[59]

The Next Evolution in Storage: Understanding NVMe over Fabrics (NVMeoF)

Jan Ziele ´znicki. The Next Evolution in Storage: Understanding NVMe over Fabrics (NVMeoF). https://codilime.com/blog/understanding-nvm e-over-fabrics-nvmeof/, 2024. 13

2024