Recognition: no theorem link
GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading
Pith reviewed 2026-05-13 01:52 UTC · model grok-4.3
The pith
GriNNder enables full-graph GNN training on a single GPU by offloading data to storage devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GriNNder is the first system to enable full-graph GNN training under memory constraints by offloading to storage, via a framework it calls structured storage offloading (SSO). SSO coordinates host-memory caching based on cross-partition dependencies, regathers data for gradient computation to eliminate redundant storage accesses, and applies a lightweight partitioning scheme to reduce the memory footprint of graph partitioning.
What carries the argument
Structured storage offloading (SSO), which manages the GPU-host-storage hierarchy using coordinated cache, regather, and bypass mechanisms tailored to GNN access patterns.
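The cross-partition caching idea can be sketched in a few lines. Everything below is a hypothetical illustration of the general policy (rank boundary nodes by how many foreign partitions reference them, keep the hottest in host DRAM), not GriNNder's actual implementation; the function name, inputs, and budget parameter are invented.

```python
from collections import defaultdict

def plan_host_cache(partitions, edges, host_budget):
    """Cache in host memory the boundary nodes referenced by the
    most *other* partitions, up to a node budget."""
    owner = {node: pid for pid, nodes in enumerate(partitions) for node in nodes}

    # For each node, collect the distinct foreign partitions whose
    # edges touch it: a proxy for reuse across partition boundaries.
    deps = defaultdict(set)
    for u, v in edges:
        if owner[u] != owner[v]:
            deps[u].add(owner[v])
            deps[v].add(owner[u])

    ranked = sorted(deps, key=lambda n: len(deps[n]), reverse=True)
    cached = set(ranked[:host_budget])   # hottest boundary nodes -> host DRAM
    spilled = set(deps) - cached         # remainder served from SSD
    return cached, spilled

# Toy graph: three 2-node partitions with three cross-partition edges.
cached, spilled = plan_host_cache([[0, 1], [2, 3], [4, 5]],
                                  [(1, 2), (1, 4), (3, 4)], host_budget=2)
```

The point of the sketch is only that the cache policy is driven by graph structure rather than LRU-style recency, which is what distinguishes SSO from generic storage offloading.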
If this is right
- Full-graph training becomes feasible for graphs that exceed combined GPU and host memory capacity.
- Training achieves up to 9.78x speedup over previous single-server approaches.
- Throughput reaches levels comparable to distributed multi-server systems.
- Large-scale GNN training no longer requires multiple GPUs or servers.
Where Pith is reading between the lines
- This method could be adapted for other machine learning workloads that involve large, irregularly accessed datasets.
- Adoption might significantly reduce the infrastructure costs for training large graph models in resource-limited settings.
- Testing on graphs with varying sparsity or different storage hardware would reveal the limits of the offloading benefits.
Load-bearing premise
Storage-access overhead, given GNN data-access patterns and modern SSD bandwidth, stays low enough that the system outperforms memory-constrained methods without excessive added latency.
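This premise can be sanity-checked with back-of-envelope arithmetic: compare per-epoch feature I/O volume against NVMe bandwidth and GPU compute time. All numbers below are illustrative assumptions, not measurements from the paper.

```python
def io_bound_fraction(num_nodes, feat_dim, bytes_per_val,
                      reuse_factor, ssd_gbps, compute_s):
    """Fraction of an epoch spent on storage I/O if none of it is
    overlapped with computation (an upper bound on the stall share)."""
    io_bytes = num_nodes * feat_dim * bytes_per_val * reuse_factor
    io_s = io_bytes / (ssd_gbps * 1e9)
    return io_s / (io_s + compute_s)

# Assumed scenario: 100M nodes, 128-dim fp32 features, each feature
# fetched 1.5x per epoch on average, a 10 GB/s NVMe SSD, 20 s of
# GPU compute per epoch.
frac = io_bound_fraction(100_000_000, 128, 4, 1.5, 10, 20)
# Under these assumptions I/O is under a third of the epoch even with
# zero overlap, so hiding it behind compute is plausible.
```

The reuse factor is the load-bearing variable here: it is exactly what the caching and regathering mechanisms exist to keep close to 1.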
What would settle it
A direct comparison of training time and accuracy on a large graph, GriNNder versus a memory-constrained baseline, checking whether the storage-based method completes faster without accuracy loss.
read the original abstract
Full-graph training of graph neural networks (GNNs) is widely used as it enables direct validation of algorithmic improvements by preserving complete neighborhood information. However, it typically requires multiple GPUs or servers, incurring substantial hardware and inter-device communication costs. While existing single-server methods reduce infrastructure requirements, they remain constrained by GPU and host memory capacity as graph sizes increase. To address this limitation, we introduce GriNNder, which is the first work to leverage storage devices to enable full-graph training even with limited memory. Because modern NVMe SSDs offer multi-terabyte capacities and bandwidths exceeding 10 GB/s, they provide an appealing option when memory resources are scarce. Yet, directly applying storage-based methods from other domains fails to address the unique access patterns and data dependencies in full-graph GNN training. GriNNder tackles these challenges by structured storage offloading (SSO), a framework that manages the GPU-host-storage hierarchy through coordinated cache, (re)gather, and bypass mechanisms. To realize the framework, we devise (i) a partition-wise caching strategy for host memory that exploits the observation on cross-partition dependencies, (ii) a regathering strategy for gradient computation that eliminates redundant storage operations, and (iii) a lightweight partitioning scheme that mitigates the memory requirements of existing graph partitioners. In experiments performed over various models and datasets, GriNNder achieves up to 9.78x speedup over state-of-the-art baselines and throughput comparable to distributed systems, enabling previously infeasible large-scale full-graph training even on a single GPU.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces GriNNder, a system for full-graph GNN training on single-GPU hardware with limited memory by offloading to high-bandwidth NVMe SSD storage. It proposes a structured storage offloading (SSO) framework with three mechanisms: partition-wise host caching that exploits cross-partition dependencies, gradient regathering to avoid redundant storage reads during backpropagation, and a lightweight graph partitioning scheme to reduce memory overhead of existing partitioners. The authors claim that these techniques enable previously infeasible large-scale full-graph training, with experimental results showing up to 9.78x speedup over state-of-the-art single-server baselines and throughput comparable to distributed multi-GPU systems across various models and datasets.
Significance. If the performance claims hold under rigorous scrutiny, the work has substantial practical significance for the GNN community by lowering the hardware barrier for full-graph training, which preserves complete neighborhood information and enables direct validation of algorithmic improvements. By leveraging commodity storage rather than additional GPUs or servers, it addresses a real scalability bottleneck. The paper's strength lies in its system-level design tailored to GNN access patterns and the provision of concrete implementation details for the SSO components.
major comments (1)
- [Experiments (results and ablation sections)] The central claim that GriNNder outperforms memory-constrained baselines and matches distributed throughput rests on the quantitative demonstration that the SSO mechanisms (partition-wise caching, gradient regathering, and bypass) keep total storage I/O overhead low enough that execution time remains compute-dominated even when the graph exceeds host DRAM. The manuscript should provide explicit measurements—such as per-epoch I/O volume in GB, storage stall time as a fraction of total runtime, and latency-hiding effectiveness via overlap with computation—for the largest datasets in the experimental evaluation to verify that the >10 GB/s NVMe bandwidth is effectively utilized despite irregular multi-hop access patterns.
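The three measurements requested above reduce to arithmetic over a handful of counters. The sketch below shows that arithmetic under assumed timer values; the function and its inputs are hypothetical (in a real system they would come from, e.g., NVMe statistics and CUDA event timing).

```python
def io_report(bytes_read, stall_s, io_busy_s, epoch_s):
    """Summarize the three requested metrics from raw per-epoch counters."""
    return {
        "io_gb_per_epoch": bytes_read / 1e9,
        # Share of the epoch in which the GPU actually waited on storage.
        "stall_fraction": stall_s / epoch_s,
        # Share of raw SSD busy time that was hidden behind compute.
        "overlap_effectiveness": 1.0 - stall_s / io_busy_s,
    }

# Assumed counters: 80 GB read, 8 s of SSD busy time, of which only
# 1.2 s stalled the GPU, within a 25 s epoch.
report = io_report(bytes_read=80e9, stall_s=1.2, io_busy_s=8.0, epoch_s=25.0)
```

Reporting all three together matters: a low stall fraction alone could mean either effective overlap or simply little I/O, and the pair disambiguates.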
minor comments (2)
- [Abstract] The abstract states performance numbers without referencing specific dataset sizes, model architectures, or hardware configurations used in the experiments; these details should be summarized early in the introduction or experimental setup for immediate context.
- [System Design] The three SSO components (cache, regather, bypass) are introduced in the abstract, but a single consolidated diagram or table in the system overview section would clarify their interactions within the GPU-host-storage hierarchy.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's significance. We address the major comment below and will revise the manuscript accordingly.
read point-by-point responses
- Referee: [Experiments (results and ablation sections)] The central claim that GriNNder outperforms memory-constrained baselines and matches distributed throughput rests on the quantitative demonstration that the SSO mechanisms (partition-wise caching, gradient regathering, and bypass) keep total storage I/O overhead low enough that execution time remains compute-dominated even when the graph exceeds host DRAM. The manuscript should provide explicit measurements—such as per-epoch I/O volume in GB, storage stall time as a fraction of total runtime, and latency-hiding effectiveness via overlap with computation—for the largest datasets in the experimental evaluation to verify that the >10 GB/s NVMe bandwidth is effectively utilized despite irregular multi-hop access patterns.
Authors: We agree that explicit quantification of I/O overhead is important to rigorously support the claim that execution remains compute-dominated. While the reported speedups (up to 9.78x) and throughput parity with distributed systems already indicate that storage latency is effectively hidden by our SSO mechanisms, we acknowledge the value of the requested breakdowns. In the revised manuscript we will add, for the largest datasets, (i) per-epoch I/O volumes in GB, (ii) storage stall time as a percentage of total runtime, and (iii) measurements of overlap effectiveness between I/O and computation. These additions will directly demonstrate utilization of the >10 GB/s NVMe bandwidth despite irregular multi-hop patterns and will be placed in both the main results and ablation sections.
revision: yes
Circularity Check
No circularity: performance claims rest on empirical evaluation of implemented mechanisms, not self-referential derivations
full rationale
The paper describes an engineering system (GriNNder) with three concrete mechanisms—partition-wise host caching, gradient regathering, and lightweight partitioning—under the SSO framework. These are presented as design choices to handle GNN access patterns on NVMe storage, followed by experimental measurements of speedup (up to 9.78x) against baselines. No equations, fitted parameters, or first-principles predictions appear in the provided text; the central claims are not derived mathematically but demonstrated via runtime comparisons on real graphs and models. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is used to justify the core result. The derivation chain is therefore self-contained as an implementation-plus-benchmark paper rather than a closed logical loop.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Modern NVMe SSDs offer multi-terabyte capacities and bandwidths exceeding 10 GB/s
- domain assumption Directly applying storage-based methods from other domains fails to address unique access patterns and data dependencies in full-graph GNN training