pith. machine review for the scientific record.

arxiv: 2605.11517 · v1 · submitted 2026-05-12 · 💻 cs.DC

Recognition: no theorem link

GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading

Hongsun Jang, Hunseong Lim, Jaewon Jung, Jaeyong Song, Jinho Lee, Junguk Hong, Seongyeon Park

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:52 UTC · model grok-4.3

classification 💻 cs.DC
keywords GNN training · storage offloading · full-graph training · memory wall · NVMe SSD · single GPU · graph neural networks

The pith

GriNNder enables full-graph GNN training on a single GPU by offloading data to storage devices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GriNNder as a method to perform full-graph training of graph neural networks without being limited by GPU or host memory size. It achieves this by using high-capacity storage devices through a structured offloading approach that handles the unique data access patterns of GNNs. This matters because it allows training on much larger graphs using only a single machine, avoiding the costs and complexities of distributed multi-GPU systems while delivering comparable speeds.

Core claim

GriNNder is the first system to use storage offloading for full-graph GNN training under memory constraints. Its structured storage offloading (SSO) framework coordinates caching in host memory based on cross-partition dependencies, regathers data for gradient computation to eliminate redundant storage accesses, and applies lightweight partitioning to reduce memory needs.

What carries the argument

Structured storage offloading (SSO), which manages the GPU-host-storage hierarchy using coordinated cache, regather, and bypass mechanisms tailored to GNN access patterns.
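The cache side of SSO can be made concrete with a toy sketch: a host-memory cache of partition feature blocks that falls back to storage on a miss. This is illustrative only — plain LRU stands in for the paper's dependency-aware eviction policy, and all names here (`PartitionCache`, `fetch`) are hypothetical, not the paper's API.

```python
from collections import OrderedDict

class PartitionCache:
    """Toy host-memory cache for partition feature blocks, keyed by
    partition ID. Stand-in for GriNNder's partition-wise caching: the
    real system evicts based on cross-partition dependencies; plain
    LRU is used here purely for illustration."""

    def __init__(self, capacity, storage):
        self.capacity = capacity           # max partitions resident in host memory
        self.storage = storage             # dict: partition ID -> feature block
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, pid):
        if pid in self.cache:              # host-memory hit: no storage I/O
            self.cache.move_to_end(pid)
            self.hits += 1
            return self.cache[pid]
        self.misses += 1                   # miss: read the partition from storage
        block = self.storage[pid]
        self.cache[pid] = block
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used partition
        return block

# Six partitions on "storage", room for two in host memory.
storage = {pid: f"features[{pid}]" for pid in range(6)}
cache = PartitionCache(capacity=2, storage=storage)
for pid in [0, 1, 0, 2, 1]:               # a cross-partition access sequence
    cache.fetch(pid)
print(cache.hits, cache.misses)            # 1 4
```

The point of the sketch is only the shape of the mechanism: every hit is a storage access avoided, which is why the eviction policy's awareness of cross-partition dependencies carries so much of the system's performance.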

If this is right

  • Full-graph training becomes feasible for graphs that exceed available memory capacities.
  • Training achieves speedups of up to 9.78 times compared to previous single-server approaches.
  • Throughput reaches levels comparable to those of distributed multi-server systems.
  • Large-scale GNN training no longer requires multiple GPUs or servers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could be adapted for other machine learning workloads that involve large, irregularly accessed datasets.
  • Adoption might significantly reduce the infrastructure costs for training large graph models in resource-limited settings.
  • Testing on graphs with varying sparsity or different storage hardware would reveal the limits of the offloading benefits.

Load-bearing premise

The overhead of storage accesses stays low enough, thanks to GNN data patterns and modern SSD bandwidth, that the system outperforms methods limited by memory without excessive added latency.
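This premise reduces to back-of-envelope arithmetic: per-partition I/O is hidden if its transfer time fits under the GPU compute it overlaps with. The numbers below are illustrative assumptions, not values from the paper; only the ~10 GB/s NVMe bandwidth figure comes from the abstract.

```python
def io_hidden(bytes_per_partition, nvme_gbps, compute_sec_per_partition):
    """Return (io_sec, hidden): transfer time for one partition, and
    whether it fits entirely under the overlapping GPU compute time."""
    io_sec = bytes_per_partition / (nvme_gbps * 1e9)
    return io_sec, io_sec <= compute_sec_per_partition

# Illustrative: 4 GB of partition features over a 10 GB/s NVMe SSD,
# overlapped with an assumed 0.5 s of GPU compute per partition.
io_sec, hidden = io_hidden(4e9, 10, 0.5)
print(round(io_sec, 2), hidden)   # 0.4 True
```

If the access pattern degrades effective bandwidth well below the device peak — the scenario the paper's bypass and regather mechanisms exist to prevent — the same arithmetic flips and storage latency becomes exposed.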

What would settle it

A direct comparison of training time and accuracy on a large graph using GriNNder versus a memory-constrained baseline, checking if the storage-based method completes faster without accuracy loss.
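That head-to-head check could be scripted as a minimal harness like the one below; `train_offloaded` and `train_baseline` are hypothetical placeholders for the two systems under test, each returning a final validation accuracy.

```python
import time

def compare(train_offloaded, train_baseline, tol=0.005):
    """Time both training routines and check accuracy parity within tol.
    Each routine runs to completion and returns its validation accuracy."""
    t0 = time.perf_counter()
    acc_off = train_offloaded()
    t1 = time.perf_counter()
    acc_base = train_baseline()
    t2 = time.perf_counter()
    return {
        "speedup": (t2 - t1) / max(t1 - t0, 1e-12),  # >1: offloading wins on time
        "accuracy_gap": abs(acc_off - acc_base),
        "parity": abs(acc_off - acc_base) <= tol,
    }

# Stubs standing in for real training runs.
report = compare(lambda: 0.781, lambda: 0.780)
print(report["parity"])  # True
```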

Figures

Figures reproduced from arXiv: 2605.11517 by Hongsun Jang, Hunseong Lim, Jaewon Jung, Jaeyong Song, Jinho Lee, Junguk Hong, Seongyeon Park.

Figure 1. Full-graph training procedure with a two-layer GNN.
Figure 3. Overall workflow with cache-(re)gather-bypass.
Figure 4. GriNNder forward and backward procedures for layer 1.
Figure 6. Advantage of (c) grad-engine activation regathering compared to (a) PyTorch autograd and (b) HongTu.
Figure 7. Example intermediate state during partitioning (CSR source pointers and destination indices).
Figure 8. Framework structure of GriNNder.
Figure 9. Host memory usage of GriNNder on the IGBM dataset.
Figure 10. Time-to-quality comparison with alternatives on Products (4 parts), IGBM (32 parts), and Papers (2048 parts).
Figure 13.
Figure 14. Example graph illustrating the drawbacks of prior methods under the full-graph dependency.
Figure 15. Partition dependency profile: (left) Products with 16 partitions, (mid) IGBM with 64 partitions, (right) Papers with 2048 partitions (first 64 shown).
Figure 17. Overview of overlapping cache management with computation.
Figure 16. Advantage of partition-wise cache management compared to vertex-wise management.
Figure 18. Profiling results of GriNNder's forward and backward pass.
Figure 19. Switching-aware partitioning.
Figure 20. User interface of GriNNder: users inherit GriNNderGNN and implement layer_forward for partition-based full-graph training.
Figure 21. Functionality check of GriNNder: accuracy compared against in-memory full-graph training and HongTu.
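Figure 20 describes the user-facing API as simple inheritance from GriNNderGNN with a user-supplied per-layer forward. The framework-free sketch below mimics that shape; the class and method names follow the figure caption, but everything else — the constructor, the driver loop, the toy computation — is hypothetical, not the paper's implementation.

```python
class GriNNderGNN:
    """Minimal stand-in for the base class in Figure 20: the framework
    drives partition-based, layer-wise execution, calling the user's
    layer_forward once per (layer, partition)."""

    def __init__(self, num_layers):
        self.num_layers = num_layers

    def layer_forward(self, layer, x):
        raise NotImplementedError  # user supplies the per-layer computation

    def run(self, partitions):
        # Layer-wise execution: finish layer l on every partition before
        # starting layer l+1, so data for inactive partitions can stay
        # cached in host memory or on storage.
        for layer in range(self.num_layers):
            partitions = [self.layer_forward(layer, x) for x in partitions]
        return partitions

class TwoLayerGCN(GriNNderGNN):
    def layer_forward(self, layer, x):
        return x * 2  # placeholder for aggregate-and-update on real features

model = TwoLayerGCN(num_layers=2)
print(model.run([1, 3]))  # [4, 12]
```

The design point the figure emphasizes is that existing PyTorch Geometric model code can be reused by inheritance, with only the per-layer forward rewritten for layer-wise execution.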
read the original abstract

Full-graph training of graph neural networks (GNNs) is widely used as it enables direct validation of algorithmic improvements by preserving complete neighborhood information. However, it typically requires multiple GPUs or servers, incurring substantial hardware and inter-device communication costs. While existing single-server methods reduce infrastructure requirements, they remain constrained by GPU and host memory capacity as graph sizes increase. To address this limitation, we introduce GriNNder, which is the first work to leverage storage devices to enable full-graph training even with limited memory. Because modern NVMe SSDs offer multi-terabyte capacities and bandwidths exceeding 10 GB/s, they provide an appealing option when memory resources are scarce. Yet, directly applying storage-based methods from other domains fails to address the unique access patterns and data dependencies in full-graph GNN training. GriNNder tackles these challenges by structured storage offloading (SSO), a framework that manages the GPU-host-storage hierarchy through coordinated cache, (re)gather, and bypass mechanisms. To realize the framework, we devise (i) a partition-wise caching strategy for host memory that exploits the observation on cross-partition dependencies, (ii) a regathering strategy for gradient computation that eliminates redundant storage operations, and (iii) a lightweight partitioning scheme that mitigates the memory requirements of existing graph partitioners. In experiments performed over various models and datasets, GriNNder achieves up to 9.78x speedup over state-of-the-art baselines and throughput comparable to distributed systems, enabling previously infeasible large-scale full-graph training even on a single GPU.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces GriNNder, a system for full-graph GNN training on single-GPU hardware with limited memory by offloading to high-bandwidth NVMe SSD storage. It proposes a structured storage offloading (SSO) framework with three mechanisms: partition-wise host caching that exploits cross-partition dependencies, gradient regathering to avoid redundant storage reads during backpropagation, and a lightweight graph partitioning scheme to reduce memory overhead of existing partitioners. The authors claim that these techniques enable previously infeasible large-scale full-graph training, with experimental results showing up to 9.78x speedup over state-of-the-art single-server baselines and throughput comparable to distributed multi-GPU systems across various models and datasets.

Significance. If the performance claims hold under rigorous scrutiny, the work has substantial practical significance for the GNN community by lowering the hardware barrier for full-graph training, which preserves complete neighborhood information and enables direct validation of algorithmic improvements. By leveraging commodity storage rather than additional GPUs or servers, it addresses a real scalability bottleneck. The paper's strength lies in its system-level design tailored to GNN access patterns and the provision of concrete implementation details for the SSO components.

major comments (1)
  1. [Experiments (results and ablation sections)] The central claim that GriNNder outperforms memory-constrained baselines and matches distributed throughput rests on the quantitative demonstration that the SSO mechanisms (partition-wise caching, gradient regathering, and bypass) keep total storage I/O overhead low enough that execution time remains compute-dominated even when the graph exceeds host DRAM. The manuscript should provide explicit measurements—such as per-epoch I/O volume in GB, storage stall time as a fraction of total runtime, and latency-hiding effectiveness via overlap with computation—for the largest datasets in the experimental evaluation to verify that the >10 GB/s NVMe bandwidth is effectively utilized despite irregular multi-hop access patterns.
minor comments (2)
  1. [Abstract] The abstract states performance numbers without referencing specific dataset sizes, model architectures, or hardware configurations used in the experiments; these details should be summarized early in the introduction or experimental setup for immediate context.
  2. [System Design] Notation for the three SSO components (cache, regather, bypass) is introduced in the abstract but would benefit from a single consolidated diagram or table in the system overview section to clarify their interactions in the GPU-host-storage hierarchy.
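The stall-time metric requested in the major comment can be computed from a timeline trace of compute and I/O intervals. The sketch below is one way to define it — exposed stall is I/O time not covered by any compute interval — using a coarse numeric sweep for clarity; the trace values are illustrative.

```python
def stall_fraction(compute, io, resolution=1000):
    """Fraction of total runtime stalled on storage: time inside an I/O
    interval but outside every compute interval. Intervals are
    (start, end) pairs in seconds; sampled at `resolution` midpoints."""
    end = max(e for _, e in compute + io)
    stalled = 0
    for i in range(resolution):
        t = end * (i + 0.5) / resolution          # sample midpoint
        in_io = any(s <= t < e for s, e in io)
        in_compute = any(s <= t < e for s, e in compute)
        if in_io and not in_compute:
            stalled += 1
    return stalled / resolution

# Illustrative trace: 1 s of compute, I/O from 0.5 s to 1.5 s,
# so 0.5 s of the I/O is exposed over a 1.5 s run.
frac = stall_fraction(compute=[(0.0, 1.0)], io=[(0.5, 1.5)])
print(round(frac, 2))   # 0.33
```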

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address the major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Experiments (results and ablation sections)] The central claim that GriNNder outperforms memory-constrained baselines and matches distributed throughput rests on the quantitative demonstration that the SSO mechanisms (partition-wise caching, gradient regathering, and bypass) keep total storage I/O overhead low enough that execution time remains compute-dominated even when the graph exceeds host DRAM. The manuscript should provide explicit measurements—such as per-epoch I/O volume in GB, storage stall time as a fraction of total runtime, and latency-hiding effectiveness via overlap with computation—for the largest datasets in the experimental evaluation to verify that the >10 GB/s NVMe bandwidth is effectively utilized despite irregular multi-hop access patterns.

    Authors: We agree that explicit quantification of I/O overhead is important to rigorously support the claim that execution remains compute-dominated. While the reported speedups (up to 9.78x) and throughput parity with distributed systems already indicate that storage latency is effectively hidden by our SSO mechanisms, we acknowledge the value of the requested breakdowns. In the revised manuscript we will add, for the largest datasets, (i) per-epoch I/O volumes in GB, (ii) storage stall time as a percentage of total runtime, and (iii) measurements of overlap effectiveness between I/O and computation. These additions will directly demonstrate utilization of the >10 GB/s NVMe bandwidth despite irregular multi-hop patterns and will be placed in both the main results and ablation sections. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims rest on empirical evaluation of implemented mechanisms, not self-referential derivations

full rationale

The paper describes an engineering system (GriNNder) with three concrete mechanisms—partition-wise host caching, gradient regathering, and lightweight partitioning—under the SSO framework. These are presented as design choices to handle GNN access patterns on NVMe storage, followed by experimental measurements of speedup (up to 9.78x) against baselines. No equations, fitted parameters, or first-principles predictions appear in the provided text; the central claims are not derived mathematically but demonstrated via runtime comparisons on real graphs and models. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is used to justify the core result. The derivation chain is therefore self-contained as an implementation-plus-benchmark paper rather than a closed logical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on hardware performance assumptions and GNN-specific data dependency observations rather than new mathematical entities or fitted constants.

axioms (2)
  • domain assumption Modern NVMe SSDs offer multi-terabyte capacities and bandwidths exceeding 10 GB/s
    Invoked to justify storage as a viable offload target when memory is scarce.
  • domain assumption Directly applying storage-based methods from other domains fails to address unique access patterns and data dependencies in full-graph GNN training
    Used to motivate the need for the new SSO framework and its three components.

pith-pipeline@v0.9.0 · 5610 in / 1303 out tokens · 38387 ms · 2026-05-13T01:52:11.137804+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

116 extracted references · 116 canonical work pages · 1 internal anchor
