pith. machine review for the scientific record.

arxiv: 2604.21290 · v1 · submitted 2026-04-23 · 💻 cs.CV · cs.DC

Recognition: unknown

GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

Anvitha Ramachandran, Dhruv Parikh, Viktor Prasanna

Pith reviewed 2026-05-09 22:23 UTC · model grok-4.3

classification 💻 cs.CV cs.DC
keywords Vision Graph Neural Networks · graph construction · FPGA acceleration · kNN search · pipelined design · message passing · Vision GNNs · decoupled layers

The pith

Vision GNNs can build each layer's graph from the prior layer's features while updating the current layer, removing the sequential bottleneck between construction and convolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision Graph Neural Networks build a fresh k-nearest-neighbor graph from the current patch features at every layer before performing the feature update. This creates a strict sequential dependency and makes graph construction the dominant cost. GraphLeap breaks the dependency by using the graph constructed from the previous layer's features to update the current layer while the current features are used to build the graph for the next layer. The reformulation turns the per-layer work into two overlapping streams that can run concurrently. On FPGA hardware this overlap is realized with a pipelined design that delivers large speedups while accuracy is restored by brief fine-tuning.
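A minimal sketch (ours, not the authors' code) of the two layer loops makes the reordering concrete; knn_graph and layer.update are hypothetical stand-ins for the O(N²) kNN search and the graph-convolution feature update.

    # Baseline ViG: each layer's update is blocked by a kNN search on its own input.
    def vig_forward(x, layers, k):
        for layer in layers:
            a = knn_graph(x, k)        # graph from the *current* features
            x = layer.update(x, a)     # message passing must wait for `a`
        return x

    # GraphLeap: layer l is updated with the graph built from layer l-1's features,
    # while layer l's own features build the graph for layer l+1. The two calls in
    # the loop body share no data dependency, so they can run concurrently.
    def graphleap_forward(x, layers, k):
        a = knn_graph(x, k)            # the first layer keeps its usual graph
        for layer in layers:
            a_next = knn_graph(x, k)   # lookahead: the next layer's graph
            x = layer.update(x, a)     # update with the one-layer-old graph
            a = a_next
        return x

(In the first iteration the two kNN calls coincide; a real pipeline would compute that graph once.)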

Core claim

GraphLeap performs the feature update at layer ℓ using a graph built from the previous layer's features, while simultaneously using the current layer's features to construct the graph for layer ℓ+1. This one-layer-lookahead graph construction enables concurrent graph construction and message passing.
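Since the paper's Eq. (3)–(5) are not reproduced on this page, here is the reformulation in a notation consistent with the abstract; X^ℓ denotes the patch features entering layer ℓ, A^ℓ the adjacency used there, and f_{θ_ℓ} the graph-convolution update (the symbols are ours, not necessarily the paper's).

    % Baseline ViG: construction and update are serialized within each layer
    A^{\ell} = \mathrm{kNN}\left(X^{\ell}\right), \qquad
    X^{\ell+1} = f_{\theta_{\ell}}\left(X^{\ell}, A^{\ell}\right)

    % GraphLeap: A^{\ell} was already built from X^{\ell-1}, so both layer-l
    % outputs depend only on data available when the layer starts
    A^{\ell+1} = \mathrm{kNN}\left(X^{\ell}\right), \qquad
    X^{\ell+1} = f_{\theta_{\ell}}\left(X^{\ell}, A^{\ell}\right)

Under the lookahead, the kNN call and the update at a given layer share no producer-consumer edge, which is exactly what the hardware overlap below exploits.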

What carries the argument

One-layer-lookahead graph construction that decouples per-layer kNN search from the message-passing feature update.

If this is right

  • A streaming layer-pipelined FPGA accelerator becomes possible because graph construction and feature update can overlap (a software emulation of this overlap is sketched after this list).
  • The design exploits node- and channel-level parallelism and avoids materializing edge features on chip.
  • Up to 95.7× speedup over CPU and 8.5× speedup over GPU baselines are observed on isotropic and pyramidal ViG models.
  • Real-time Vision GNN inference on FPGA hardware is shown to be feasible.
  • The same decoupling applies to both isotropic and pyramidal ViG architectures without changing the underlying convolution operation.
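As referenced in the first bullet, the overlap can be emulated in software with two workers standing in for the accelerator's two engines; this is a hedged sketch reusing the hypothetical knn_graph and layer.update from the earlier loop, not the FPGA design itself.

    from concurrent.futures import ThreadPoolExecutor

    def pipelined_forward(x, layers, k):
        # Two workers play the roles of the graph-construction and feature-update
        # engines; on the FPGA these are streaming hardware pipelines, not threads.
        with ThreadPoolExecutor(max_workers=2) as pool:
            a = knn_graph(x, k)                              # prologue: first graph
            for layer in layers:
                fut_graph = pool.submit(knn_graph, x, k)     # next layer's graph
                fut_update = pool.submit(layer.update, x, a) # current layer's update
                a, x = fut_graph.result(), fut_update.result()
        return x

Per-layer latency then tracks the maximum of construction and update time rather than their sum, which is the source of the claimed concurrency benefit.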

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The stability of nearest-neighbor relations across adjacent layers may be a general property of patch-token graphs in vision tasks.
  • Similar lookahead decoupling could be applied to other models that rebuild dynamic graphs at each step, such as point-cloud networks.
  • The FPGA dataflow without explicit edges suggests that memory bandwidth savings would be even larger on memory-constrained edge devices.
  • Combining the lookahead with quantization or pruning would be a direct next step that the current accelerator already supports.

Load-bearing premise

Any accuracy loss from using the prior layer's features to build the current graph can be recovered by a few epochs of fine-tuning.

What would settle it

Training the modified models and measuring whether validation accuracy remains below the original ViG baseline even after ten epochs of fine-tuning.

Figures

Figures reproduced from arXiv: 2604.21290 by Anvitha Ramachandran, Dhruv Parikh, Viktor Prasanna.

Figure 1. Overall architecture of GraphLeap, featuring the Graph Construction
Figure 2. Dataflow for the Graph Construction Engine and the Feature Update Engine, including the structure of the Graph Convolution Engine.
Figure 3. The Gather Module, responsible for obtaining node and co-node
Figure 4. Pipeline of the FPGA accelerator for end-to-end ViG inference,
Figure 5. End-to-End Inference Latency (ms) for ViG and GraphLeap variants (image resolution
Original abstract

Vision Graph Neural Networks (ViGs) represent an image as a graph of patch tokens, enabling adaptive, feature-driven neighborhoods. Unlike CNNs with fixed grid biases or Vision Transformers with global token interactions, ViGs rely on dynamic graph convolution: at each layer, a feature-dependent graph is built via k-nearest-neighbor (kNN) search on current patch features, followed by message passing. This per-layer graph construction is the main bottleneck, consuming 50--95\% of graph convolution time on CPUs and GPUs, scaling as $O(N^2)$ with the number of patches $N$, and creating a sequential dependency between graph construction and feature updates. We introduce GraphLeap, a simple reformulation that removes this dependency by decoupling graph construction from feature update across layers. GraphLeap performs the feature update at layer $\ell$ using a graph built from the previous layer's features, while simultaneously using the current layer's features to construct the graph for layer $\ell+1$. This one-layer-lookahead graph construction enables concurrent graph construction and message passing. Although using prior-layer features can introduce minor accuracy degradation, lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy. Building on GraphLeap, we present the first end-to-end FPGA accelerator for Vision GNNs. Our streaming, layer-pipelined design overlaps a kNN graph construction engine with a feature update engine, exploits node- and channel-level parallelism, and enables efficient on-chip dataflow without explicit edge-feature materialization. Evaluated on isotropic and pyramidal ViG models on an Alveo U280 FPGA, GraphLeap achieves up to $95.7\times$ speedup over CPU and $8.5\times$ speedup over GPU baselines, demonstrating the feasibility of real-time Vision GNN inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes GraphLeap, a reformulation of Vision Graph Neural Networks (ViGs) that decouples per-layer kNN graph construction from feature updates via a one-layer lookahead: the graph for layer ℓ is built from layer-(ℓ-1) features while the current features build the graph for ℓ+1. This removes the sequential dependency, enabling concurrent graph construction and message passing. The authors present a streaming, layer-pipelined FPGA accelerator on Alveo U280 that overlaps kNN and feature-update engines, and report up to 95.7× speedup over CPU and 8.5× over GPU baselines for isotropic and pyramidal ViG models, with accuracy recovered after a few epochs of fine-tuning.

Significance. If the accuracy-recovery claim holds, the result is significant because it directly attacks the dominant O(N²) kNN bottleneck that consumes 50–95% of graph convolution time on CPUs and GPUs. The end-to-end FPGA design with node-/channel-level parallelism and on-chip dataflow without explicit edge materialization is a concrete engineering contribution that demonstrates real-time ViG inference is feasible on reconfigurable hardware. The empirical speedups are measured against standard CPU/GPU baselines and therefore constitute a falsifiable, reproducible performance claim.

major comments (2)
  1. [§5] §5 (Experimental Results), Table 3 and associated text: the claim that “lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy” is load-bearing for the reported speedups, yet the manuscript provides neither the magnitude of the pre-fine-tuning accuracy drop for each ViG variant, nor layer-wise statistics on kNN-set divergence when features are lagged by one layer, nor an ablation that trains the lookahead model from scratch. Without these data it is impossible to judge whether the degradation is uniformly minor or dataset-/hyperparameter-dependent.
  2. [§3.2] §3.2 (GraphLeap formulation), Eq. (3)–(5): the reformulation replaces the current-layer feature matrix X^ℓ with X^{ℓ-1} inside the kNN operator for the graph used at layer ℓ. No bound or empirical quantification is given on the resulting change in neighborhood structure or on the perturbation this introduces to the message-passing operator; such an analysis would be required to substantiate that the semantic alteration remains recoverable by short fine-tuning.
minor comments (2)
  1. [Figure 4] Figure 4 (FPGA architecture diagram): the dataflow arrows between the kNN engine and the feature-update engine should be annotated with the exact on-chip buffer sizes and the cycle counts for each overlapped stage to make the claimed concurrency explicit.
  2. [§4.1] §4.1 (Implementation details): the description of the kNN engine states that it avoids explicit edge-feature materialization, but does not specify the bit-widths chosen for distance computation or the sorting network depth; these parameters directly affect both resource utilization and the reported latency numbers.
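To make the two knobs in the second comment concrete, here is an illustrative software model with assumed values (the fixed-point width and k are ours, not the paper's): distance computation at a chosen bit-width, and the top-k selection that a sorting network performs in hardware.

    import numpy as np

    def knn_fixed_point(x, k, frac_bits=8):
        # Quantize features to signed fixed point, as a hardware distance unit
        # might; frac_bits is an assumed width, not the paper's choice.
        xq = np.round(x * (1 << frac_bits)).astype(np.int64)
        # Integer squared-L2 distances between all N tokens: the O(N^2) kernel.
        sq = (xq * xq).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * (xq @ xq.T)
        np.fill_diagonal(d2, np.iinfo(np.int64).max)  # exclude self-matches
        # Partial top-k selection; on chip a sorting network of fixed depth
        # plays this role, and its depth scales with k and N.
        return np.argpartition(d2, k, axis=1)[:, :k]

    # 196 patch tokens with 192 channels, k = 9 (typical ViG settings, assumed here)
    neighbors = knn_fixed_point(np.random.randn(196, 192), k=9)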

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of GraphLeap for enabling real-time Vision GNN inference on FPGA. We address each major comment below and will revise the manuscript to incorporate additional analysis and data as described.

read point-by-point responses
  1. Referee: [§5] §5 (Experimental Results), Table 3 and associated text: the claim that “lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy” is load-bearing for the reported speedups, yet the manuscript provides neither the magnitude of the pre-fine-tuning accuracy drop for each ViG variant, nor layer-wise statistics on kNN-set divergence when features are lagged by one layer, nor an ablation that trains the lookahead model from scratch. Without these data it is impossible to judge whether the degradation is uniformly minor or dataset-/hyperparameter-dependent.

    Authors: We agree that these supporting data are essential to substantiate the accuracy-recovery claim. In the revised manuscript we will expand Table 3 (or add a companion table) to report the pre-fine-tuning top-1 accuracy for every evaluated ViG variant and dataset, together with the number of fine-tuning epochs required to recover the original accuracy. We will also add a new subsection (or appendix) containing layer-wise empirical statistics on kNN-set divergence, including average Jaccard overlap and set-difference size between neighborhoods computed from current-layer versus previous-layer features. Finally, we will include an ablation experiment that trains the GraphLeap (lookahead) model from scratch and compares its convergence trajectory and final accuracy against the fine-tuned version. These additions will allow readers to assess the magnitude and consistency of any degradation. revision: yes

  2. Referee: [§3.2] §3.2 (GraphLeap formulation), Eq. (3)–(5): the reformulation replaces the current-layer feature matrix X^ℓ with X^{ℓ-1} inside the kNN operator for the graph used at layer ℓ. No bound or empirical quantification is given on the resulting change in neighborhood structure or on the perturbation this introduces to the message-passing operator; such an analysis would be required to substantiate that the semantic alteration remains recoverable by short fine-tuning.

    Authors: We acknowledge that the current manuscript lacks explicit quantification of the neighborhood perturbation. In the revision we will add empirical measurements—specifically, layer-wise Jaccard similarity and average Hamming distance between the kNN graphs constructed from X^ℓ versus X^{ℓ-1}—across the isotropic and pyramidal ViG models and the evaluated datasets. These statistics will directly illustrate the magnitude of the change in neighborhood structure and its effect on the message-passing operator. While we do not provide a theoretical bound (deriving a tight, general bound on kNN perturbation under gradual feature evolution would require substantial additional theoretical development outside the engineering scope of this work), the combination of the new empirical quantification and the fine-tuning recovery results will substantiate that the semantic alteration is limited and recoverable. revision: partial
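The divergence statistic the rebuttal commits to is easy to pin down; here is a minimal sketch of the measurement on placeholder activations rather than real ViG features (shapes and layer count are assumptions for illustration).

    import numpy as np

    def knn_sets(x, k):
        # Neighbor sets from pairwise squared-L2 distances.
        d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        return [set(row) for row in np.argpartition(d2, k, axis=1)[:, :k]]

    def mean_jaccard(x_curr, x_prev, k=9):
        # Average per-node Jaccard overlap between neighborhoods built from
        # current-layer and previous-layer features.
        a, b = knn_sets(x_curr, k), knn_sets(x_prev, k)
        return float(np.mean([len(s & t) / len(s | t) for s, t in zip(a, b)]))

    # feats[l] would hold layer-l activations of shape [N, C] captured in a hook;
    # random placeholders stand in here, since this is not the paper's data.
    feats = [np.random.randn(196, 192) for _ in range(12)]
    per_layer = [mean_jaccard(feats[l], feats[l - 1]) for l in range(1, len(feats))]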

standing simulated objections not resolved
  • A theoretical bound on the perturbation to neighborhood structure and the message-passing operator induced by the one-layer lookahead formulation (requested in §3.2).

Circularity Check

0 steps flagged

No circularity: the algorithmic reformulation is stated independently of the benchmarks, and the reported speedups are direct measurements

full rationale

The paper's core contribution is an explicit algorithmic change—using layer-(ℓ-1) features to build the kNN graph for layer-ℓ message passing while constructing the next graph from current features—followed by measured FPGA throughput on Alveo U280 against CPU/GPU baselines. No equation or result is obtained by fitting a parameter to a subset of the target data and relabeling it a prediction; no uniqueness theorem or ansatz is imported via self-citation; and the accuracy-recovery claim is presented as an empirical observation rather than a derived identity. The reported speedups (95.7× CPU, 8.5× GPU) are direct wall-clock measurements, not quantities forced by the reformulation itself. The derivation chain therefore stands on its own, with the performance claims checked against external CPU/GPU benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard assumptions in graph neural networks and hardware design without introducing new free parameters or entities.

axioms (1)
  • domain assumption: k-nearest-neighbor search on patch features defines the graph
    Standard in GNNs for dynamic graphs.

pith-pipeline@v0.9.0 · 5638 in / 1222 out tokens · 43616 ms · 2026-05-09T22:23:51.900967+00:00 · methodology

