pith. machine review for the scientific record.

arxiv: 2604.21290 · v1 · submitted 2026-04-23 · 💻 cs.CV · cs.DC

Recognition: unknown

GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

Anvitha Ramachandran, Dhruv Parikh, Viktor Prasanna

Pith reviewed 2026-05-09 22:23 UTC · model grok-4.3

classification 💻 cs.CV cs.DC
keywords Vision Graph Neural Networks · graph construction · FPGA acceleration · kNN search · pipelined design · message passing · Vision GNNs · decoupled layers

The pith

Vision GNNs can build each layer's graph from the prior layer's features while updating the current layer, removing the sequential bottleneck between construction and convolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision Graph Neural Networks build a fresh k-nearest-neighbor graph from the current patch features at every layer before performing the feature update. This creates a strict sequential dependency and makes graph construction the dominant cost. GraphLeap breaks the dependency by using the graph constructed from the previous layer's features to update the current layer while the current features are used to build the graph for the next layer. The reformulation turns the per-layer work into two overlapping streams that can run concurrently. On FPGA hardware this overlap is realized with a pipelined design that delivers large speedups while accuracy is restored by brief fine-tuning.
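A minimal sketch (ours, not the authors' code) of the two layer loops makes the reordering concrete; knn_graph and layer.update are hypothetical stand-ins for the O(N²) kNN search and the graph-convolution feature update.

    # Baseline ViG: each layer's update is blocked by a kNN search on its own input.
    def vig_forward(x, layers, k):
        for layer in layers:
            a = knn_graph(x, k)        # graph from the *current* features
            x = layer.update(x, a)     # message passing must wait for `a`
        return x

    # GraphLeap: layer l is updated with the graph built from layer l-1's features,
    # while layer l's own features build the graph for layer l+1. The two calls in
    # the loop body share no data dependency, so they can run concurrently.
    def graphleap_forward(x, layers, k):
        a = knn_graph(x, k)            # the first layer keeps its usual graph
        for layer in layers:
            a_next = knn_graph(x, k)   # lookahead: the next layer's graph
            x = layer.update(x, a)     # update with the one-layer-old graph
            a = a_next
        return x

(In the first iteration the two kNN calls coincide; a real pipeline would compute that graph once.)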

Core claim

GraphLeap performs the feature update at layer ℓ using a graph built from the previous layer's features, while simultaneously using the current layer's features to construct the graph for layer ℓ+1. This one-layer-lookahead graph construction enables concurrent graph construction and message passing.
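Since the paper's Eq. (3)–(5) are not reproduced on this page, here is the reformulation in a notation consistent with the abstract; X^ℓ denotes the patch features entering layer ℓ, A^ℓ the adjacency used there, and f_{θ_ℓ} the graph-convolution update (the symbols are ours, not necessarily the paper's).

    % Baseline ViG: construction and update are serialized within each layer
    A^{\ell} = \mathrm{kNN}\left(X^{\ell}\right), \qquad
    X^{\ell+1} = f_{\theta_{\ell}}\left(X^{\ell}, A^{\ell}\right)

    % GraphLeap: A^{\ell} was already built from X^{\ell-1}, so both layer-l
    % outputs depend only on data available when the layer starts
    A^{\ell+1} = \mathrm{kNN}\left(X^{\ell}\right), \qquad
    X^{\ell+1} = f_{\theta_{\ell}}\left(X^{\ell}, A^{\ell}\right)

Under the lookahead, the kNN call and the update at a given layer share no producer-consumer edge, which is exactly what the hardware overlap below exploits.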

What carries the argument

One-layer-lookahead graph construction that decouples per-layer kNN search from the message-passing feature update.

If this is right

  • A streaming layer-pipelined FPGA accelerator becomes possible because graph construction and feature update can overlap (a software emulation of this overlap is sketched after this list).
  • The design exploits node- and channel-level parallelism and avoids materializing edge features on chip.
  • Up to 95.7× speedup over CPU and 8.5× speedup over GPU baselines are observed on isotropic and pyramidal ViG models.
  • Real-time Vision GNN inference on FPGA hardware is shown to be feasible.
  • The same decoupling applies to both isotropic and pyramidal ViG architectures without changing the underlying convolution operation.
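As referenced in the first bullet, the overlap can be emulated in software with two workers standing in for the accelerator's two engines; this is a hedged sketch reusing the hypothetical knn_graph and layer.update from the earlier loop, not the FPGA design itself.

    from concurrent.futures import ThreadPoolExecutor

    def pipelined_forward(x, layers, k):
        # Two workers play the roles of the graph-construction and feature-update
        # engines; on the FPGA these are streaming hardware pipelines, not threads.
        with ThreadPoolExecutor(max_workers=2) as pool:
            a = knn_graph(x, k)                              # prologue: first graph
            for layer in layers:
                fut_graph = pool.submit(knn_graph, x, k)     # next layer's graph
                fut_update = pool.submit(layer.update, x, a) # current layer's update
                a, x = fut_graph.result(), fut_update.result()
        return x

Per-layer latency then tracks the maximum of construction and update time rather than their sum, which is the source of the claimed concurrency benefit.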

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The stability of nearest-neighbor relations across adjacent layers may be a general property of patch-token graphs in vision tasks.
  • Similar lookahead decoupling could be applied to other models that rebuild dynamic graphs at each step, such as point-cloud networks.
  • The FPGA dataflow without explicit edges suggests that memory bandwidth savings would be even larger on memory-constrained edge devices.
  • Combining the lookahead with quantization or pruning would be a direct next step that the current accelerator already supports.

Load-bearing premise

Any accuracy loss from using the prior layer's features to build the current graph can be recovered by a few epochs of fine-tuning.

What would settle it

Training the modified models and measuring whether validation accuracy remains below the original ViG baseline even after ten epochs of fine-tuning.

Figures

Figures reproduced from arXiv: 2604.21290 by Anvitha Ramachandran, Dhruv Parikh, Viktor Prasanna.

Figure 1. Overall architecture of GraphLeap, featuring the Graph Construction
Figure 2. Dataflow for the Graph Construction Engine and the Feature Update Engine, including the structure of the Graph Convolution Engine.
Figure 3. The Gather Module, responsible for obtaining node and co-node
Figure 4. Pipeline of the FPGA accelerator for end-to-end ViG inference,
Figure 5. End-to-End Inference Latency (ms) for ViG and GraphLeap variants (image resolution
Original abstract

Vision Graph Neural Networks (ViGs) represent an image as a graph of patch tokens, enabling adaptive, feature-driven neighborhoods. Unlike CNNs with fixed grid biases or Vision Transformers with global token interactions, ViGs rely on dynamic graph convolution: at each layer, a feature-dependent graph is built via k-nearest-neighbor (kNN) search on current patch features, followed by message passing. This per-layer graph construction is the main bottleneck, consuming 50--95\% of graph convolution time on CPUs and GPUs, scaling as $O(N^2)$ with the number of patches $N$, and creating a sequential dependency between graph construction and feature updates. We introduce GraphLeap, a simple reformulation that removes this dependency by decoupling graph construction from feature update across layers. GraphLeap performs the feature update at layer $\ell$ using a graph built from the previous layer's features, while simultaneously using the current layer's features to construct the graph for layer $\ell+1$. This one-layer-lookahead graph construction enables concurrent graph construction and message passing. Although using prior-layer features can introduce minor accuracy degradation, lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy. Building on GraphLeap, we present the first end-to-end FPGA accelerator for Vision GNNs. Our streaming, layer-pipelined design overlaps a kNN graph construction engine with a feature update engine, exploits node- and channel-level parallelism, and enables efficient on-chip dataflow without explicit edge-feature materialization. Evaluated on isotropic and pyramidal ViG models on an Alveo U280 FPGA, GraphLeap achieves up to $95.7\times$ speedup over CPU and $8.5\times$ speedup over GPU baselines, demonstrating the feasibility of real-time Vision GNN inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes GraphLeap, a reformulation of Vision Graph Neural Networks (ViGs) that decouples per-layer kNN graph construction from feature updates via a one-layer lookahead: the graph for layer ℓ is built from layer-(ℓ-1) features while the current features build the graph for ℓ+1. This removes the sequential dependency, enabling concurrent graph construction and message passing. The authors present a streaming, layer-pipelined FPGA accelerator on Alveo U280 that overlaps kNN and feature-update engines, and report up to 95.7× speedup over CPU and 8.5× over GPU baselines for isotropic and pyramidal ViG models, with accuracy recovered after a few epochs of fine-tuning.

Significance. If the accuracy-recovery claim holds, the result is significant because it directly attacks the dominant O(N²) kNN bottleneck that consumes 50–95% of graph convolution time on CPUs and GPUs. The end-to-end FPGA design with node-/channel-level parallelism and on-chip dataflow without explicit edge materialization is a concrete engineering contribution that demonstrates real-time ViG inference is feasible on reconfigurable hardware. The empirical speedups are measured against standard CPU/GPU baselines and therefore constitute a falsifiable, reproducible performance claim.

major comments (2)
  1. [§5] §5 (Experimental Results), Table 3 and associated text: the claim that “lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy” is load-bearing for the reported speedups, yet the manuscript provides neither the magnitude of the pre-fine-tuning accuracy drop for each ViG variant, nor layer-wise statistics on kNN-set divergence when features are lagged by one layer, nor an ablation that trains the lookahead model from scratch. Without these data it is impossible to judge whether the degradation is uniformly minor or dataset-/hyperparameter-dependent.
  2. [§3.2] §3.2 (GraphLeap formulation), Eq. (3)–(5): the reformulation replaces the current-layer feature matrix X^ℓ with X^{ℓ-1} inside the kNN operator for the graph used at layer ℓ. No bound or empirical quantification is given on the resulting change in neighborhood structure or on the perturbation this introduces to the message-passing operator; such an analysis would be required to substantiate that the semantic alteration remains recoverable by short fine-tuning.
minor comments (2)
  1. [Figure 4] Figure 4 (FPGA architecture diagram): the dataflow arrows between the kNN engine and the feature-update engine should be annotated with the exact on-chip buffer sizes and the cycle counts for each overlapped stage to make the claimed concurrency explicit.
  2. [§4.1] §4.1 (Implementation details): the description of the kNN engine states that it avoids explicit edge-feature materialization, but does not specify the bit-widths chosen for distance computation or the sorting network depth; these parameters directly affect both resource utilization and the reported latency numbers.
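To make the two knobs in the second comment concrete, here is an illustrative software model with assumed values (the fixed-point width and k are ours, not the paper's): distance computation at a chosen bit-width, and the top-k selection that a sorting network performs in hardware.

    import numpy as np

    def knn_fixed_point(x, k, frac_bits=8):
        # Quantize features to signed fixed point, as a hardware distance unit
        # might; frac_bits is an assumed width, not the paper's choice.
        xq = np.round(x * (1 << frac_bits)).astype(np.int64)
        # Integer squared-L2 distances between all N tokens: the O(N^2) kernel.
        sq = (xq * xq).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * (xq @ xq.T)
        np.fill_diagonal(d2, np.iinfo(np.int64).max)  # exclude self-matches
        # Partial top-k selection; on chip a sorting network of fixed depth
        # plays this role, and its depth scales with k and N.
        return np.argpartition(d2, k, axis=1)[:, :k]

    # 196 patch tokens with 192 channels, k = 9 (typical ViG settings, assumed here)
    neighbors = knn_fixed_point(np.random.randn(196, 192), k=9)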

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of GraphLeap for enabling real-time Vision GNN inference on FPGA. We address each major comment below and will revise the manuscript to incorporate additional analysis and data as described.

read point-by-point responses
  1. Referee: [§5] §5 (Experimental Results), Table 3 and associated text: the claim that “lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy” is load-bearing for the reported speedups, yet the manuscript provides neither the magnitude of the pre-fine-tuning accuracy drop for each ViG variant, nor layer-wise statistics on kNN-set divergence when features are lagged by one layer, nor an ablation that trains the lookahead model from scratch. Without these data it is impossible to judge whether the degradation is uniformly minor or dataset-/hyperparameter-dependent.

    Authors: We agree that these supporting data are essential to substantiate the accuracy-recovery claim. In the revised manuscript we will expand Table 3 (or add a companion table) to report the pre-fine-tuning top-1 accuracy for every evaluated ViG variant and dataset, together with the number of fine-tuning epochs required to recover the original accuracy. We will also add a new subsection (or appendix) containing layer-wise empirical statistics on kNN-set divergence, including average Jaccard overlap and set-difference size between neighborhoods computed from current-layer versus previous-layer features. Finally, we will include an ablation experiment that trains the GraphLeap (lookahead) model from scratch and compares its convergence trajectory and final accuracy against the fine-tuned version. These additions will allow readers to assess the magnitude and consistency of any degradation. revision: yes

  2. Referee: [§3.2] §3.2 (GraphLeap formulation), Eq. (3)–(5): the reformulation replaces the current-layer feature matrix X^ℓ with X^{ℓ-1} inside the kNN operator for the graph used at layer ℓ. No bound or empirical quantification is given on the resulting change in neighborhood structure or on the perturbation this introduces to the message-passing operator; such an analysis would be required to substantiate that the semantic alteration remains recoverable by short fine-tuning.

    Authors: We acknowledge that the current manuscript lacks explicit quantification of the neighborhood perturbation. In the revision we will add empirical measurements—specifically, layer-wise Jaccard similarity and average Hamming distance between the kNN graphs constructed from X^ℓ versus X^{ℓ-1}—across the isotropic and pyramidal ViG models and the evaluated datasets. These statistics will directly illustrate the magnitude of the change in neighborhood structure and its effect on the message-passing operator. While we do not provide a theoretical bound (deriving a tight, general bound on kNN perturbation under gradual feature evolution would require substantial additional theoretical development outside the engineering scope of this work), the combination of the new empirical quantification and the fine-tuning recovery results will substantiate that the semantic alteration is limited and recoverable. revision: partial
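The divergence statistic the rebuttal commits to is easy to pin down; here is a minimal sketch of the measurement on placeholder activations rather than real ViG features (shapes and layer count are assumptions for illustration).

    import numpy as np

    def knn_sets(x, k):
        # Neighbor sets from pairwise squared-L2 distances.
        d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        return [set(row) for row in np.argpartition(d2, k, axis=1)[:, :k]]

    def mean_jaccard(x_curr, x_prev, k=9):
        # Average per-node Jaccard overlap between neighborhoods built from
        # current-layer and previous-layer features.
        a, b = knn_sets(x_curr, k), knn_sets(x_prev, k)
        return float(np.mean([len(s & t) / len(s | t) for s, t in zip(a, b)]))

    # feats[l] would hold layer-l activations of shape [N, C] captured in a hook;
    # random placeholders stand in here, since this is not the paper's data.
    feats = [np.random.randn(196, 192) for _ in range(12)]
    per_layer = [mean_jaccard(feats[l], feats[l - 1]) for l in range(1, len(feats))]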

standing simulated objections not resolved
  • A theoretical bound on the perturbation to neighborhood structure and the message-passing operator induced by the one-layer lookahead formulation (requested in §3.2).

Circularity Check

0 steps flagged

No circularity: the algorithmic reformulation is stated independently of the benchmarks, and the reported speedups are direct measurements

full rationale

The paper's core contribution is an explicit algorithmic change—using layer-(ℓ-1) features to build the kNN graph for layer-ℓ message passing while constructing the next graph from current features—followed by measured FPGA throughput on Alveo U280 against CPU/GPU baselines. No equation or result is obtained by fitting a parameter to a subset of the target data and relabeling it a prediction; no uniqueness theorem or ansatz is imported via self-citation; and the accuracy-recovery claim is presented as an empirical observation rather than a derived identity. The reported speedups (95.7× CPU, 8.5× GPU) are direct wall-clock measurements, not quantities forced by the reformulation itself. The derivation chain therefore stands on its own, with the performance claims checked against external CPU/GPU benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard assumptions in graph neural networks and hardware design without introducing new free parameters or entities.

axioms (1)
  • domain assumption: k-nearest-neighbor search on patch features defines the graph
    Standard in GNNs for dynamic graphs.

pith-pipeline@v0.9.0 · 5638 in / 1222 out tokens · 43616 ms · 2026-05-09T22:23:51.900967+00:00 · methodology

