pith. machine review for the scientific record.

arxiv: 2604.09240 · v1 · submitted 2026-04-10 · 💻 cs.LG

Recognition: no theorem link

DiffHLS: Differential Learning for High-Level Synthesis QoR Prediction with GNNs and LLM Code Embeddings


Pith reviewed 2026-05-10 17:54 UTC · model grok-4.3

classification 💻 cs.LG
keywords: high-level synthesis · QoR prediction · differential learning · graph neural networks · LLM code embeddings · pragma optimization · PolyBench

The pith

DiffHLS predicts HLS quality-of-result by learning the delta from pragma changes on a kernel baseline rather than regressing absolute targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DiffHLS as a way to forecast how pragma insertions change the final hardware metrics after high-level synthesis from C/C++ code. It trains on paired examples consisting of an unchanged kernel and its pragma-modified variant. Separate graph neural network branches encode the intermediate-representation structures of each, while a pretrained code language model supplies embeddings for the change pathway. The model outputs a baseline prediction plus a design-induced delta that are added to form the final estimate. On PolyBench this yields lower average MAPE than plain GNN regressors across four backbones, with the language-model component providing consistent further gains, and the approach scales to the ForgeHLS collection.

Core claim

DiffHLS encodes kernel and design intermediate-representation graphs with dedicated GNN branches, augments the delta pathway with code embeddings from a pretrained LLM, and jointly predicts the kernel baseline and the design-induced delta whose sum gives the QoR estimate, attaining lower average MAPE than GNN baselines on PolyBench and showing scalability on ForgeHLS.

What carries the argument

The differential decomposition: the kernel baseline is predicted separately from the pragma-induced delta by dual GNN branches, with LLM code embeddings feeding the change pathway.
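
As a concrete reading, here is a minimal sketch of how the two heads could compose, assuming precomputed GNN graph embeddings and a pretrained-LLM code embedding as inputs; the names (DiffQoRHead, h_kernel, e_code) and the concatenation-based fusion are our illustration, not details confirmed by the paper.

    import torch
    import torch.nn as nn

    class DiffQoRHead(nn.Module):
        # Illustrative head: QoR(kernel, design) = baseline(kernel) + delta(pair).
        # n_targets = 4 for the paper's DSP, FF, LUT, and CP targets.
        def __init__(self, g_dim: int, llm_dim: int, n_targets: int = 4):
            super().__init__()
            # f(kernel): kernel baseline from the kernel-graph embedding alone.
            self.baseline = nn.Sequential(
                nn.Linear(g_dim, 64), nn.ReLU(), nn.Linear(64, n_targets))
            # delta: pragma-induced change from both graph embeddings plus the
            # LLM code embedding, fused here by simple concatenation.
            self.delta = nn.Sequential(
                nn.Linear(2 * g_dim + llm_dim, 64), nn.ReLU(), nn.Linear(64, n_targets))

        def forward(self, h_kernel, h_design, e_code):
            base = self.baseline(h_kernel)
            delta = self.delta(torch.cat([h_kernel, h_design, e_code], dim=-1))
            return base + delta, base, delta  # composed estimate plus its parts

    # Dummy embeddings stand in for the GNN branches and the code LLM.
    head = DiffQoRHead(g_dim=128, llm_dim=768)
    h_k, h_d, e_code = torch.randn(2, 128), torch.randn(2, 128), torch.randn(2, 768)
    qor_pred, base, delta = head(h_k, h_d, e_code)

One natural training signal, consistent with the abstract's joint prediction, would supervise base against the unmodified kernel's synthesized QoR and base + delta against the design's QoR, so the delta pathway learns the pragma effect rather than an absolute target.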

If this is right

  • Lower average MAPE than GNN baselines under four different GNN backbones on PolyBench (MAPE as sketched after this list).
  • Consistent further accuracy gains from adding LLM code embeddings over a GNN-only version.
  • The same differential structure scales to the larger ForgeHLS dataset without changes to the training setup.
  • Design-space exploration can evaluate many more pragma choices without full synthesis runs for each.
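
MAPE here is the standard mean absolute percentage error; a minimal sketch, assuming (as Figure 3 suggests) it is computed per target and then averaged across DSP, FF, LUT, and CP:

    import numpy as np

    def mape(y_true, y_pred):
        # Mean absolute percentage error, in percent.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

    def average_mape(per_target):
        # per_target: {"DSP": (true, pred), "FF": (true, pred), ...}; illustrative.
        return float(np.mean([mape(t, p) for t, p in per_target.values()]))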

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could cut the number of expensive synthesis runs needed during pragma tuning by an order of magnitude if prediction error remains low on industrial workloads.
  • Similar baseline-plus-delta modeling might transfer to other incremental compilation or synthesis tasks where small code edits dominate the change.
  • Combining structural graph encoders with semantic LLM embeddings may become a general pattern for predicting optimization outcomes in code-to-hardware flows.

Load-bearing premise

HLS quality-of-result targets can be usefully split into an additive baseline plus delta learned from kernel-design pairs, with the selected benchmarks capturing typical pragma effects.
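
In symbols (our notation, not the paper's): for a kernel k and a pragma-inserted design d built from it, each QoR target is modeled as

    \mathrm{QoR}(k, d) \;\approx\; \underbrace{f(k)}_{\text{kernel baseline}} \;+\; \underbrace{\Delta(k, d)}_{\text{pragma-induced delta}}

The bias pays off only if Δ is easier to learn from kernel–design pairs than the absolute target itself; the referee's additivity objection below probes exactly this.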

What would settle it

A new collection of HLS designs containing complex pragma interactions where DiffHLS fails to produce lower MAPE than the GNN-only baselines would show the differential structure does not generalize.

Figures

Figures reproduced from arXiv: 2604.09240 by Jieru Zhao, Qiang Xu, Zedong Peng, Zeju Li.

Figure 1: Overview of DiffHLS. We encode kernel and design IR graphs, inject LLM code embeddings into the delta pathway, and predict the kernel baseline and delta; finally the design prediction is obtained by composition.

Figure 2: Example kernel code (top) and design code (bottom) illustrating HLS pragma insertion:

    void vector_add(int a[8], int b[8], int c[8]) {
      for (int i = 0; i < 8; i++) { c[i] = a[i] + b[i]; }
    }

    void vector_add(int a[8], int b[8], int c[8]) {
    #pragma HLS ARRAY_PARTITION variable=a factor=2…

Figure 3: Average MAPE (%) across DSP, FF, LUT, and CP for the three methods.
read the original abstract

High-Level Synthesis (HLS) compiles C/C++ into RTL, but exploring pragma-driven optimization choices remains expensive because each design point requires time-consuming synthesis. We propose DiffHLS, a differential learning framework for HLS Quality-of-Result (QoR) prediction that learns from kernel–design pairs: a kernel baseline and a pragma-inserted design variant. DiffHLS encodes kernel and design intermediate-representation graphs with dedicated graph neural network (GNN) branches, and augments the delta pathway with code embeddings from a pretrained code large language model (LLM). Instead of regressing absolute targets directly, we jointly predict the kernel baseline and the design-induced delta, and compose them to obtain the design prediction. On PolyBench, DiffHLS attains lower average MAPE than GNN baselines under four GNN backbones, and LLM code embeddings consistently improve over a GNN-only ablation. We further validate scalability on the ForgeHLS dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes DiffHLS, a differential learning framework for HLS QoR prediction that encodes kernel baselines and pragma-inserted design variants as IR graphs using dedicated GNN branches, augments the delta pathway with pretrained LLM code embeddings, and obtains the final design prediction by additively composing the predicted baseline and delta. It reports lower average MAPE than GNN baselines across four backbones on PolyBench, with LLM embeddings providing consistent gains over a GNN-only ablation, and further validates the approach on the ForgeHLS dataset.

Significance. If the empirical gains hold under proper controls, the work could accelerate HLS design-space exploration by replacing expensive synthesis runs with fast, accurate QoR estimates. The combination of differential structure and LLM embeddings is a timely contribution to ML-assisted hardware design, provided the additive decomposition proves necessary rather than incidental.

major comments (2)
  1. [Abstract and §4] Abstract and experimental evaluation: the central claim of MAPE improvement over GNN baselines is presented without any description of data splits, hyperparameter search, number of random seeds, statistical significance tests, or error bars. These omissions are load-bearing because the reported gains cannot be assessed for reliability or reproducibility.
  2. [Method (differential decomposition)] Method section on differential decomposition: the additive form QoR(kernel, design) ≈ f(kernel) + g(design) is the core inductive bias, yet no ablation isolates its contribution from the extra capacity of the dual-branch architecture or the LLM embeddings. Without targeted tests on non-additive pragma interactions (e.g., unrolling factors with loop-carried dependencies), it remains unclear whether the differential structure itself drives the improvement or merely enables richer modeling.
minor comments (2)
  1. [Results figures] Results figures should include error bars or confidence intervals on the MAPE bars to allow visual assessment of the claimed improvements.
  2. [§3.2] Clarify the exact integration mechanism of the LLM embeddings into the delta GNN branch (e.g., concatenation, attention, or learned weighting) with a diagram or equation; one possible gated variant is sketched after this list.
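
For illustration only, a learned-gating variant of the integration the comment asks about; the module and its wiring are our assumption, not the paper's mechanism:

    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        # One possible integration: a learned gate weighs the LLM code embedding
        # before it joins the delta branch; concatenation and cross-attention are
        # the other mechanisms the comment names.
        def __init__(self, g_dim: int, llm_dim: int):
            super().__init__()
            self.proj = nn.Linear(llm_dim, g_dim)    # map code embedding to graph width
            self.gate = nn.Linear(2 * g_dim, g_dim)  # gate conditioned on both inputs

        def forward(self, h_delta, e_code):
            e = self.proj(e_code)
            g = torch.sigmoid(self.gate(torch.cat([h_delta, e], dim=-1)))
            return h_delta + g * e  # gated injection into the delta pathway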

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional experimental details and targeted ablations are needed to strengthen the claims. We address each major comment below and will incorporate revisions to improve reproducibility and isolate the contribution of the differential structure.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and experimental evaluation: the central claim of MAPE improvement over GNN baselines is presented without any description of data splits, hyperparameter search, number of random seeds, statistical significance tests, or error bars. These omissions are load-bearing because the reported gains cannot be assessed for reliability or reproducibility.

    Authors: We agree that the manuscript as submitted lacks these critical details, which limits assessment of the results. In the revised version we will expand Section 4 with a dedicated experimental setup subsection that explicitly describes the data splitting procedure (ensuring no kernel leakage between train and test), the hyperparameter search method and ranges, the number of random seeds, the use of statistical significance tests, and the inclusion of error bars on all figures and tables. These additions will allow readers to evaluate the reliability of the reported MAPE gains; a sketch of such a kernel-grouped split follows these responses. revision: yes

  2. Referee: [Method (differential decomposition)] Method section on differential decomposition: the additive form QoR(kernel, design) ≈ f(kernel) + g(design) is the core inductive bias, yet no ablation isolates its contribution from the extra capacity of the dual-branch architecture or the LLM embeddings. Without targeted tests on non-additive pragma interactions (e.g., unrolling factors with loop-carried dependencies), it remains unclear whether the differential structure itself drives the improvement or merely enables richer modeling.

    Authors: The referee is correct that the current manuscript does not contain an ablation that isolates the additive decomposition from the added capacity of the dual-branch design or the LLM embeddings. In the revision we will add a controlled ablation comparing DiffHLS against a non-differential dual-branch GNN baseline with matched total parameter count. We will also include targeted experiments on PolyBench kernels that exhibit known non-additive pragma interactions (such as unrolling combined with pipelining in loops with carried dependencies) to test whether the additive inductive bias provides benefit beyond richer modeling. These results will clarify whether the differential structure is necessary or incidental; a sketch of such a matched-capacity control follows below. revision: yes
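
On the first point, a minimal sketch of a kernel-grouped split, assuming each (kernel, design) sample carries an identifier of its source kernel; GroupShuffleSplit keeps every design of a kernel on the same side, so no kernel leaks between train and test (function name and ids are illustrative):

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    def kernel_grouped_split(kernel_ids, test_size=0.2, seed=0):
        # Split sample indices so that train and test share no source kernel.
        kernel_ids = np.asarray(kernel_ids)
        X = np.zeros((len(kernel_ids), 1))  # features are irrelevant to the split
        splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
        train_idx, test_idx = next(splitter.split(X, groups=kernel_ids))
        return train_idx, test_idx

    # e.g. three kernels with several pragma designs each (illustrative ids)
    train_idx, test_idx = kernel_grouped_split(["gemm"] * 4 + ["atax"] * 3 + ["bicg"] * 3)

On the second point, the promised control could pair the same fused inputs with a single head that regresses absolute QoR directly, its width tuned to match the differential head's parameter count; any remaining gap then isolates the differential structure itself. A sketch under those assumptions (names and widths are ours):

    import torch
    import torch.nn as nn

    class AbsoluteQoRHead(nn.Module):
        # Non-differential control: direct regression of absolute QoR targets.
        def __init__(self, g_dim: int, llm_dim: int, n_targets: int = 4):
            super().__init__()
            # Hidden width would be tuned so the total parameter count matches
            # the differential head's two MLPs as closely as possible.
            self.reg = nn.Sequential(
                nn.Linear(2 * g_dim + llm_dim, 96), nn.ReLU(), nn.Linear(96, n_targets))

        def forward(self, h_kernel, h_design, e_code):
            return self.reg(torch.cat([h_kernel, h_design, e_code], dim=-1))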

Circularity Check

0 steps flagged

No significant circularity; standard supervised ML predictor on external ground truth

full rationale

The paper presents DiffHLS as a supervised regression model trained directly against synthesis tool outputs on PolyBench and ForgeHLS. The kernel baseline plus delta decomposition is an explicit architectural choice whose parameters are learned from labeled data, not a self-definition or fitted input renamed as a prediction. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are referenced in the derivation. The central claim (lower MAPE than GNN baselines) is evaluated against held-out external benchmarks and therefore remains falsifiable outside the model's own fitted values.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The framework rests on standard supervised learning assumptions plus the domain choice to model QoR as baseline plus additive delta; no new physical entities are postulated.

free parameters (2)
  • GNN architecture and training hyperparameters
    Choice of GNN backbone, layer sizes, learning rate, and regularization are tuned on the training data.
  • LLM embedding model and integration weights
    Pretrained code LLM is selected and its embeddings are fused via learned parameters.
axioms (1)
  • domain assumption: QoR metrics admit an additive decomposition into kernel baseline and pragma-induced delta
    The differential pathway is built on this modeling choice stated in the abstract.

pith-pipeline@v0.9.0 · 5484 in / 1362 out tokens · 37114 ms · 2026-05-10T17:54:20.528281+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 5 canonical work pages · 1 internal anchor

[1] N. Wu, H. Yang, Y. Xie, P. Li, and C. Hao, "High-level synthesis performance prediction using graph neural networks: Benchmarking, modeling, and advancing," in Proceedings of the ACM/IEEE Design Automation Conference (DAC), 2022.

[2] M. Gao, J. Zhao, Z. Lin, and M. Guo, "Hierarchical source-to-post-route QoR prediction in high-level synthesis with GNNs," in 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2024, pp. 1–6.

[3] L.-N. Pouchet and T. Yuki, "PolyBench/C 4.2," https://sourceforge.net/projects/polybench/files/polybench-c-4.2/, 2016, accessed 2025-11-20.

[4] L. Ferretti, J. Kwon, G. Ansaloni, G. Di Guglielmo, L. Carloni, and L. Pozzi, "Db4HLS: A database of high-level synthesis design space explorations," IEEE Embedded Systems Letters, vol. 13, no. 4, pp. 194–197, 2021.

[5] Y. Bai, A. Sohrabizadeh, Z. Qin, Z. Hu, Y. Sun, and J. Cong, "Towards a comprehensive benchmark for high-level synthesis targeted to FPGAs," Advances in Neural Information Processing Systems, vol. 36, pp. 45288–45299, 2023.

[6] Z. Peng, Z. Li, M. Gao, Q. Xu, C. Zhang, and J. Zhao, "ForgeHLS: A large-scale, open-source dataset for high-level synthesis," arXiv preprint arXiv:2507.03255, 2025. Available: http://arxiv.org/abs/2507.03255

[7] B. Hawks, J. Weitz, D. Demler, K. Tame-Narvaez, D. Plotnikov, M. M. Rahimifar, H. E. Rahali, A. C. Therrien, D. Sproule, E. E. Khoda et al., "wa-hls4ml: A benchmark and surrogate models for hls4ml resource and latency estimation," arXiv preprint arXiv:2511.05615, 2025.

[8] G. Zhong, A. Prakash, S. Wang, Y. Liang, T. Mitra, and S. Niar, "Design space exploration of FPGA-based accelerators with multi-level parallelism," in Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2017, pp. 1141–1146.

[10] S. Dai, Y. Zhou, H. Zhang, E. Ustun, E. F. Young, and Z. Zhang, "Fast and accurate estimation of quality of results in high-level synthesis with machine learning," in 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2018, pp. 129–132.

[11] J. Zhao, L. Feng, S. Sinha, W. Zhang, Y. Liang, and B. He, "Comba: A comprehensive model-based analysis framework for high level synthesis of real applications," in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2017, pp. 430–437.

[12] A. Sohrabizadeh, Y. Bai, Y. Sun, and J. Cong, "Automated accelerator optimization aided by graph neural networks," in Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022, pp. 55–60.

[13] Z. Lin, Z. Yuan, J. Zhao, W. Zhang, H. Wang, and Y. Tian, "Powergear: Early-stage power estimation in FPGA HLS via heterogeneous edge-centric GNNs," in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE), 2022.

[14] Z. Lin, J. Zhao, S. Sinha, and W. Zhang, "HL-Pow: A learning-based power modeling framework for high-level synthesis," in 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), 2020, pp. 574–580.

[15] P. Goswami and D. Bhatia, "Predicting post-route quality of results estimates for HLS designs using machine learning," in 2022 23rd International Symposium on Quality Electronic Design (ISQED). IEEE, 2022, pp. 45–50.

[16] M. U. Jamal, Z. Li, M. T. Lazarescu, and L. Lavagno, "A graph neural network model for fast and accurate quality of result estimation for high-level synthesis," IEEE Access, 2023.

[17] A. Sohrabizadeh, Y. Bai, Y. Sun, and J. Cong, "Robust GNN-based representation learning for HLS," in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–9.

[18] E. Murphy and L. Josipović, "Balor: HLS source code evaluator based on custom graphs and hierarchical GNNs," in Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024, pp. 1–9.

[19] Z. Lin, Z. Peng, M. Gao, J. Zhao, and Z. Lin, "Hippo: A hierarchy-preserving and noise-tolerant pre-HLS power modeling framework for FPGA," in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2025.

[20] Z. Qin, Y. Bai, A. Sohrabizadeh, Z. Ding, Z. Hu, Y. Sun, and J. Cong, "Cross-modality program representation learning for electronic design automation with high-level synthesis," arXiv preprint arXiv:2406.09606, 2024.

[21] W. Li, D. Wang, Z. Ding, A. Sohrabizadeh, Z. Qin, J. Cong, and Y. Sun, "Hierarchical mixture of experts: Generalizable learning for high-level synthesis," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 17, 2025, pp. 18476–18484.

[22] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, and D. Jiang, "CodeBERT: A pre-trained model for programming and natural languages," arXiv preprint arXiv:2002.08155, 2020.

[23] D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu et al., "GraphCodeBERT: Pre-training code representations with data flow," arXiv preprint arXiv:2009.08366, 2020.

[24] P. Suganthan, F. Moiseev, L. Yan et al., "Adapting decoder-based language models for diverse encoder downstream tasks," arXiv preprint, 2025.

[25] X. Song and D. Bahri, "Decoding-based regression," arXiv preprint, 2025.