LLM-Driven Design Space Exploration of FPGA-based Accelerators
Pith reviewed 2026-05-08 04:31 UTC · model grok-4.3
The pith
SECDA-DSE integrates large language models to automate design space exploration for FPGA-based AI accelerators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SECDA-DSE combines a DSE Explorer for producing candidate accelerator configurations with an LLM Stack that performs reasoning-guided exploration via retrieval-augmented generation and chain-of-thought prompting, together with a feedback loop for reinforced fine-tuning, and shows that the resulting designs can pass high-level synthesis timing and resource checks on a Zynq-7000 FPGA.
What carries the argument
The LLM Stack that performs reasoning-guided exploration using retrieval-augmented generation and chain-of-thought prompting to propose and refine accelerator configurations.
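The propose-screen-synthesize loop this claim describes can be sketched in outline. The following is a minimal, hypothetical sketch only: the paper does not publish its configuration schema, prompts, or resource budgets, so the parameter names (`pe_count`, `buffer_kb`, `dataflow`), the Zynq-7000-class limits, and the toy proposer standing in for the LLM Stack are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AcceleratorConfig:
    pe_count: int    # number of processing elements (illustrative parameter)
    buffer_kb: int   # on-chip buffer size in KB (illustrative parameter)
    dataflow: str    # e.g. "weight_stationary" or "output_stationary"

def within_budget(cfg: AcceleratorConfig, bram_kb: int = 560, max_pes: int = 220) -> bool:
    """Cheap pre-synthesis screen: reject configurations that obviously
    exceed the device budget before paying for an HLS run. Limits are
    illustrative stand-ins, not values from the paper."""
    return cfg.buffer_kb <= bram_kb and cfg.pe_count <= max_pes

def explore(propose, n_candidates: int = 8) -> list[AcceleratorConfig]:
    """Ask the proposer for candidates and keep only those that pass the
    resource screen; survivors would then go on to HLS evaluation."""
    kept = []
    for i in range(n_candidates):
        cfg = propose(i)
        if within_budget(cfg):
            kept.append(cfg)
    return kept

# Toy stand-in proposer; in SECDA-DSE this role is played by the LLM Stack
# (RAG + chain-of-thought prompting), which this sketch does not model.
def toy_proposer(i: int) -> AcceleratorConfig:
    return AcceleratorConfig(pe_count=32 * (i + 1), buffer_kb=64 * (i + 1),
                             dataflow="weight_stationary")
```

The pre-synthesis screen matters because each HLS run is expensive; filtering obviously infeasible candidates keeps the loop's cost proportional to plausible designs.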
If this is right
- Design of FPGA accelerators for AI workloads requires substantially less repeated manual iteration and expert intervention.
- A feedback loop allows the exploration process to improve over successive runs without restarting from scratch.
- Valid configurations can be produced that already satisfy synthesis constraints before full implementation.
- The same LLM-driven loop can be applied inside existing hardware-software co-design environments such as SECDA.
Where Pith is reading between the lines
- The approach could be retargeted to other reconfigurable or fixed-function hardware platforms where similar large parameter spaces exist.
- LLMs might surface accelerator organizations that standard heuristic or exhaustive searches overlook because they encode patterns from training data.
- Lowering the expertise barrier could let more software teams prototype custom accelerators for edge or embedded AI without hiring hardware specialists.
Load-bearing premise
The LLM Stack can reliably generate valid and useful accelerator configurations that meaningfully reduce manual effort and domain expertise.
What would settle it
A test in which the large majority of LLM-generated configurations fail to meet timing or resource constraints when run through high-level synthesis on the target FPGA would show the automation does not yet deliver usable designs.
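The falsification test above amounts to measuring a pass rate over a batch of synthesis runs. A hypothetical sketch of that harness follows; the report fields, the fabricated example outcomes, and the 50% acceptance threshold are all our illustrative choices, not anything specified by the paper.

```python
def pass_rate(results: list[dict]) -> float:
    """Fraction of synthesis runs that met both timing and resource checks.
    Each entry stands in for a parsed HLS report (field names assumed)."""
    if not results:
        return 0.0
    ok = sum(1 for r in results if r["timing_met"] and r["resources_met"])
    return ok / len(results)

# Fabricated example outcomes, for illustration only:
reports = [
    {"timing_met": True,  "resources_met": True},
    {"timing_met": True,  "resources_met": False},
    {"timing_met": False, "resources_met": True},
    {"timing_met": True,  "resources_met": True},
]

rate = pass_rate(reports)
automation_usable = rate >= 0.5   # illustrative acceptance threshold
```

Under this framing, a pass rate well below the chosen threshold across many generated configurations would be the evidence that the automation does not yet deliver usable designs.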
Original abstract
Designing field-programmable gate array (FPGA)-based accelerators for modern artificial intelligence workloads requires navigating a large and complex hardware design space encompassing architectural parameters, dataflow strategies, and memory hierarchies, making the process time-consuming and resource-intensive. While the SECDA methodology enables rapid hardware-software co-design of accelerators through SystemC simulation and FPGA execution, identifying optimal accelerator configurations still requires substantial manual effort and domain expertise. This work presents SECDA-DSE, a framework that integrates Large Language Models (LLMs) into the SECDA ecosystem, comprising tools built around SECDA to automate the design space exploration (DSE) of FPGA-based accelerators. SECDA-DSE combines a structured DSE Explorer for generating accelerator configurations with an LLM Stack that performs reasoning-guided exploration using retrieval-augmented generation and chain-of-thought prompting, alongside a feedback loop that enables reinforced fine-tuning for continuous improvement. We demonstrate the feasibility of SECDA-DSE through an initial high-level synthesis based evaluation of a generated accelerator design that meets synthesis timing and resource constraints on an Zynq-7000 FPGA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SECDA-DSE, a framework extending the SECDA methodology by integrating Large Language Models (LLMs) to automate design space exploration (DSE) for FPGA-based accelerators targeting AI workloads. It comprises a structured DSE Explorer for configuration generation, an LLM Stack employing retrieval-augmented generation and chain-of-thought prompting for reasoning-guided search, and a feedback loop for reinforced fine-tuning. Feasibility is demonstrated through a single high-level synthesis (HLS) evaluation of one LLM-generated accelerator design that satisfies timing and resource constraints on a Zynq-7000 FPGA.
Significance. If the LLM-based components can be shown to consistently produce high-quality, workload-correct accelerator configurations with substantially less manual intervention than traditional methods, the work could meaningfully advance automated hardware-software co-design flows for FPGAs. The extension of the established SECDA ecosystem with modern LLM techniques is a timely idea, though the current evidence base is limited to an initial, single-instance synthesis success without supporting metrics or baselines.
major comments (2)
- Abstract: The central feasibility claim rests on a single initial HLS evaluation of one generated design that meets basic timing and resource constraints. No quantitative metrics (e.g., achieved performance, resource utilization beyond constraint satisfaction, or design-space coverage), multiple tested configurations, success/failure rates across iterations, or comparisons against manual DSE or non-LLM baselines are reported. This single data point is insufficient to substantiate the claim that the LLM Stack with RAG/CoT reliably generates valid and useful accelerator configurations that reduce manual effort.
- Framework description (LLM Stack and feedback loop): The manuscript describes the use of RAG and chain-of-thought prompting together with a feedback loop for reinforced fine-tuning, yet provides no concrete details on prompt templates, retrieved knowledge base contents, how feedback is converted into training signals, or any ablation showing the contribution of these LLM techniques versus simpler generation methods. Without such specifics, it is difficult to evaluate whether the approach is load-bearing for the reported outcome or merely incidental to the single successful synthesis.
minor comments (2)
- Abstract: The construction 'an Zynq-7000 FPGA' is grammatically incorrect and should read 'a Zynq-7000 FPGA'.
- Abstract and introduction: The abstract states that SECDA-DSE 'comprises tools built around SECDA' but does not enumerate or reference the specific tools, their interfaces, or how they interact with the LLM Stack; this omission reduces clarity for readers unfamiliar with the prior SECDA codebase.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We acknowledge the preliminary nature of the evaluation and will revise the paper to provide greater detail on the LLM components while clarifying the scope of our feasibility claims.
read point-by-point responses
-
Referee: Abstract: The central feasibility claim rests on a single initial HLS evaluation of one generated design that meets basic timing and resource constraints. No quantitative metrics (e.g., achieved performance, resource utilization beyond constraint satisfaction, or design-space coverage), multiple tested configurations, success/failure rates across iterations, or comparisons against manual DSE or non-LLM baselines are reported. This single data point is insufficient to substantiate the claim that the LLM Stack with RAG/CoT reliably generates valid and useful accelerator configurations that reduce manual effort.
Authors: We appreciate the referee's point. The manuscript presents the single successful HLS synthesis explicitly as an initial feasibility demonstration of the SECDA-DSE framework rather than evidence of reliability, consistency, or broad reduction in manual effort. To address the concern, we will revise the abstract and results section to qualify the claims more precisely as a proof-of-concept, include the concrete resource utilization figures and timing results from the synthesis report, and add a discussion of how the framework is intended to lower manual intervention. We do not have additional configurations or baselines in the current work, so we will also outline planned future comparative evaluations. revision: partial
-
Referee: Framework description (LLM Stack and feedback loop): The manuscript describes the use of RAG and chain-of-thought prompting together with a feedback loop for reinforced fine-tuning, yet provides no concrete details on prompt templates, retrieved knowledge base contents, how feedback is converted into training signals, or any ablation showing the contribution of these LLM techniques versus simpler generation methods. Without such specifics, it is difficult to evaluate whether the approach is load-bearing for the reported outcome or merely incidental to the single successful synthesis.
Authors: We agree that additional implementation details are required. In the revised manuscript we will add an appendix containing the prompt templates, a description of the RAG knowledge base contents (SECDA methodology documents, HLS guidelines, and AI accelerator design patterns), and a step-by-step explanation of the feedback loop including how successful designs generate signals for reinforced fine-tuning. We will also include a qualitative discussion of the contribution of RAG and CoT to the generated design. Full quantitative ablations are beyond the scope of this initial feasibility study and will be noted as future work. revision: yes
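One way the feedback loop's conversion of synthesis results into fine-tuning signals could work is a scalar reward derived from the HLS report. The sketch below is one plausible shaping, not the paper's actual reward: a hard zero on any constraint violation, otherwise credit for timing slack and unused resources. All field names are assumptions.

```python
def reward(report: dict) -> float:
    """Hypothetical reward for reinforced fine-tuning: zero for designs
    that violate timing or resource constraints; otherwise a blend of
    normalized timing slack and leftover LUT headroom, each in [0, 1]."""
    if not (report["timing_met"] and report["resources_met"]):
        return 0.0
    slack_term = min(report["slack_ns"] / report["clock_ns"], 1.0)
    util_term = 1.0 - report["lut_util"]   # fraction of LUTs still free
    return 0.5 * slack_term + 0.5 * util_term

# Fabricated reports, for illustration only:
good = {"timing_met": True, "resources_met": True,
        "slack_ns": 1.0, "clock_ns": 10.0, "lut_util": 0.6}
bad = {"timing_met": False, "resources_met": True,
       "slack_ns": -0.3, "clock_ns": 10.0, "lut_util": 0.4}
```

The hard zero keeps infeasible designs from being reinforced at all, while the graded term inside the feasible region gives the fine-tuning signal something to climb.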
Circularity Check
Minor self-citation to prior SECDA methodology; central feasibility claim remains independent
full rationale
The paper presents SECDA-DSE as an extension that adds an LLM Stack (RAG + CoT prompting + feedback loop) to the existing SECDA ecosystem for automating DSE. The abstract and framework description reference the prior SECDA methodology for rapid co-design via SystemC simulation and FPGA execution, but this is background context rather than a load-bearing premise that reduces the new claim to a self-referential definition or fitted input. The strongest claim is an initial HLS-based check that one LLM-generated design meets timing and resource constraints on a Zynq-7000 FPGA; this is a direct empirical demonstration and does not collapse by construction to any parameter fit or prior result within the paper. No equations, uniqueness theorems, or ansatzes are invoked that would create circularity. The single-example validation is statistically limited but does not constitute circular reasoning.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can perform effective reasoning-guided exploration of hardware design spaces using retrieval-augmented generation and chain-of-thought prompting.
Reference graph
Works this paper leans on
-
[1]
Andrew Boutros, Aman Arora, and Vaughn Betz. 2025. Field-Programmable Gate Array Architecture for Deep Learning: Survey and Future Directions. Proc. IEEE (2025)
2025
-
[2]
Jacopo Cesaretti. 2025. Rapid Prototyping of Edge AI Accelerators: An HLS-based Approach for CNNs on FPGAs for the AIdge ML Deployment Framework. Ph.D. Dissertation. Politecnico di Torino
2025
-
[3]
Pudi Dhilleswararao, Srinivas Boppu, M Sabarimalai Manikandan, and Linga Reddy Cenkeramaddi. 2022. Efficient hardware architectures for accelerating deep neural networks: Survey. IEEE Access 10 (2022), 131788–131828
2022
-
[4]
Perry Gibson, Jose Cano, Elliot Crowley, Amos Storkey, and Michael O'Boyle. 2025. DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration. ACM Trans. Archit. Code Optim. (2025)
2025
-
[5]
Jude Haris, Perry Gibson, José Cano, Nicolas Bohm Agostini, and David Kaeli. 2021. SECDA: Efficient hardware/software co-design of FPGA-based DNN accelerators for edge inference. In 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 33–43
2021
-
[6]
Jude Haris, Perry Gibson, José Cano, Nicolas Bohm Agostini, and David Kaeli. 2023. SECDA-TFLite: A toolkit for efficient development of FPGA-based DNN accelerators for edge inference. J. Parallel and Distrib. Comput. 173 (2023), 140–151
2023
-
[7]
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In ICLR (2022)
2022
- [8]
-
[9]
Chen Ling, Xujiang Zhao, Jiaying Lu, Chengyuan Deng, Can Zheng, Junxiang Wang, Tanmoy Chowdhury, Yun Li, Hejie Cui, Xuchao Zhang, et al. 2025. Domain specialization as the key to make large language models disruptive: A comprehensive survey. Comput. Surveys 58, 3 (2025), 1–39
2025
-
[10]
Ollama. 2024. Ollama. https://ollama.com. Software for running LLMs locally
2024
-
[11]
Gaurav Tiwari, Sangeeta Nakhate, Alok Pathak, Abhinandan Jain, and Shardul Penurkar. 2025. Hardware accelerators for deep learning applications. In 2025 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS). IEEE, 1–10
2025
- [12]
-
[13]
Xilinx, Inc. 2019. Vivado High-Level Synthesis
2019
-
[14]
Tao Zhang, Rui Ma, Shuotao Xu, Peng Cheng, and Yongqiang Xiong
-
[15]
LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis. arXiv:2603.05904 [cs.AR]. https://arxiv.org/abs/2603.05904
-
[16]
Tianshi Zheng, Zheye Deng, Hong Ting Tsang, Weiqi Wang, Jiaxin Bai, Zihao Wang, and Yangqiu Song. 2025. From automation to autonomy: A survey on large language models in scientific discovery. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 17744–17761
2025
discussion (0)