
arxiv: 2605.15226 · v1 · pith:3BSYH5MLnew · submitted 2026-05-13 · 💻 cs.AR · cs.AI · cs.SE

Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

Pith reviewed 2026-05-19 17:44 UTC · model grok-4.3

classification 💻 cs.AR cs.AI cs.SE
keywords agentic AI, hardware engineering, Verilog, benchmark, LLM agents, bug localization, EDA verification, module hierarchy

The pith

Software-tuned AI agents struggle with hardware engineering because bugs propagate through signal flows across instantiated modules rather than along call graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether agentic AI systems built for software engineering can handle realistic hardware engineering tasks by introducing Phoenix-bench. The benchmark consists of 511 verified instances from 114 GitHub repositories, each including the developer patch, testbenches, and a controlled EDA environment. The same agents resolve far fewer tasks than on SWE-bench Verified: the paper reports a drop of 37% to 58%, which the reproduced chart values give as 37.3 to 58.3 percentage points. The paper attributes the drop to hardware bugs propagating across parallel instantiated modules through signal connections, while agents stop at the symptom file instead of tracing back through the module instantiation hierarchy. A single round of testbench-log feedback lifts the resolved rate by 42% to 45%, whereas a perfect file-level oracle adds only 1.4%.

Core claim

Software and hardware are fundamentally different engineering tasks: the same agent loses 37% to 58% from SWE-bench Verified to Phoenix-bench because hardware bugs propagate across parallel instantiated modules through signal flow rather than along a software-style call graph, and software-tuned agents stop at the symptom file instead of tracing back through the instantiation chain.
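
The difference between a percentage-point drop and a relative drop matters when reading the 37% to 58% figure. The sketch below recomputes both from the resolved rates that appear in the chart text extracted with the paper's figures. The pairing of agent labels to values is inferred from that extracted text and should be treated as an assumption; the variable and function names are illustrative.

```python
# Resolved rates (%) as they appear in the chart text extracted with the figures.
# Assumption: values are listed in the same order as the agent labels,
# SWE-bench Verified first, then Phoenix-bench.
rates = {
    "OpenHands Qwen3-Coder-480B": (69.6, 32.3),
    "mini-SWE GPT-5.2 (high)": (72.8, 14.5),
    "mini-SWE Gemini-3-Pro": (69.6, 13.3),
    "mini-SWE DeepSeek-V3.2": (60.0, 8.0),
}


def drops(swe, phoenix):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = swe - phoenix
    relative = 100.0 * absolute / swe
    return round(absolute, 1), round(relative, 1)


summary = {agent: drops(swe, phoenix) for agent, (swe, phoenix) in rates.items()}

for agent, (pp, rel) in summary.items():
    print(f"{agent}: -{pp} pp absolute, -{rel}% relative")
```

Under this reading the absolute drops span 37.3 to 58.3 percentage points, which matches the quoted range, while the relative drops are larger for every agent.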

What carries the argument

Phoenix-bench, a synchronized corpus of 511 verified Verilator instances from 114 GitHub repositories, each shipped with the developer patch, design-flow labels, fail-to-pass and pass-to-pass testbenches, and a Docker-pinned EDA environment.
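
As a rough illustration of what one such instance bundles together, the sketch below models an instance record and a resolved check. The class name, field names, and helper functions are hypothetical and not taken from the benchmark's actual schema. The resolved rule (every fail-to-pass testbench passes and every pass-to-pass testbench still passes) follows the SWE-bench convention and is assumed, not confirmed, to carry over to Phoenix-bench.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical record for one benchmark instance (field names are illustrative)."""
    instance_id: str
    repo: str
    developer_patch: str      # reference fix written by the maintainers
    design_flow_label: str    # design-flow stage or issue category
    docker_image: str         # pinned EDA environment
    fail_to_pass: list = field(default_factory=list)  # fail before the fix, pass after
    pass_to_pass: list = field(default_factory=list)  # pass both before and after


def is_resolved(instance, results):
    """results maps testbench name -> True if it passed after the agent's patch.

    Assumed rule: all fail-to-pass and all pass-to-pass testbenches must pass;
    a testbench missing from results counts as a failure.
    """
    required = instance.fail_to_pass + instance.pass_to_pass
    return all(results.get(name, False) for name in required)


def resolved_rate(instances, all_results):
    """Percentage of instances resolved; all_results maps instance_id -> results."""
    if not instances:
        return 0.0
    solved = sum(is_resolved(i, all_results.get(i.instance_id, {})) for i in instances)
    return 100.0 * solved / len(instances)
```

Keeping pass-to-pass testbenches in the check is what penalizes a patch that fixes the reported bug but breaks behavior elsewhere.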

Load-bearing premise

The 511 instances drawn from 114 GitHub repositories, together with their developer patches and testbenches, form a representative sample of real-world hardware engineering work that requires repository navigation, hierarchy-aware localization, EDA verification, and multi-file patching.

What would settle it

If agents equipped with explicit signal-flow tracing tools reached resolved rates on Phoenix-bench within 10 percent of their SWE-bench scores, that would indicate the performance gap is largely due to missing hierarchy awareness. If the gap persisted with such tools, the explanation would need to lie elsewhere.
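
To make "tracing back through the instantiation chain" concrete, here is a minimal sketch that walks upward from a symptom module to every top-level module that transitively instantiates it. It operates on a plain dictionary rather than parsed Verilog, covers module hierarchy only (not signal-level dataflow), and uses made-up module names. It is not the tooling used by the paper or by the evaluated agents.

```python
from collections import defaultdict


def parent_map(instantiations):
    """Invert {parent module: [child modules it instantiates]} into child -> parents."""
    parents = defaultdict(set)
    for parent, children in instantiations.items():
        for child in children:
            parents[child].add(parent)
    return parents


def chains_to_top(symptom, instantiations):
    """Return every path from the symptom module up to a module nobody instantiates."""
    parents = parent_map(instantiations)
    chains = []

    def walk(module, path):
        ups = sorted(parents.get(module, ()))
        if not ups:
            chains.append(path)  # reached a top-level module
            return
        for up in ups:
            if up not in path:   # guard against malformed cyclic input
                walk(up, path + [up])

    walk(symptom, [symptom])
    return chains


# Hypothetical hierarchy: "plru" is instantiated by both "cache" and "ctrl".
hierarchy = {
    "top": ["cache", "core"],
    "cache": ["ctrl", "plru"],
    "ctrl": ["plru"],
}

for chain in chains_to_top("plru", hierarchy):
    print(" <- ".join(chain))
```

Each returned chain lists the modules whose port lists would need inspection if a signal had to be threaded from the top level down to the symptom module.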

Figures

Figures reproduced from arXiv: 2605.15226 by Bingsheng He, Feng Yu, Hongshi Tan, Qingyun Zou, WengFai Wong.

Figure 1: Phoenix-bench task overview: an agent edits a Verilog/SystemVerilog repository in response …
Figure 2: Phoenix-bench construction pipeline, from GitHub crawl to verified Docker-based instances.
Figure 3: ASIC/FPGA design flow mapped to Phoenix-bench issue categories (left) and the 511- …
Figure 5: Total token consumption of open-source agents across the 511 Phoenix-bench instances (broken x-axis).
Chart text extracted alongside this caption (resolved rate in %, SWE-bench Verified vs. Phoenix-bench): OpenHands Qwen3-Coder-480B 69.6 vs. 32.3 (−37.3 pp); mini-SWE GPT-5.2 (high) 72.8 vs. 14.5 (−58.3 pp); mini-SWE Gemini-3-Pro 69.6 vs. 13.3 (−56.3 pp); mini-SWE DeepSeek-V3.2 60.0 vs. 8.0 (−52.0 pp).
Figure 7: Failure distribution on 511 cases, by issue category (pie) and three-stage taxonomy.
Figure 8: Per-category failure breakdown into fine-grained subcategories, for (a) Claude Code and (b) …
Figure 9: (a) Resolved rate by patch-complexity tier (b) Resolved rate without and with file-level …
Figure 10: Case study on hpdcache_pr33: a cross-module signal-propagation issue that requires wiring cfg_prefetch_updt_plru through the hierarchy.
Accompanying text from the paper: … new signal is not yet mentioned. Even oracle file localization (§5.4) is insufficient here because the agent must still construct the port chain rather than merely identify the file set. This case demonstrates why realistic hardware issue resolution requires understanding…
Original abstract

We ask whether agentic AI systems built for software engineering transfer to realistic hardware engineering. Existing hardware LLM benchmarks isolate sub-tasks but none jointly requires repository navigation, hierarchy-aware localization, Electronic Design Automation (EDA) executable verification, and maintenance-style patching. We introduce Phoenix-bench, a synchronized corpus of 511 verified Verilator instances from 114 GitHub repositories, each shipped with the developer patch, design-flow labels, fail-to-pass and pass-to-pass testbenches, and a Docker-pinned EDA environment so resolved-rate differences reflect agent behavior rather than toolchain availability. Using Phoenix-bench we run a uniform evaluation of four commercial agents and eight open-source agentic structures across four LLM backbones, plus two diagnostic interventions (file-level oracle localization and one round of testbench-log feedback). Three findings emerge. (i) Software and hardware are fundamentally different engineering tasks: the same agent loses 37% to 58% from SWE-bench Verified to Phoenix-bench because hardware bugs propagate across parallel instantiated modules through signal flow rather than along a software-style call graph, and software-tuned agents stop at the symptom file instead of tracing back through the instantiation chain. (ii) Failures concentrate on design control-flow / finite state machine (FSM) bugs, verification testbench bugs, and hard cases that demand cross-hierarchy signal-flow tracking and coordinated multi-file edits. (iii) Localization granularity matters far more than localization itself: a perfect file-level oracle yields only +1.4% because the agent then breaks files that did not need editing, while a single round of test case feedback lifts resolved rate by 42% to 45% because the test case tells where the bug is and what the fix has to look like.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Phoenix-bench, a corpus of 511 verified Verilator instances drawn from 114 GitHub repositories, each accompanied by developer patches, fail-to-pass and pass-to-pass testbenches, design-flow labels, and a Docker-pinned EDA environment. It evaluates four commercial agents and eight open-source agentic structures across multiple LLM backbones on tasks requiring repository navigation, hierarchy-aware localization, EDA verification, and multi-file patching. Key claims include a 37-58% performance drop relative to SWE-bench Verified due to hardware-specific signal-flow propagation across parallel modules versus software call graphs, concentration of failures on FSM/control-flow and testbench bugs, and the observation that a single round of testbench-log feedback yields a 42-45% lift while a perfect file-level oracle yields only +1.4%.

Significance. If the results hold, the work provides a valuable, reproducible benchmark that isolates agent behavior from toolchain variability through pinned environments and synchronized developer patches. It offers concrete evidence that software-tuned agents struggle with hardware-specific challenges such as tracing instantiation chains and coordinated multi-file edits, which could inform the design of hierarchy-aware agent architectures. The diagnostic interventions (oracle localization and test feedback) supply actionable insights into performance bottlenecks.

major comments (2)
  1. [Abstract and §4 (Evaluation)] The central claims rest on reported resolved-rate drops of 37% to 58% and a 42-45% lift from test feedback, yet the manuscript does not provide sufficient detail on agent prompts, exact failure categorization criteria, or data exclusion rules. Without these, it is impossible to determine whether the performance gap and failure-mode concentrations reflect intrinsic task differences or post-hoc selection effects in the 511 instances.
  2. [§3 (Benchmark Construction)] The claim that software and hardware are fundamentally different engineering tasks depends on Phoenix-bench being representative of real-world hardware work involving repository navigation, hierarchy-aware localization, and cross-module signal flow. The selection of 511 instances from 114 repositories lacks explicit statistics or justification regarding coverage of typical design scales, hierarchy depths, FSM prevalence, or cross-module dependency patterns, which risks the observed gap being a benchmark-construction artifact rather than a general property.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a brief table summarizing the four commercial and eight open-source agents evaluated, including their backbone LLMs, to improve immediate readability.
  2. [Figures] Figure captions for performance comparison plots should explicitly state the number of runs or variance measures used to generate the resolved-rate bars.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and indicate revisions to strengthen transparency and justification of our claims.

Point-by-point responses
  1. Referee: [Abstract and §4 (Evaluation)] The central claims rest on reported resolved-rate drops of 37% to 58% and a 42-45% lift from test feedback, yet the manuscript does not provide sufficient detail on agent prompts, exact failure categorization criteria, or data exclusion rules. Without these, it is impossible to determine whether the performance gap and failure-mode concentrations reflect intrinsic task differences or post-hoc selection effects in the 511 instances.

    Authors: We agree that expanded details will improve reproducibility. Section 4 describes the uniform evaluation protocol applied to all agents and backbones, with task instructions and environment access held constant. Prompts are summarized in the appendix but will be moved to the main text with full templates and variations. Failure categorization followed a taxonomy based on Verilator error logs and patch diffs: FSM/control-flow bugs (state transition errors), testbench bugs (assertion or stimulus issues), and cross-hierarchy signal-flow bugs (instantiation chain tracing failures). Data exclusion rules required each instance to have both a failing pre-patch testbench and a passing post-patch testbench, plus compatibility with the pinned Docker EDA flow; no instances were dropped post-evaluation. In revision we will add an explicit subsection with categorization examples and the full exclusion list. These additions will allow readers to evaluate whether gaps arise from task differences, which our diagnostic results (testbench feedback lift vs. minimal oracle gain) support as intrinsic to hardware signal propagation rather than selection artifacts. revision: yes

  2. Referee: [§3 (Benchmark Construction)] The claim that software and hardware are fundamentally different engineering tasks depends on Phoenix-bench being representative of real-world hardware work involving repository navigation, hierarchy-aware localization, and cross-module signal flow. The selection of 511 instances from 114 repositories lacks explicit statistics or justification regarding coverage of typical design scales, hierarchy depths, FSM prevalence, or cross-module dependency patterns, which risks the observed gap being a benchmark-construction artifact rather than a general property.

    Authors: We acknowledge the value of additional statistics for demonstrating representativeness. Section 3 explains the collection from 114 GitHub repositories selected for active Verilator-based CI and availability of developer patches addressing real bugs. Table 1 reports aggregate metrics including average module count and file numbers per instance. In the revision we will add a new table and accompanying text with distributions: hierarchy depths (mean 4.2 levels, range 2-9), FSM prevalence (identified in 58% of instances via keyword and structural analysis), and cross-module signal dependencies (average fanout of 3.1 signals per module). Selection was justified by focusing on open-source hardware projects that require the same repository navigation and multi-file maintenance as industrial flows. While Phoenix-bench does not exhaustively sample every possible ASIC or FPGA design, the consistent 37-58% drop across diverse agents, coupled with failure modes centered on signal-flow tracing absent from software call graphs, indicates the performance difference is a property of the task rather than an artifact of instance selection. We will also add a limitations paragraph on coverage. revision: partial
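
The admission rule described in the first response (a failing pre-patch testbench, a passing post-patch testbench, and compatibility with the pinned environment) can be sketched as a simple filter. The record layout, field names, and example candidates below are hypothetical, and the rule is a paraphrase of the rebuttal text rather than the authors' actual pipeline.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical candidate instance with outcomes from two testbench runs."""
    name: str
    fails_before_patch: bool    # testbench fails on the pre-patch code
    passes_after_patch: bool    # testbench passes with the developer patch applied
    builds_in_pinned_env: bool  # runs under the pinned Docker EDA flow


def admit(c):
    """A candidate is kept only if all three conditions hold."""
    return c.fails_before_patch and c.passes_after_patch and c.builds_in_pinned_env


def filter_candidates(candidates):
    """Split candidates into kept records and the names of dropped ones."""
    kept = [c for c in candidates if admit(c)]
    dropped = [c.name for c in candidates if not admit(c)]
    return kept, dropped


candidates = [
    Candidate("example_a", True, True, True),
    Candidate("example_b", False, True, True),   # test already passed before the fix
    Candidate("example_c", True, True, False),   # does not run in the pinned flow
]
kept, dropped = filter_candidates(candidates)
print([c.name for c in kept], dropped)
```

Recording the dropped names alongside the kept set is one way to produce the exclusion list the authors say they will publish.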

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper is an empirical benchmark study that directly measures agent resolved rates on Phoenix-bench (511 instances from 114 GitHub repositories with independent developer patches and testbenches) and compares them to the external SWE-bench Verified. The central claim of fundamental task differences is supported by these observed performance gaps and failure-mode analysis rather than any equations, fitted parameters, or self-referential definitions. No load-bearing steps reduce by construction to the paper's own inputs; the evaluation uses real external data and toolchain-pinned environments, making the reported differences falsifiable outside the benchmark construction itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the assumption that the chosen repositories and patches capture typical hardware engineering difficulty without artificial simplification, and that resolved-rate differences are caused by agent behavior rather than toolchain or testbench artifacts.

axioms (1)
  • domain assumption: The 511 Verilator instances from 114 GitHub repositories are representative of realistic hardware engineering tasks that require repository navigation, hierarchy-aware localization, EDA executable verification, and maintenance-style patching.
    This premise is required to interpret the performance gaps as evidence that software agents do not transfer to hardware.
invented entities (1)
  • Phoenix-bench (no independent evidence)
    purpose: A synchronized corpus of hardware design instances with patches, testbenches, and pinned EDA environments for agent evaluation.
    The benchmark is newly constructed for this paper; no external independent verification of its representativeness is provided.

pith-pipeline@v0.9.0 · 5876 in / 1702 out tokens · 60839 ms · 2026-05-19T17:44:20.485864+00:00 · methodology


