A PennyLane-Centric Dataset to Enhance LLM-based Quantum Code Generation using RAG
Large Language Models (LLMs) offer powerful capabilities in code generation, natural language understanding, and domain-specific reasoning. Their application to quantum software development remains limited, in part because of the lack of high-quality datasets both for LLM training and as dependable knowledge sources. To bridge this gap, we introduce PennyLang, an off-the-shelf, high-quality dataset of 3,347 PennyLane-specific quantum code samples with contextual descriptions, curated from textbooks, official documentation, and open-source repositories. Our contributions are threefold: (1) the creation and open-source release of PennyLang, a purpose-built dataset for quantum programming with PennyLane; (2) a framework for automated quantum code dataset construction that systematizes curation, annotation, and formatting to maximize downstream LLM usability; and (3) a baseline evaluation of the dataset across multiple open-source and commercial models, including ablation studies, all conducted within a retrieval-augmented generation (RAG) pipeline. Using PennyLang with RAG substantially improves performance: for example, Qwen 7B's success rate rises from 8.7% without retrieval to 41.7% with full-context augmentation, and LLaMa 4 improves from 78.8% to 84.8%, while also reducing hallucinations and enhancing quantum code correctness. Moving beyond Qiskit-focused studies, we bring LLM-based tools and reproducible methods to PennyLane for advancing AI-assisted quantum development.
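The abstract does not describe the retrieval step in detail, so the following is only a minimal sketch of how a RAG pipeline might prepend dataset samples to a code-generation prompt. It assumes records with hypothetical `description` and `code` fields, uses a toy bag-of-words cosine similarity as a stand-in for whatever retriever the authors actually used, and the prompt template in the hypothetical `build_prompt` helper is likewise illustrative. The three records are invented examples, not actual PennyLang entries.

```python
import math
import re
from collections import Counter

# Hypothetical (description, code) records standing in for dataset entries;
# these are illustrative examples, NOT actual PennyLang samples.
CORPUS = [
    {
        "description": "Prepare a Bell state on two qubits with a Hadamard and a CNOT gate",
        "code": (
            "import pennylane as qml\n"
            "dev = qml.device('default.qubit', wires=2)\n\n"
            "@qml.qnode(dev)\n"
            "def bell():\n"
            "    qml.Hadamard(wires=0)\n"
            "    qml.CNOT(wires=[0, 1])\n"
            "    return qml.probs(wires=[0, 1])\n"
        ),
    },
    {
        "description": "Rotate a single qubit with RX and measure the PauliZ expectation value",
        "code": (
            "import pennylane as qml\n"
            "dev = qml.device('default.qubit', wires=1)\n\n"
            "@qml.qnode(dev)\n"
            "def rotate(theta):\n"
            "    qml.RX(theta, wires=0)\n"
            "    return qml.expval(qml.PauliZ(0))\n"
        ),
    },
    {
        "description": "Compute the gradient of a variational circuit with respect to its parameters",
        "code": (
            "import pennylane as qml\n"
            "from pennylane import numpy as np\n"
            "dev = qml.device('default.qubit', wires=1)\n\n"
            "@qml.qnode(dev)\n"
            "def circuit(params):\n"
            "    qml.RY(params[0], wires=0)\n"
            "    return qml.expval(qml.PauliZ(0))\n\n"
            "grad = qml.grad(circuit)(np.array([0.5], requires_grad=True))\n"
        ),
    },
]


def tokenize(text: str) -> Counter:
    """Lower-case bag-of-words term counts (toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Return the k records whose descriptions best match the query."""
    q = tokenize(query)
    ranked = sorted(
        corpus, key=lambda r: cosine(q, tokenize(r["description"])), reverse=True
    )
    return ranked[:k]


def build_prompt(query: str, corpus: list, k: int = 1) -> str:
    """Prepend the retrieved code samples to the task (hypothetical template)."""
    context = "\n\n".join(
        f"# {r['description']}\n{r['code']}" for r in retrieve(query, corpus, k)
    )
    return (
        "Use the following PennyLane examples as reference.\n\n"
        f"{context}\n"
        f"Task: {query}\n"
        "Answer with runnable PennyLane code."
    )


if __name__ == "__main__":
    print(build_prompt("Create a Bell state using Hadamard and CNOT", CORPUS))
```

In a real pipeline the toy scorer would typically be replaced by a dense embedding model and a vector index while the prompt-assembly logic stays the same. Raising `k` loosely corresponds to giving the model more retrieved context, though this sketch makes no claim about how the paper's "full-context augmentation" setting is implemented.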
Forward citations
Cited by 3 Pith papers
- Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation
  Q-SAGE iteratively refines LLM-generated quantum solver scripts by comparing outputs to classical results, improving success rates while exposing persistent numerical accuracy limits.
- Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation
  Iterative refinement boosts LLM success in generating quantum solvers that match classical results, but more advanced models shift from execution errors to hard-to-detect numerical inaccuracies.
- Automated Quantum Software and AI Engineering
  A systematic literature review maps trends in automated approaches to quantum software engineering and quantum AI, highlighting their role in hybrid quantum-classical systems.