StabilizerBench is a new benchmark for evaluating AI agents on generating and optimizing stabilizer circuits for quantum error correction and on making them fault-tolerant, with efficient verification and multi-tier scoring.
Quanbench: Benchmarking quantum code generation with large language models
6 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Years: 2026 (6)
Roles: background (1)
Polarities: background (1)
Representative citing papers
Introduces QASM-Eval, the first dataset targeting OpenQASM-3 hardware-facing features for LLM training and evaluation, with an extended verifier for syntax, states, and timelines.
A taxonomy-guided RAG system with LLMs reduces hallucinations and improves migration suggestions for Qiskit code compared to unconstrained retrieval.
Adapts QuantumKatas to Qiskit, yielding a 350-task benchmark across 26 categories, and evaluates 16 LLMs in 39,200 runs, reporting performance gaps and prompting effects.
PennySynth raises pass@5 success on QHack quantum coding challenges by 25-28 points over a base LLM by retrieving from a curated PennyLane dataset using code-aware embeddings.
Iterative refinement boosts LLM success in generating quantum solvers that match classical results, but more advanced models shift from execution errors to hard-to-detect numerical inaccuracies.
Citing papers explorer
PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation