Not Your Usual FFT: QFT$\rightarrow$FFT via Classical Quantum-Circuit Simulation

Frej Larssen; Gilbert Netzer; Ivy Peng; Luca Pennati; Stefano Markidis

arxiv: 2606.18981 · v1 · pith:JMR5ZZNUnew · submitted 2026-06-17 · 💻 cs.ET

Not Your Usual FFT: QFTrightarrowFFT via Classical Quantum-Circuit Simulation

Stefano Markidis , Gilbert Netzer , Luca Pennati , Frej Larssen , Ivy Peng This is my paper

Pith reviewed 2026-06-26 18:30 UTC · model grok-4.3

classification 💻 cs.ET

keywords QFT simulationFFT implementationquantum circuitclassical simulatorCUDA performancediscrete Fourier transformapproximate QFTgate fusion

0 comments

The pith

Classical simulation of QFT circuits computes the discrete Fourier transform as a drop-in FFT replacement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the discrete Fourier transform can be obtained by mapping input arrays directly to normalized quantum state amplitudes and then simulating the corresponding quantum Fourier transform circuit on classical hardware. A backend-agnostic planner fuses gates and adapts memory layouts to raise arithmetic intensity across OpenMP, AVX, and CUDA implementations. On an NVIDIA A100 the CUDA backend records more than four times lower runtime than AVX or multithreaded FFTW on AMD EPYC Zen2 at larger sizes, while an approximate QFT variant further reduces depth with controlled accuracy. A sympathetic reader would care because the method supplies an alternative FFT primitive whose performance profile may differ from conventional libraries and whose circuit structure might later connect to quantum-inspired algorithms.

Core claim

By mapping input arrays directly to normalized state amplitudes with explicit indexing and executing the QFT circuit through a fused-gate schedule on classical simulators, the QFT→FFT approach computes the discrete Fourier transform and delivers over 4× lower runtime on the CUDA backend than AVX or FFTW on AMD EPYC Zen2 at larger problem sizes.

What carries the argument

The QFT→FFT mapping that treats input arrays as quantum state amplitudes together with the backend-agnostic planner that builds fused-gate schedules and memory-layout adapters.

If this is right

The AVX backend matches the performance of 64-thread FFTW on AMD EPYC Zen2.
The CUDA backend achieves more than 4× lower time than both AVX and FFTW on AMD EPYC Zen2 at larger sizes.
Truncating small-angle controlled rotations in the approximate QFT reduces circuit depth and runtime while preserving accuracy.
The planner produces a fused-gate schedule that increases arithmetic intensity and reduces memory data movement across backends.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same amplitude-mapping technique could be applied to other quantum algorithms whose classical simulation might yield new classical primitives.
Hybrid codes could switch between direct FFT calls and QFT simulation depending on hardware and problem size without changing the calling interface.
Further reductions in simulation overhead might appear when the planner is extended to exploit additional GPU-specific memory hierarchies.

Load-bearing premise

Mapping input arrays directly to normalized state amplitudes lets the QFT circuit simulation act as an accurate and efficient drop-in replacement for classical FFT without hidden accuracy loss or prohibitive simulation overhead at the tested sizes.

What would settle it

Run the same input arrays through both the QFT→FFT CUDA implementation and a standard FFT library, then compare relative error and wall-clock time; the claim fails if relative error exceeds machine precision or if the 4× runtime advantage on A100 disappears.

Figures

Figures reproduced from arXiv: 2606.18981 by Frej Larssen, Gilbert Netzer, Ivy Peng, Luca Pennati, Stefano Markidis.

**Figure 1.** Figure 1: Three-wire QFT for DFT8 (MSB 𝑞2 at bottom, LSB 𝑞0 at top). 𝐻 is the 2×2 DFT; 𝑅2 and 𝑅3 are controlled phases. Swaps implement the output bit-reversal 𝑃 = Î 𝑖 𝐿𝑖 (equivalently, permute outputs by 𝑃). In AQFT with cutoff 𝑘=1, keep 𝑅2 and drop 𝑅3. This is a single two-level update. Entries selected by 𝐸1 pass unchanged, while those selected by 𝐸2 acquire the diagonal 𝑅2 = diag(1, 𝜔4). The second diagonal is l… view at source ↗

**Figure 2.** Figure 2: Diagram of the QFT planner. layout and Simulator to a parallel for (implemented with OpenMP) over contiguous index ranges. On the AVX path, the same plan is executed by StateSpaceAVX/SimulatorAVX. This exposes the same abstract vector interface but realizes kernels with wide loads, lane permutations, and fused multiply-adds. On CUDA, StateSpaceCUDA manages device allocation and stream ownership. SimulatorC… view at source ↗

**Figure 4.** Figure 4: Execution time vs. qubits (𝑛) for QFT→FFT(AVX), QFT→FFT (CUDA), and FFTW. Times are medians with standard deviation; 𝑦-axis is log-scale. CUDA is close to FFTW at small 𝑛, becomes fastest by 𝑛 = 14, and is 4.3–4.9× faster at 𝑛 = 22. CUDA times report kernel execution only (no host–device transfers). ≈ 0.032 s (∼15×). FFTW is fastest overall, but at high thread counts AVX narrows the gap: at 64 threads, QF… view at source ↗

**Figure 5.** Figure 5: Median execution time for a 22-qubit AQFT vs. cutoff [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

read the original abstract

We introduce QFT$\rightarrow$FFT, a family of HPC FFT libraries that compute the discrete Fourier transform by executing a quantum Fourier transform (QFT) circuit on classical quantum computer simulators. Input arrays are mapped directly to state amplitudes with explicit normalization/indexing, making QFT a drop-in replacement for FFT primitives. A backend-agnostic planner builds a fused-gate schedule and memory layout adapters to increase arithmetic intensity and reduce memory data movement. We implement this design on top of Google's C++ \texttt{qsim} and evaluate OpenMP, AVX, and CUDA backends. On an AMD EPYC Zen2 processor, our AVX performance is on par with that of multithreaded FFTW, utilizing 64 threads. On an NVIDIA A100, the CUDA backend achieves more than $4\times$ lower time than both AVX and FFTW on AMD EPYC Zen2 at larger sizes. We also employ an approximate QFT (AQFT) that truncates small-angle controlled rotations beyond a cutoff $k$, reducing circuit depth and runtime while preserving accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper runs classical FFT via QFT circuit simulation on qsim with a fused planner, but the main speed claim compares A100 CUDA to EPYC AVX/FFTW so it does not show the method wins on equal hardware.

read the letter

The paper's main move is to map input arrays straight to normalized quantum state amplitudes and run a QFT circuit on classical simulators to get the DFT. They add a backend-agnostic planner that fuses gates and handles memory layout, then ship OpenMP, AVX, and CUDA versions on top of qsim, plus an approximate QFT that drops small-angle rotations past a cutoff.

What is actually new is packaging this as a drop-in FFT library family with those specific optimizations. The AVX backend matching multithreaded FFTW on the EPYC Zen2 and the CUDA numbers on A100 are concrete implementation results.

The soft spot is the performance evidence. The claim of more than 4x lower time on A100 CUDA versus both AVX and FFTW on EPYC Zen2 mixes GPU and CPU platforms. That does not test whether the amplitude mapping, circuit overhead, and planner actually reduce time compared with a direct FFT on the same device. The abstract also gives no accuracy verification, error bars, or details on how normalization preserves the DFT property.

If the full paper supplies same-platform baselines and checks that the output matches a standard FFT within expected precision, the efficiency case would be stronger. As written, the drop-in replacement argument rests on unshown implementation details.

This is for readers who follow quantum circuit simulators or hunt for unusual FFT paths inside HPC code. It is not positioned as a general FFT advance.

The work shows clear engineering choices and honest engagement with the simulator stack, so it deserves a serious referee who can request the missing fair comparisons and accuracy data.

Referee Report

2 major / 0 minor

Summary. The paper introduces QFT→FFT, a family of HPC FFT libraries that compute the discrete Fourier transform by executing a quantum Fourier transform (QFT) circuit on classical quantum computer simulators. Input arrays are mapped directly to normalized state amplitudes, a backend-agnostic planner builds fused-gate schedules and memory adapters, and implementations on Google's qsim with OpenMP, AVX, and CUDA backends are evaluated. Performance claims include AVX on par with multithreaded FFTW on AMD EPYC Zen2 and CUDA on NVIDIA A100 achieving >4× lower time than AVX/FFTW at larger sizes; an approximate QFT (AQFT) variant is also presented that truncates small-angle rotations.

Significance. If the central claims hold after verification, the work would demonstrate a viable alternative FFT implementation route that reuses quantum-circuit simulation infrastructure, fused-gate planning, and approximation techniques (AQFT) for classical signal processing. The multi-backend design and explicit normalization/indexing mapping are concrete engineering contributions that could be of interest to both quantum simulation and HPC communities.

major comments (2)

[Abstract] Abstract: the performance claim that 'the CUDA backend achieves more than 4× lower time than both AVX and FFTW on AMD EPYC Zen2 at larger sizes' compares a GPU platform (A100) against a CPU platform (EPYC Zen2), so it does not test whether the QFT-circuit approach incurs lower, equal, or higher overhead than a direct FFT when executed on the same device; this mismatch is load-bearing for the 'efficient drop-in replacement' thesis.
[Abstract] Abstract: the manuscript states performance numbers and an approximate variant but supplies no error bars, no accuracy verification details (e.g., maximum relative error, L2 norm), no specific problem sizes (N values), and no description of how the explicit normalization/indexing mapping preserves the DFT property; these omissions make the soundness of both the exact and AQFT claims impossible to assess from the provided text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen clarity and verifiability while preserving the core contributions.

read point-by-point responses

Referee: [Abstract] Abstract: the performance claim that 'the CUDA backend achieves more than 4× lower time than both AVX and FFTW on AMD EPYC Zen2 at larger sizes' compares a GPU platform (A100) against a CPU platform (EPYC Zen2), so it does not test whether the QFT-circuit approach incurs lower, equal, or higher overhead than a direct FFT when executed on the same device; this mismatch is load-bearing for the 'efficient drop-in replacement' thesis.

Authors: We agree the 4× claim is a cross-platform comparison and does not isolate the QFT-circuit overhead versus a native FFT on identical hardware. The AVX vs. FFTW comparison is performed on the same AMD EPYC Zen2 platform and shows parity. The CUDA result demonstrates the method's performance when leveraging GPU-accelerated qsim. To address the concern, we will revise the abstract and add text clarifying platform distinctions; we will also include same-device GPU comparisons (e.g., against cuFFT on A100) if space and new runs permit. revision: partial
Referee: [Abstract] Abstract: the manuscript states performance numbers and an approximate variant but supplies no error bars, no accuracy verification details (e.g., maximum relative error, L2 norm), no specific problem sizes (N values), and no description of how the explicit normalization/indexing mapping preserves the DFT property; these omissions make the soundness of both the exact and AQFT claims impossible to assess from the provided text.

Authors: The full manuscript contains the requested details: specific N values, accuracy metrics (maximum relative error and L2 norms) for both exact and AQFT cases, and an explicit description of the normalization/indexing mapping that preserves DFT equivalence. We acknowledge these elements are absent from the abstract. We will revise the abstract to include concise statements on example N values, error metrics, and a brief note on the mapping, making the claims more self-contained and verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity; claims are empirical measurements of an implementation

full rationale

The paper describes a software implementation that maps arrays to QFT state amplitudes and measures runtime on specific backends (OpenMP, AVX, CUDA) against FFTW. No mathematical derivation, fitted parameters, or first-principles prediction is presented whose output reduces to the input by construction. Performance numbers are reported as direct timings rather than derived quantities. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core results. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.1-grok · 5728 in / 1077 out tokens · 39162 ms · 2026-06-26T18:30:09.402542+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 3 linked inside Pith

[1]

2015.Digital signal processing and spectral analysis for scientists: concepts and applications

Silvia Maria Alessio. 2015.Digital signal processing and spectral analysis for scientists: concepts and applications. Springer

2015
[2]

Ali Asadi and et al. 2024. Hybrid quantum programming with Pennylane lightning on HPC platforms.arXiv preprint arXiv:2403.02512(2024)

arXiv 2024
[3]

Ryo Asaka and et al. 2020. Quantum circuit for the fast Fourier transform.Quantum Information Processing19, 8 (2020), 277

2020
[4]

Alan Ayala and et al. 2020. heFFTe: Highly efficient FFT for exascale. InInternational Conference on Computational Science. Springer, 262–275

2020
[5]

Harun Bayraktar and et al. 2023. CuQuantum SDK: A high-performance library for accelerating quantum science. In2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Vol. 1. IEEE, 1050–1061

2023
[6]

Aleksandr Berezutskii and et al. 2025. Tensor networks for quantum computing.Nature Reviews Physics(2025), 1–13

2025
[7]

Daan Camps and et al. 2021. Quantum Fourier transform revisited.Numerical Linear Algebra with Applications28, 1 (2021), e2331

2021
[8]

Jielun Chen and et al. 2023. Quantum Fourier transform has small entanglement.PRX Quantum4, 4 (2023), 040318

2023
[9]

Don Coppersmith. 2002. An approximate Fourier transform useful in quantum factoring.arXiv preprint quant-ph/0201067(2002)

Pith/arXiv arXiv 2002
[10]

Pedro MQ Cruz and et al. 2020. Optimizing quantum phase estimation for the simulation of Hamiltonian eigenstates.Quantum Science and Technology5, 4 (2020), 044005

2020
[11]

Artur Ekert and Richard Jozsa. 1998. Quantum algorithms: entanglement–enhanced information processing.Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences356, 1743 (1998), 1769–1782

1998
[12]

Franz Franchetti and et al. 2005. Efficient utilization of SIMD extensions.Proc. IEEE93, 2 (2005), 409–425

2005
[13]

Franz Franchetti and et al. 2006. FFT program generation for shared memory: SMP and multicore. InSC ’06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. ACM, Article 115, 12 pages

2006
[14]

Franz Franchetti and Markus Puschel. 2003. Short vector code generation for the discrete Fourier transform. InProceedings International Parallel and Distributed Processing Symposium. IEEE, 10–pp

2003
[15]

Franz Franchetti and Markus Püschel. 2011. FFT (fast Fourier transform). InEncyclopedia of Parallel Computing. Springer, 658–671

2011
[16]

Matteo Frigo and Steven G Johnson. 1998. FFTW: An adaptive software architecture for the FFT. InProceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), Vol. 3. IEEE, 1381–1384

1998
[17]

Amir Gholami and et al. 2015. AccFFT: A library for distributed-memory FFT on CPU and GPU architectures.arXiv preprint arXiv:1506.07933(2015)

Pith/arXiv arXiv 2015
[18]

Gian Giacomo Guerreschi and et al. 2020. Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits.Quantum Science and Technology5, 3 (2020), 034007

2020
[19]

Torsten Hoefler and et al. 2023. Disentangling hype from practicality: On realistically achieving quantum advantage.Commun. ACM66, 5 (2023), 82–87

2023
[20]

Sergei V Isakov and et al. 2021. Simulations of quantum circuits with approximate noise using qsim and Cirq.arXiv preprint arXiv:2111.02396(2021)

arXiv 2021
[21]

Nishant Jain and et al. 2024. Quantum Fourier networks for solving parametric PDEs.Quantum Science and Technology9, 3 (2024), 035026

2024
[22]

Tyson Jones and et al. 2019. QuEST and high performance simulation of quantum computers.Scientific reports9, 1 (2019), 10736

2019
[23]

Bastian Köpcke and et al. 2019. Generating efficient fft gpu code with lift. InProceedings of the 8th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing. 1–13

2019
[24]

Binrui Li and et al. 2021. tcfft: A fast half-precision FFT library for NVIDIA tensor cores. In2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 1–11

2021
[25]

Fang Xi Lin. 2014. Shor’s algorithm and the quantum Fourier transform.McGill University(2014)

2014
[26]

Stefano Markidis. 2024. What is quantum parallelism, anyhow?. InISC High Performance 2024 Research Paper Proceedings (39th International Conference). Prometeus GmbH, 1–12

2024
[27]

Damian R Musk. 2020. A comparison of quantum and traditional Fourier transform computations.Computing in Science & Engineering22, 6 (2020), 103–110

2020
[28]

Yunseong Nam and et al. 2020. Approximate quantum Fourier transform with O (n log (n)) T gates.NPJ Quantum Information6, 1 (2020), 26

2020
[29]

Thien Nguyen and et al. 2022. Tensor network quantum virtual machine for simulating quantum circuits at exascale.ACM Transactions on Quantum Computing4, 1 (2022), 1–21

2022
[30]

2010.Quantum computation and quantum information

Michael A Nielsen and Isaac L Chuang. 2010.Quantum computation and quantum information. Cambridge university press

2010
[31]

Dmitry Pekurovsky. 2012. P3DFFT: A framework for parallel computations of Fourier transforms in three dimensions.SIAM Journal on Scientific Computing34, 4 (2012), C192–C209

2012
[32]

Ben Rudiak-Gould. 2006. The sum-over-histories formulation of quantum computing.arXiv preprint quant-ph/0607151(2006)

Pith/arXiv arXiv 2006
[33]

1992.Computational frameworks for the fast Fourier transform

Charles Van Loan. 1992.Computational frameworks for the fast Fourier transform. SIAM

1992
[34]

Endong Wang and et al. 2014. Intel math kernel library. InHigh-Performance Computing on the Intel®Xeon Phi™: How to Fully Exploit MIC Architectures. Springer, 167–188

2014

[1] [1]

2015.Digital signal processing and spectral analysis for scientists: concepts and applications

Silvia Maria Alessio. 2015.Digital signal processing and spectral analysis for scientists: concepts and applications. Springer

2015

[2] [2]

Ali Asadi and et al. 2024. Hybrid quantum programming with Pennylane lightning on HPC platforms.arXiv preprint arXiv:2403.02512(2024)

arXiv 2024

[3] [3]

Ryo Asaka and et al. 2020. Quantum circuit for the fast Fourier transform.Quantum Information Processing19, 8 (2020), 277

2020

[4] [4]

Alan Ayala and et al. 2020. heFFTe: Highly efficient FFT for exascale. InInternational Conference on Computational Science. Springer, 262–275

2020

[5] [5]

Harun Bayraktar and et al. 2023. CuQuantum SDK: A high-performance library for accelerating quantum science. In2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Vol. 1. IEEE, 1050–1061

2023

[6] [6]

Aleksandr Berezutskii and et al. 2025. Tensor networks for quantum computing.Nature Reviews Physics(2025), 1–13

2025

[7] [7]

Daan Camps and et al. 2021. Quantum Fourier transform revisited.Numerical Linear Algebra with Applications28, 1 (2021), e2331

2021

[8] [8]

Jielun Chen and et al. 2023. Quantum Fourier transform has small entanglement.PRX Quantum4, 4 (2023), 040318

2023

[9] [9]

Don Coppersmith. 2002. An approximate Fourier transform useful in quantum factoring.arXiv preprint quant-ph/0201067(2002)

Pith/arXiv arXiv 2002

[10] [10]

Pedro MQ Cruz and et al. 2020. Optimizing quantum phase estimation for the simulation of Hamiltonian eigenstates.Quantum Science and Technology5, 4 (2020), 044005

2020

[11] [11]

Artur Ekert and Richard Jozsa. 1998. Quantum algorithms: entanglement–enhanced information processing.Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences356, 1743 (1998), 1769–1782

1998

[12] [12]

Franz Franchetti and et al. 2005. Efficient utilization of SIMD extensions.Proc. IEEE93, 2 (2005), 409–425

2005

[13] [13]

Franz Franchetti and et al. 2006. FFT program generation for shared memory: SMP and multicore. InSC ’06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. ACM, Article 115, 12 pages

2006

[14] [14]

Franz Franchetti and Markus Puschel. 2003. Short vector code generation for the discrete Fourier transform. InProceedings International Parallel and Distributed Processing Symposium. IEEE, 10–pp

2003

[15] [15]

Franz Franchetti and Markus Püschel. 2011. FFT (fast Fourier transform). InEncyclopedia of Parallel Computing. Springer, 658–671

2011

[16] [16]

Matteo Frigo and Steven G Johnson. 1998. FFTW: An adaptive software architecture for the FFT. InProceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), Vol. 3. IEEE, 1381–1384

1998

[17] [17]

Amir Gholami and et al. 2015. AccFFT: A library for distributed-memory FFT on CPU and GPU architectures.arXiv preprint arXiv:1506.07933(2015)

Pith/arXiv arXiv 2015

[18] [18]

Gian Giacomo Guerreschi and et al. 2020. Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits.Quantum Science and Technology5, 3 (2020), 034007

2020

[19] [19]

Torsten Hoefler and et al. 2023. Disentangling hype from practicality: On realistically achieving quantum advantage.Commun. ACM66, 5 (2023), 82–87

2023

[20] [20]

Sergei V Isakov and et al. 2021. Simulations of quantum circuits with approximate noise using qsim and Cirq.arXiv preprint arXiv:2111.02396(2021)

arXiv 2021

[21] [21]

Nishant Jain and et al. 2024. Quantum Fourier networks for solving parametric PDEs.Quantum Science and Technology9, 3 (2024), 035026

2024

[22] [22]

Tyson Jones and et al. 2019. QuEST and high performance simulation of quantum computers.Scientific reports9, 1 (2019), 10736

2019

[23] [23]

Bastian Köpcke and et al. 2019. Generating efficient fft gpu code with lift. InProceedings of the 8th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing. 1–13

2019

[24] [24]

Binrui Li and et al. 2021. tcfft: A fast half-precision FFT library for NVIDIA tensor cores. In2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 1–11

2021

[25] [25]

Fang Xi Lin. 2014. Shor’s algorithm and the quantum Fourier transform.McGill University(2014)

2014

[26] [26]

Stefano Markidis. 2024. What is quantum parallelism, anyhow?. InISC High Performance 2024 Research Paper Proceedings (39th International Conference). Prometeus GmbH, 1–12

2024

[27] [27]

Damian R Musk. 2020. A comparison of quantum and traditional Fourier transform computations.Computing in Science & Engineering22, 6 (2020), 103–110

2020

[28] [28]

Yunseong Nam and et al. 2020. Approximate quantum Fourier transform with O (n log (n)) T gates.NPJ Quantum Information6, 1 (2020), 26

2020

[29] [29]

Thien Nguyen and et al. 2022. Tensor network quantum virtual machine for simulating quantum circuits at exascale.ACM Transactions on Quantum Computing4, 1 (2022), 1–21

2022

[30] [30]

2010.Quantum computation and quantum information

Michael A Nielsen and Isaac L Chuang. 2010.Quantum computation and quantum information. Cambridge university press

2010

[31] [31]

Dmitry Pekurovsky. 2012. P3DFFT: A framework for parallel computations of Fourier transforms in three dimensions.SIAM Journal on Scientific Computing34, 4 (2012), C192–C209

2012

[32] [32]

Ben Rudiak-Gould. 2006. The sum-over-histories formulation of quantum computing.arXiv preprint quant-ph/0607151(2006)

Pith/arXiv arXiv 2006

[33] [33]

1992.Computational frameworks for the fast Fourier transform

Charles Van Loan. 1992.Computational frameworks for the fast Fourier transform. SIAM

1992

[34] [34]

Endong Wang and et al. 2014. Intel math kernel library. InHigh-Performance Computing on the Intel®Xeon Phi™: How to Fully Exploit MIC Architectures. Springer, 167–188

2014