Better Pauli Channel Learning with Maximum Likelihood Estimation

Alireza Seif; Bryan K. Clark; Daniel Belkin; Ewout van den Berg; Faisal Alam; Matthew Thibodeau

arxiv: 2606.04096 · v1 · pith:GDPDLJTUnew · submitted 2026-06-02 · 🪐 quant-ph

Daniel Belkin, Faisal Alam, Matthew Thibodeau, Alireza Seif, Ewout van den Berg, Bryan K. Clark

Pith reviewed 2026-06-28 09:31 UTC · model grok-4.3

classification 🪐 quant-ph

keywords quantum channel tomography · Pauli-Lindblad channel · maximum likelihood estimation · Bayesian network · error mitigation · probabilistic error cancellation

The pith

For 1D-local sparse Pauli-Lindblad channels, the likelihood in maximum likelihood estimation reduces to an efficiently-evaluable Bayesian network.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Error mitigation in quantum devices needs accurate estimates of the noise channel, yet conventional tomography often demands too many samples to reach high precision. The paper establishes that maximum likelihood estimation, which is statistically optimal, can be made practical for the common case of 1D-local sparse Pauli-Lindblad noise because the likelihood function factors into a Bayesian network that can be evaluated efficiently. This tractability yields tomography results that are substantially more accurate than prior methods. Simulations further show that the improved estimates translate into lower overhead for downstream error mitigation protocols such as probabilistic error cancellation.

Core claim

For the common case of a 1D-local sparse Pauli-Lindblad channel, the likelihood function reduces to an efficiently-evaluable Bayesian network. The resulting computation leads to substantially improved tomography. This can produce meaningful reductions in the overhead of error mitigation, with possible extensions discussed for non-1D circuits and non-Pauli errors.

What carries the argument

Reduction of the likelihood function to an efficiently-evaluable Bayesian network for 1D-local sparse Pauli-Lindblad channels.

If this is right

Substantially improved accuracy in estimating 1D-local sparse Pauli-Lindblad channels.
Lower sample overhead for probabilistic error cancellation when using the resulting noise estimates.
Tractable maximum likelihood estimation becomes available for a practically relevant noise class where it was previously intractable.
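
To make the overhead point concrete: in the sparse Pauli-Lindblad formulation of probabilistic error cancellation (van den Berg et al., 2023), inverting a layer whose generator has rates λ_k costs a sampling overhead γ = exp(2 Σ_k λ_k), and the shot count needed for a fixed precision grows as γ² per mitigated layer. The sketch below only evaluates that formula. The rates and the helper names `pec_gamma` and `shots_multiplier` are illustrative assumptions, not values or code from the paper.

```python
import math

def pec_gamma(rates):
    """Sampling overhead gamma = exp(2 * sum(lambda_k)) for one noisy layer."""
    return math.exp(2.0 * sum(rates))

def shots_multiplier(rates, n_layers):
    """Shot count for fixed precision grows as gamma**2 per mitigated layer."""
    return pec_gamma(rates) ** (2 * n_layers)

# Hypothetical generator rates for a single layer (illustrative values only).
rates = [1e-3, 5e-4, 2e-3, 1e-3]

print(f"gamma per layer: {pec_gamma(rates):.6f}")
print(f"shot multiplier over 20 layers: {shots_multiplier(rates, 20):.4f}")
```

Because the rates enter through an exponential, a systematic error in the learned rates compounds with circuit depth, which is one way to see why tomography accuracy feeds directly into mitigation cost.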

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If real hardware exhibits this exact noise structure, the method could reduce the total resources required to reach a target fidelity in quantum algorithms.
Similar network reductions might exist for other sparse local models even when the circuit is not strictly one-dimensional.
Hardware experiments that inject known 1D-local Pauli-Lindblad noise would directly test whether the predicted efficiency gains materialize.

Load-bearing premise

The noise channel is exactly a 1D-local sparse Pauli-Lindblad model.

What would settle it

A controlled simulation or experiment on a device whose noise matches the 1D-local sparse Pauli-Lindblad model in which the new MLE procedure produces no accuracy gain over standard tomography methods.

Figures

Figures reproduced from arXiv: 2606.04096 by Alireza Seif, Bryan K. Clark, Daniel Belkin, Ewout van den Berg, Faisal Alam, Matthew Thibodeau.

**Figure 1.** Schematic illustration of our work. We give an algorithm that allows MLE to be used for channel learning. This gives ...

**Figure 2.** Average magnetization over time in a simulation of a Trotterized transverse-field Ising model circuit on 10 qubits. The ...

**Figure 3.** Schematic illustrating a typical data collection setup for learning gate-based Pauli errors. A layer of ...

**Figure 4.** (a) Original tensor network. (b) After commuting the basis rotations through the errors. (c) Reduction rules map ...

**Figure 5.** (a) Tensor network describing classical probability distribution preparation and measurement. (b) Bayesian network ...

**Figure 6.** (a) MLE reaches the same accuracy as EPF with roughly one third the sample size. Here the channel is a 10-qubit ...

**Figure 7.** Simplifying the multi-cycle case. As we commute the ...

**Figure 8.** (a) Circuit model used for learning SPAM. Measurement errors ...

**Figure 9.** Typical error of our estimated parameters vs. circuit depth at which data is collected. We use the Fisher information ...

**Figure 10.** (a) Most informative depth at which to take data for many channels with ...

**Figure 11.** Feasible set for shot allocation. Allocating shots uniformly over some range of depths, as has been done in previous ...

Original abstract

Error mitigation in a noisy quantum device requires a very good estimate of the noise channel. The accuracy of probabilistic error cancellation is often limited by the high sample complexity of channel tomography. In principle, optimal sample complexity is attained by maximum likelihood estimation (MLE), but MLE is computationally challenging. We show that MLE can be made computationally tractable in certain cases of interest. For the common case of a 1D-local sparse Pauli-Lindblad channel, the likelihood function reduces to an efficiently-evaluable Bayesian network. We show that the resulting computation leads to substantially improved tomography. In addition, we demonstrate by simulation that this can lead to meaningful improvements to the overhead of error mitigation. We also discuss possible extensions of our algorithm to more general settings, such as non-1D circuits and non-Pauli errors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper reduces MLE to a Bayesian network for exactly 1D-local sparse Pauli-Lindblad channels, making optimal tomography feasible in that narrow case, but the gains rest on the model matching reality and the abstract shows no numbers.

The main point is that for 1D-local sparse Pauli-Lindblad channels the likelihood in maximum likelihood estimation reduces to an efficiently evaluable Bayesian network. This turns a usually intractable optimal estimator into something computable for that channel class.

The reduction itself is the concrete new piece. It takes standard MLE and shows how the structure of these channels lets you factor the problem into a network that can be handled with existing inference tools. The simulations are presented as evidence that this yields better channel estimates and lower overhead for error mitigation than previous approaches.

The limitation is built in. Everything stops working if the noise deviates from 1D-local, sparse, and Pauli-Lindblad; the abstract itself flags this as applying to the common case of that model. The stress-test note is accurate on that point. In addition, the abstract claims substantially improved tomography and meaningful improvements from simulations, yet supplies no quantitative results, baselines, or error bars. Without those details it is hard to judge how large the practical gain actually is.

This is for people working on quantum noise characterization and probabilistic error cancellation in devices with structured noise. A reader focused on efficient tomography for local Pauli channels would find the Bayesian-network step worth examining.

It should go to peer review. The algorithmic reformulation is specific enough to be checked and potentially useful within its stated scope.

Referee Report

2 major / 1 minor

Summary. The paper claims that maximum likelihood estimation (MLE) for Pauli channel tomography, while optimal in sample complexity, is made computationally tractable for the common case of 1D-local sparse Pauli-Lindblad channels because the likelihood function reduces to an efficiently-evaluable Bayesian network; simulations are said to show substantially improved tomography and meaningful reductions in error-mitigation overhead, with extensions discussed for non-1D circuits and non-Pauli errors.

Significance. If the Bayesian-network reduction is correctly derived and the simulations establish concrete gains over existing tomography methods, the work would offer a practical route to near-optimal sample complexity for noise characterization under a frequently invoked noise model, directly benefiting probabilistic error cancellation. The explicit restriction to the 'common case' of 1D-local sparse Pauli-Lindblad channels is a strength that avoids overclaiming generality.

major comments (2)

[Abstract] Abstract: the claims of 'substantially improved tomography' and 'meaningful improvements to the overhead of error mitigation' are asserted on the basis of simulations, yet the abstract supplies no quantitative metrics, error bars, or baseline comparisons; the simulations section must supply these numbers (e.g., sample-complexity ratios or mitigation-overhead factors versus standard methods) for the improvement claim to be assessable.
[Abstract / common-case section] The reduction of the likelihood to a Bayesian network is derived under the exact assumption that the channel is 1D-local and sparse Pauli-Lindblad; any deviation (non-local terms, non-Pauli errors, or denser support) invalidates both the tractability argument and the claimed sample-complexity gains, and this load-bearing modeling assumption should be stated with a precise theorem or proposition number.

minor comments (1)

[Abstract] The abstract refers to 'the common case' without a forward reference to the precise definition or section that enumerates the locality and sparsity conditions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the manuscript. We address each major comment below.

Referee: [Abstract] Abstract: the claims of 'substantially improved tomography' and 'meaningful improvements to the overhead of error mitigation' are asserted on the basis of simulations, yet the abstract supplies no quantitative metrics, error bars, or baseline comparisons; the simulations section must supply these numbers (e.g., sample-complexity ratios or mitigation-overhead factors versus standard methods) for the improvement claim to be assessable.

Authors: We agree that the abstract would be strengthened by including quantitative metrics. In the revised version we will update the abstract to report specific sample-complexity ratios, mitigation-overhead reduction factors, error bars, and direct comparisons to standard tomography methods, drawing these numbers explicitly from the simulations section.

revision: yes

Referee: [Abstract / common-case section] The reduction of the likelihood to a Bayesian network is derived under the exact assumption that the channel is 1D-local and sparse Pauli-Lindblad; any deviation (non-local terms, non-Pauli errors, or denser support) invalidates both the tractability argument and the claimed sample-complexity gains, and this load-bearing modeling assumption should be stated with a precise theorem or proposition number.

Authors: We concur that the tractability result and associated gains hold only under the stated modeling assumptions. We will introduce a new, numbered theorem (or proposition) in the common-case section that formally states the 1D-local sparse Pauli-Lindblad assumption and the precise conditions under which the Bayesian-network reduction and sample-complexity claims are valid.

revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is a model-specific reduction of standard MLE

The paper derives that, under the explicit assumption of a 1D-local sparse Pauli-Lindblad channel, the likelihood function for MLE reduces to an efficiently evaluable Bayesian network. This is a direct structural consequence of the noise model (not a fit, self-definition, or self-citation chain). No load-bearing step equates a prediction to its own input by construction, and the contribution is an algorithmic reformulation rather than a tautological renaming or imported uniqueness theorem. The result remains self-contained against external benchmarks once the model assumption is granted.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the noise takes the exact 1D-local sparse Pauli-Lindblad form; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption: The noise channel is a 1D-local sparse Pauli-Lindblad channel.
This is the setting in which the likelihood reduces to a Bayesian network.
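
For readers unfamiliar with the model, here is a minimal sketch of what the assumption buys. A Pauli-Lindblad channel with generator rates λ_k is equivalent to applying each Pauli P_k independently with probability w_k = (1 − exp(−2λ_k))/2, so the fidelity of any Pauli Q is the product of (1 − 2w_k) over the terms that anticommute with Q. The three-qubit model, its rates, and the function names below are hypothetical and chosen only for illustration.

```python
import math

def flip_probability(lam):
    """Probability that a term with rate lam fires: w = (1 - exp(-2*lam)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * lam))

def anticommute(p, q):
    """True if Pauli strings p and q (e.g. 'XZI') anticommute."""
    # Two strings anticommute when they differ on an odd number of sites
    # where both are non-identity.
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1

def pauli_fidelity(q, model):
    """Product of (1 - 2 w_k) over model terms that anticommute with q."""
    f = 1.0
    for pauli, lam in model.items():
        if anticommute(pauli, q):
            f *= 1.0 - 2.0 * flip_probability(lam)
    return f

# Hypothetical nearest-neighbour model on 3 qubits (illustrative rates).
model = {"XII": 1e-3, "IZI": 2e-3, "XXI": 5e-4, "IZZ": 1e-3}

print(pauli_fidelity("ZII", model))
```

Sparsity and 1D locality mean the dictionary holds only one- and neighbouring two-site strings, so the number of parameters grows linearly with qubit count.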

pith-pipeline@v0.9.1-grok · 5679 in / 1052 out tokens · 23063 ms · 2026-06-28T09:31:09.460525+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

58 extracted references · 1 canonical work pages

[1]

We now study the structure of $p_{\text{prep}}$

Structure of $p_{\text{prep}}$. This probability distribution still requires $2^n$ real numbers to specify, so we haven't improved computational tractability very much yet. We now study the structure of $p_{\text{prep}}$. Recall that $C$ is a single layer of CZ gates. Consider first a qubit that is not acted on by any of the gates. This qubit is prepared in a state $R^\dagger R|0\rangle = |0\rangle$, so the c...
[2]

However, in general such a network can't be efficiently evaluated, since $E_{\vec{\ell}}$ involves a sum over all $2^m$ possible configurations of the latent variables

Simplifying the network. Equation (23) describes a Bayesian network. However, in general such a network can't be efficiently evaluated, since $E_{\vec{\ell}}$ involves a sum over all $2^m$ possible configurations of the latent variables. We will now show that this description of the network has a great deal of redundancy. Uninformative bits. Suppose a bit is prepared in an...
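
The gap between the naive sum and an efficient evaluation can be seen in a toy chain. The sketch below is not the paper's network: it assumes every site starts in 0, each site has one single-site latent flip, each neighbouring pair shares one bond flip, and the observed bit is the XOR of the flips touching it. `brute_force` enumerates all latent configurations, while `chain_sum` passes a two-entry message along the chain. All names and probabilities are illustrative assumptions.

```python
from itertools import product

def bern(w, bit):
    """Probability that a Bernoulli(w) variable equals bit."""
    return w if bit else 1.0 - w

def brute_force(b, w_site, w_bond):
    """Probability of bitstring b by summing over all 2**(2n-1) latent configurations."""
    n = len(b)
    total = 0.0
    for latents in product((0, 1), repeat=2 * n - 1):
        site, bond = latents[:n], latents[n:]
        out = [site[i] ^ (bond[i - 1] if i > 0 else 0) ^ (bond[i] if i < n - 1 else 0)
               for i in range(n)]
        if out == list(b):
            p = 1.0
            for i in range(n):
                p *= bern(w_site[i], site[i])
            for j in range(n - 1):
                p *= bern(w_bond[j], bond[j])
            total += p
    return total

def chain_sum(b, w_site, w_bond):
    """Same probability with O(n) work by summing out one bond at a time."""
    n = len(b)
    # msg[v]: weight of everything to the left, given the incoming bond equals v.
    msg = [1.0, 0.0]  # there is no bond left of site 0, so it is fixed to 0
    for i in range(n - 1):
        new = [0.0, 0.0]
        for left in (0, 1):
            for right in (0, 1):
                site_bit = b[i] ^ left ^ right  # site flip forced by the observation
                new[right] += msg[left] * bern(w_bond[i], right) * bern(w_site[i], site_bit)
        msg = new
    return sum(msg[left] * bern(w_site[n - 1], b[n - 1] ^ left) for left in (0, 1))

w_site = [0.01, 0.02, 0.03, 0.04]  # hypothetical single-site flip probabilities
w_bond = [0.05, 0.06, 0.07]        # hypothetical bond flip probabilities
b = (1, 0, 1, 1)
print(brute_force(b, w_site, w_bond), chain_sum(b, w_site, w_bond))
```

The two functions agree on every bitstring, but the first touches exponentially many terms while the second does a constant amount of work per site. That is the kind of saving a chain-structured graphical model allows.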
[3]

Then $p_{\text{prep}}(b) = \frac{1}{2}$ for both bits, i.e., they are each in an equal mixture of 0 and 1

Suppose a CZ gate acts on two sites for which $R_1, R_2$ are both nontrivial. Then $p_{\text{prep}}(b) = \frac{1}{2}$ for both bits, i.e., they are each in an equal mixture of 0 and 1. It follows that these bits will be independent of the others at all later times. No matter what latent variables act, they will end up in a distribution $p_{\text{out}}(b) = 1$...
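
The invariance behind this argument is easy to check numerically: XORing a uniformly random bit with any independent flip leaves it uniform, whereas a bit prepared in 0 retains a record of the flips. The snippet is a minimal sketch with made-up flip probabilities; `xor_with_flip` and `propagate` are hypothetical helper names.

```python
def xor_with_flip(p_one, w):
    """P(bit = 1) after XOR with an independent flip that fires with probability w."""
    return p_one * (1.0 - w) + (1.0 - p_one) * w

def propagate(p_one, flip_probs):
    """Apply a sequence of independent flips to a bit with P(bit = 1) = p_one."""
    for w in flip_probs:
        p_one = xor_with_flip(p_one, w)
    return p_one

flips = [0.01, 0.3, 0.002]  # hypothetical flip probabilities

print(propagate(0.5, flips))  # bit prepared in an equal mixture
print(propagate(0.0, flips))  # bit prepared in 0
```

Only the second output depends on `flips`, which is why measurement outcomes on equal-mixture bits say nothing about the channel.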
[4]

uninformative

These bits are termed "uninformative", since measurement outcomes on them carry no information at all about the structure of the channel. We may drop them from our network entirely. Latent variables that affect only uninformative bits are likewise dropped. The surviving bits all satisfy $p^{(i)}_{\text{prep}}(b) = \delta_{b0}$. [Figure residue: diagram labels $(1 - r_{45})I + r_{45}(X \otimes X)$, $\mathrm{CZ}(\cdot)\mathrm{CZ}^\dagger$, $\delta$ ...]
[5]

We may write explicitly $p^{(i)}_{\text{out}}\left(b_i \mid L^{(i)}\right) = p^{(i)}_{\text{prep}}\left[\left(\bigoplus_{s:\, i \in s} L_s\right) \oplus b_i\right]$ (28), where $\oplus$ is the elementwise XOR

Explicit form of conditional distribution. We know that the conditional distribution decomposes over qubits. We may write explicitly

$$p^{(i)}_{\text{out}}\left(b_i \mid L^{(i)}\right) = p^{(i)}_{\text{prep}}\left[\left(\bigoplus_{s:\, i \in s} L_s\right) \oplus b_i\right] \qquad (28)$$

where $\oplus$ is the elementwise XOR. In other words, the conditional distribution of $b_i$ depends only on whether the number of active latent variables touching site $i$ is odd or ev...
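
Equation (28) translates almost directly into code. In the sketch below, `p_prep` is a two-entry tuple giving the preparation distribution of one bit, and `latents_on_site` lists the values of the latent variables whose support contains that site. Both names are assumptions made for this illustration.

```python
from functools import reduce
from operator import xor

def p_out_given_latents(b_i, latents_on_site, p_prep):
    """Eq. (28): p_out(b_i | L) = p_prep[(XOR of the L_s touching site i) XOR b_i]."""
    parity = reduce(xor, latents_on_site, 0)
    return p_prep[parity ^ b_i]

# A surviving bit is prepared in 0, i.e. p_prep(b) = delta_{b0}.
p_prep = (1.0, 0.0)

print(p_out_given_latents(0, [1, 1, 0], p_prep))  # even number of active latents
print(p_out_given_latents(0, [1, 0, 0], p_prep))  # odd number of active latents
```

Only the parity of the active latent variables enters, so any two configurations with the same parity on a site give the same conditional distribution for that site.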
[6]

Optimization is made more efficient by access to gradients of the likelihood function

Optimization and convergence. We can use belief propagation to evaluate the likelihood function at a single point [37]. Optimization is made more efficient by access to gradients of the likelihood function. We use the JAX Python library to obtain gradients and the SciPy implementation of L-BFGS-B to do our optimization [38, 39]. Although the likelihood funct...
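
The optimization pattern described here can be sketched on a much simpler problem. The example below fits independent flip probabilities to hypothetical counts with SciPy's L-BFGS-B, passing the objective and its gradient together via `jac=True`. It is a stand-in under stated assumptions: the likelihood is a plain product of binomials rather than the paper's Bayesian network, the gradient is written by hand rather than obtained from JAX, and the counts and the logit parametrization are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def nll_and_grad(theta, counts, shots):
    """Negative log-likelihood and gradient for flip probabilities w = sigmoid(theta)."""
    softplus = np.logaddexp(0.0, theta)           # log(1 + exp(theta)), computed stably
    w = 1.0 / (1.0 + np.exp(-theta))
    nll = np.sum(shots * softplus - counts * theta)
    grad = shots * w - counts                     # d(nll)/d(theta)
    return nll, grad

# Hypothetical data: number of observed flips out of `shots` trials per parameter.
counts = np.array([12.0, 95.0, 480.0])
shots = 10_000

theta0 = np.full(3, -4.0)                         # initial guess, w of about 0.018
result = minimize(nll_and_grad, theta0, args=(counts, shots), jac=True,
                  method="L-BFGS-B", bounds=[(-20.0, 20.0)] * 3)

w_hat = 1.0 / (1.0 + np.exp(-result.x))
print(w_hat, counts / shots)
```

For this toy likelihood the maximum is known in closed form (`counts / shots`), which makes it a convenient check that the optimizer and gradient are wired up correctly before moving to a likelihood that has no closed-form maximizer.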
[7]

"cut" only a small fraction of the edges in order to obtain a decomposition into treelike subgraphs. With 2D geometric locality, a second possibility is what we term "patching"

The channel is sampled randomly with maximum measurement error rate $10^{-2}$ and maximum gate-based error rate $10^{-3}$.

VI. DISCUSSION

We have given an efficient noise-learning algorithm using MLE. MLE is known to be asymptotically optimal; we have shown that its advantage over other available methods is large enough to be important in practice. There are a...
[8]

S. Filippov, M. Leahy, M. A. C. Rossi, and G. García-Pérez, Scalable tensor-network error mitigation for near-term quantum computing (2023), arXiv:2307.11740 [quant-ph]
[9]

Y. Guo and S. Yang, Quantum Error Mitigation via Matrix Product Operators, PRX Quantum 3, 040313 (2022)
[10]

A. Katabarwa, K. Gratsea, A. Caesura, and P. D. Johnson, Early Fault-Tolerant Quantum Computing, PRX Quantum 5, 020101 (2024)
[11]

T. Wagner, H. Kampermann, D. Bruß, and M. Kliesch, Learning Logical Pauli Noise in Quantum Error Correction, Physical Review Letters 130, 200601 (2023)
[12]

H. Zheng, C.-T. Chu, S. Chen, A. G. Manes, S.-u. Lee, S. Zhou, and L. Jiang, Efficient learning of logical noise from syndrome data (2026), arXiv:2601.22286 [quant-ph]
[13]

Z. Zhou, S. Pexton, A. Kubica, and Y. Ding, Error Mitigation of Fault-Tolerant Quantum Circuits with Soft Information (2025), arXiv:2512.09863 [quant-ph]
[14]

K. Temme, S. Bravyi, and J. M. Gambetta, Error Mitigation for Short-Depth Quantum Circuits, Physical Review Letters 119, 180509 (2017)
[15]

E. van den Berg, Z. K. Minev, A. Kandala, and K. Temme, Probabilistic error cancellation with sparse Pauli–Lindblad models on noisy quantum processors, Nature Physics 19, 1116 (2023)
[16]

R. S. Gupta, E. van den Berg, M. Takita, D. Ristè, K. Temme, and A. Kandala, Probabilistic error cancellation for dynamic quantum circuits, Physical Review A 109, 062617 (2024)
[17]

E. H. Chen, S. Chen, L. E. Fischer, A. Eddins, L. C. G. Govia, B. Mitchell, A. He, Y. Kim, L. Jiang, and A. Seif, Disambiguating Pauli noise in quantum computers (2026), arXiv:2505.22629 [quant-ph]
[18]

Y. Li and S. C. Benjamin, Efficient Variational Quantum Simulator Incorporating Active Error Minimization, Physical Review X 7, 021050 (2017)
[19]

Y. Kim, A. Eddins, S. Anand, K. X. Wei, E. van den Berg, S. Rosenblatt, H. Nayfeh, Y. Wu, M. Zaletel, K. Temme, and A. Kandala, Evidence for the utility of quantum computing before fault tolerance, Nature 618, 500 (2023)
[20]

D. Aharonov, O. Alberton, I. Arad, Y. Atia, E. Bairey, M. B. Dov, A. Berkovitch, Z. Brakerski, I. Cohen, E. Fuchs, O. Golan, O. Golan, B. D. Gur, I. Gurwich, A. Haber, R. Haber, D. Halbertal, Y. Itkin, B. A. Katzir, O. Kenneth, S. Kotler, R. Levi, E. Leviatan, Y. Y. Lifshitz, A. Ludmer, S. Matityahu, R. A. Melcer, A. Meyer, O. Ovdat, A. Panahi, G. Ron, I....
[21]

J. F. Poyatos, J. I. Cirac, and P. Zoller, Complete Characterization of a Quantum Process: The Two-Bit Quantum Gate, Physical Review Letters 78, 390 (1997)
[22]

M. Sarovar, T. Proctor, K. Rudinger, K. Young, E. Nielsen, and R. Blume-Kohout, Detecting crosstalk errors in quantum information processors, Quantum 4, 321 (2020)
[23]

A. Carignan-Dugas, D. Dahlen, I. Hincks, E. Ospadov, S. J. Beale, S. Ferracin, J. Skanes-Norman, J. Emerson, and J. J. Wallman, The Error Reconstruction and Compiled Calibration of Quantum Computing Cycles (2023), arXiv:2303.17714 [quant-ph]
[24]

M. R. Geller and Z. Zhou, Efficient error models for fault-tolerant architectures and the Pauli twirling approximation, Physical Review A 88, 012314 (2013)
[25]

A. Hashim, R. K. Naik, A. Morvan, J.-L. Ville, B. Mitchell, J. M. Kreikebaum, M. Davis, E. Smith, C. Iancu, K. P. O'Brien, I. Hincks, J. J. Wallman, J. Emerson, and I. Siddiqi, Randomized Compiling for Scalable Quantum Computing on a Noisy Superconducting Quantum Processor, Physical Review X 11, 041039 (2021)
[26]

J. J. Wallman and J. Emerson, Noise tailoring for scalable quantum computation via randomized compiling, Physical Review A 94, 052325 (2016)
[27]

E. Nielsen, J. K. Gamble, K. Rudinger, T. Scholten, K. Young, and R. Blume-Kohout, Gate Set Tomography, Quantum 5, 557 (2021)
[28]

H. P. Nautrup, N. Delfosse, V. Dunjko, H. J. Briegel, and N. Friis, Optimizing Quantum Error Correction Codes with Reinforcement Learning, Quantum 3, 215 (2019)
[29]

V. Sivak, M. Newman, and P. Klimov, Optimization of Decoder Priors for Accurate Quantum Error Correction, Physical Review Letters 133, 150603 (2024)
[30]

S. T. Flammia and J. J. Wallman, Efficient estimation of Pauli channels, ACM Transactions on Quantum Computing 1, 1 (2020), arXiv:1907.12976 [quant-ph]
[31]

B. D'Ambrosio, Inference in Bayesian Networks, AI Magazine 20, 21 (1999)
[32]

E. P. Xing, L. Zhu, and M. Lee, Probabilistic Graphical Models 6: Learning Fully Observed Bayesian Networks
[33]

J. Pearl, Reverend Bayes on inference engines: a distributed hierarchical approach, in Proceedings of the Second AAAI Conference on Artificial Intelligence, AAAI'82 (AAAI Press, Pittsburgh, Pennsylvania, 1982) pp. 133–136
[34]

A. Seif, H. Liao, V. Tripathi, K. Krsulich, M. Malekakhlagh, M. Amico, P. Jurcevic, and A. Javadi-Abhari, Suppressing Correlated Noise in Quantum Computers via Context-Aware Compiling, in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (2024) pp. 310–324
[35]

X. Li, J. Wang, Y.-Y. Jiang, G.-M. Xue, X. Cai, J. Zhou, M. Gong, Z.-F. Liu, S.-Y. Zheng, D.-K. Ma, M. Chen, W.-J. Sun, S. Yang, F. Yan, Y.-R. Jin, S. P. Zhao, X.-F. Ding, and H.-F. Yu, Cosmic-ray-induced correlated errors in superconducting qubit array, Nature Communications 16, 4677 (2025)
[36]

H. P. Binney, H. D. Pinckney, K. Azar, P. M. Harrington, S. Jha, M. Li, J. Yang, F. Contipelli, R. D. Piñero, M. Gingras, B. M. Niedzielski, H. Stickler, M. E. Schwartz, J. A. Grover, M. Hays, K. Serniak, J. A. Formaggio, and W. D. Oliver, Distinguishing types of correlated errors in superconducting qubits (2026), arXiv:2603.16494 [quant-ph]
[37]

R. Acharya, D. A. Abanin, L. Aghababaie-Beni, I. Aleiner, T. I. Andersen, M. Ansmann, F. Arute, K. Arya, A. Asfaw, N. Astrakhantsev, J. Atalaya, R. Babbush, D. Bacon, B. Ballard, J. C. Bardin, J. Bausch, A. Bengtsson, A. Bilmes, S. Blackwell, S. Boixo, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, D. A. Browne, B. Buchea, B. B. Buckley, D. ...
[38]

F. Alam, alam-faisal/qaravan (2026), original-date: 2025-04-01T23:15:23Z
[39]

S. Chen, J. Li, and A. Liu, An Optimal Tradeoff between Entanglement and Copy Complexity for State Tomography, in Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024 (Association for Computing Machinery, New York, NY, USA, 2024) pp. 1331–1342
[40]

S. Chen, W. Gong, and S. Zhou, Instance-optimal high-precision shadow tomography with few-copy measurements: A metrological approach (2026), arXiv:2602.04952 [quant-ph]
[41]

S. Chen, B. Huang, J. Li, A. Liu, and M. Sellke, When Does Adaptivity Help for Quantum State Learning? (2023), arXiv:2206.05265 [quant-ph]
[42]

A. Erhard, J. J. Wallman, L. Postler, M. Meth, R. Stricker, E. A. Martinez, P. Schindler, T. Monz, J. Emerson, and R. Blatt, Characterizing large-scale quantum computers via cycle benchmarking, Nature Communications 10, 5347 (2019)
[43]

H. Cao, D. Feng, C. Ye, and F. Pan, Differentiable Maximum Likelihood Noise Estimation for Quantum Error Correction (2026), arXiv:2602.19722 [quant-ph]
[44]

J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding belief propagation and its generalizations, in Exploring artificial intelligence in the new millennium (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003) pp. 239–269
[45]

P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen,...
[46]

D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming 45, 503 (1989)
[47]

R. Harper, I. Hincks, C. Ferrie, S. T. Flammia, and J. J. Wallman, Statistical analysis of randomized benchmarking, Physical Review A 99, 052350 (2019)
[48]

K. C. Chang and R. M. Fung, Node Aggregation for Distributed Inference in Bayesian Networks, in Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (International Joint Conferences on Artificial Intelligence, Menlo Park, CA, 1989) pp. 265–270
[49]

E. van den Berg, Z. K. Minev, and K. Temme, Model-free readout-error mitigation for quantum expectation values, Physical Review A 105, 032620 (2022)
[50]

E. Magesan, J. M. Gambetta, and J. Emerson, Characterizing Quantum Gates via Randomized Benchmarking, Physical Review A 85, 042311 (2012), arXiv:1109.6887 [quant-ph]
[51]

J. Helsen, X. Xue, L. M. K. Vandersypen, and S. Wehner, A new class of efficient randomized benchmarking protocols (2019), arXiv:1806.02048 [quant-ph]
[52]

J. C. Napp, Efficient Classical Simulation of Random Shallow 2D Quantum Circuits, Physical Review X 12, 10.1103/PhysRevX.12.021021 (2022)
[53]

G. Torlai, C. J. Wood, A. Acharya, G. Carleo, J. Carrasquilla, and L. Aolita, Quantum process tomography with unsupervised learning and tensor networks, Nature Communications 14, 2858 (2023)
[54]

S. Mangini, M. Cattaneo, D. Cavalcanti, S. Filippov, M. A. C. Rossi, and G. García-Pérez, Tensor network noise characterization for near-term quantum computers, Physical Review Research 6, 033217 (2024)
[55]

L. Govia, S. Majumder, S. Barron, B. Mitchell, A. Seif, Y. Kim, C. Wood, E. Pritchett, S. Merkel, and D. McKay, Bounding the Systematic Error in Quantum Error Mitigation due to Model Violation, PRX Quantum 6, 010354 (2025)
[56]

H. Kwon, S. H. Lie, and L. Jiang, Universal Sample Complexity Bounds in Quantum Learning Theory via Fisher Information Matrix (2026), version Number: 2.

Appendix A: Empirical Pauli Fidelities are not a sufficient statistic. We will show that the empirical Pauli fidelities do not capture all of the important information in a dataset. In other words, they ...
[57]

... seems to scale roughly as the inverse of the average basis error probability, roughly $41 \cdot \left(\frac{1}{\#\,\text{params}} \sum_i w_i\right)^{-1}$

This data is for a 10-qubit channel with $w_i$ drawn from the uniform distribution over $(0, 10^{-3})$. ... seems to scale roughly as the inverse of the average basis error probability, roughly $41 \cdot \left(\frac{1}{\#\,\text{params}} \sum_i w_i\right)^{-1}$. This is shown in Figure 10a. Figure 10b compares this approach against the more standard strategy of distributing shots uniformly over some range of...
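
The inverse scaling with error probability can be reproduced in a one-parameter toy model, which is an assumption made here and not the paper's multi-parameter calculation. Suppose a single bit is flipped with probability w at each of d layers, so it ends unflipped with probability p = (1 + (1 − 2w)^d)/2. The Fisher information about w from one shot is (dp/dw)² / (p(1 − p)), and the sketch scans depths for its maximum. Function names and the chosen values of w are illustrative.

```python
import math

def fisher_information(w, depth):
    """Fisher information about w from one shot at the given depth (toy model)."""
    f = 1.0 - 2.0 * w
    p = 0.5 * (1.0 + f ** depth)          # probability of an even number of flips
    dp_dw = -depth * f ** (depth - 1)
    return dp_dw ** 2 / (p * (1.0 - p))

def best_depth(w, max_depth):
    """Depth in 1..max_depth that maximizes the single-shot Fisher information."""
    return max(range(1, max_depth + 1), key=lambda d: fisher_information(w, d))

for w in (1e-2, 1e-3):
    d_star = best_depth(w, int(5 / w))
    print(f"w = {w:g}: best depth = {d_star}, w * depth = {w * d_star:.3f}")
```

In this toy model the product of w and the best depth stays roughly constant as w changes, so the most informative depth is inversely proportional to the error probability. The constant depends on the toy model and should not be read as the paper's prefactor.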
[58]

turned off

Shot allocation with measurement error. Simultaneous learning of measurement error rates and internal error rates requires data from multiple depths. We may again use the Fisher information to determine how a fixed shot budget ought to be allocated across depths. The natural choice of cost metric here is not so simple. For a single layer of gates, measurem...

[1] [1]

We now study the structure ofp prep

Structure ofp prep This probability distribution still requires 2 n real numbers to specify, so we haven’t improved computational tractability very much yet. We now study the structure ofp prep. Recall thatCis a single layer of CZ gates. Consider first a qubit that is not acted on by any of the gates. This qubit is prepared in a stateR †R|0⟩=|0⟩, so the c...

[2] [2]

However, in general such a network can’t be efficiently evaluated, since E⃗ℓ involves a sum over all 2 m possible configurations of the latent variables

Simplifying the network Equation (23) describes a Bayesian network. However, in general such a network can’t be efficiently evaluated, since E⃗ℓ involves a sum over all 2 m possible configurations of the latent variables. We will now show that this description of the network has a great deal of redundancy. Uninformative bitsSuppose a bit is prepared in an...

[3] [3]

Thenp prep(b) = 1 2 for both bits, i.e., they are each in an equal mixture of 0 and 1

Suppose aCZ gate acts on two sites for whichR 1, R2 are both nontrivial. Thenp prep(b) = 1 2 for both bits, i.e., they are each in an equal mixture of 0 and 1. It follows that these bits will be independent of the others at all later times. No matter what latent variables act, they will end up in a distributionp out(b) = 1

[4] [4]

uninformative

These bits are termed “uninformative”, since measurement outcomes on them carry no information at all about the structure of the channel. We may drop them from our network entirely. Latent variables that affect only uninformative bits are likewise dropped. The surviving bits all satisfyp (i) prep(b) =δ b0. 9 (1 − r45)I + r45(X ⊗ X ) CZ ( ⋅) CZ † δ δ δ δ δ...

[5] [5]

We may write explicitly p(i) out bi|L(i) =p (i) prep " M s:i∈s Ls ! ⊕b i # (28) where⊕is the elementwise XOR

Explicit form of conditional distribution We know that the conditional distribution decomposes over qubits. We may write explicitly p(i) out bi|L(i) =p (i) prep " M s:i∈s Ls ! ⊕b i # (28) where⊕is the elementwise XOR. In other words, the conditional distribution ofb i depends only on whether the number of active latent variables touching siteiis odd or ev...

[6] [6]

Optimization is made more efficient by access to gradients of the likelihood function

Optimization and convergence We can use belief propagation to evaluate the likelihood function at a single point[37]. Optimization is made more efficient by access to gradients of the likelihood function. We use the jax python library to obtain gradients and the scipy implementation of L-BFGS-B to do our optimization[38, 39]. Although the likelihood funct...

[7] [7]

cut” only a small fraction of the edges in order to obtain a decomposition into treelike subgraphs. With 2D geometric locality, a second possibility is what we term “patching

The channel is sampled randomly with maximum measurement error rate 10 −2 and maximum gate-based error rate 10 −3. 14 VI. DISCUSSION We have given an efficient noise-learning algorithm using MLE. MLE is known to be asymptotically optimal; we have shown that its advantage over other available methods is large enough to be important in practice. There are a...

[8]

S. Filippov, M. Leahy, M. A. C. Rossi, and G. García-Pérez, Scalable tensor-network error mitigation for near-term quantum computing (2023), arXiv:2307.11740 [quant-ph]

[9]

Y. Guo and S. Yang, Quantum Error Mitigation via Matrix Product Operators, PRX Quantum 3, 040313 (2022)

[10]

A. Katabarwa, K. Gratsea, A. Caesura, and P. D. Johnson, Early Fault-Tolerant Quantum Computing, PRX Quantum 5, 020101 (2024)

[11]

T. Wagner, H. Kampermann, D. Bruß, and M. Kliesch, Learning Logical Pauli Noise in Quantum Error Correction, Physical Review Letters 130, 200601 (2023)

[12]

H. Zheng, C.-T. Chu, S. Chen, A. G. Manes, S.-u. Lee, S. Zhou, and L. Jiang, Efficient learning of logical noise from syndrome data (2026), arXiv:2601.22286 [quant-ph]

[13]

Z. Zhou, S. Pexton, A. Kubica, and Y. Ding, Error Mitigation of Fault-Tolerant Quantum Circuits with Soft Information (2025), arXiv:2512.09863 [quant-ph]

[14]

K. Temme, S. Bravyi, and J. M. Gambetta, Error Mitigation for Short-Depth Quantum Circuits, Physical Review Letters 119, 180509 (2017)

[15]

E. van den Berg, Z. K. Minev, A. Kandala, and K. Temme, Probabilistic error cancellation with sparse Pauli–Lindblad models on noisy quantum processors, Nature Physics 19, 1116 (2023)

[16]

R. S. Gupta, E. van den Berg, M. Takita, D. Ristè, K. Temme, and A. Kandala, Probabilistic error cancellation for dynamic quantum circuits, Physical Review A 109, 062617 (2024)

[17]

E. H. Chen, S. Chen, L. E. Fischer, A. Eddins, L. C. G. Govia, B. Mitchell, A. He, Y. Kim, L. Jiang, and A. Seif, Disambiguating Pauli noise in quantum computers (2026), arXiv:2505.22629 [quant-ph]

[18]

Y. Li and S. C. Benjamin, Efficient Variational Quantum Simulator Incorporating Active Error Minimization, Physical Review X 7, 021050 (2017)

[19]

Y. Kim, A. Eddins, S. Anand, K. X. Wei, E. van den Berg, S. Rosenblatt, H. Nayfeh, Y. Wu, M. Zaletel, K. Temme, and A. Kandala, Evidence for the utility of quantum computing before fault tolerance, Nature 618, 500 (2023)

[20]

D. Aharonov, O. Alberton, I. Arad, Y. Atia, E. Bairey, M. B. Dov, A. Berkovitch, Z. Brakerski, I. Cohen, E. Fuchs, O. Golan, O. Golan, B. D. Gur, I. Gurwich, A. Haber, R. Haber, D. Halbertal, Y. Itkin, B. A. Katzir, O. Kenneth, S. Kotler, R. Levi, E. Leviatan, Y. Y. Lifshitz, A. Ludmer, S. Matityahu, R. A. Melcer, A. Meyer, O. Ovdat, A. Panahi, G. Ron, I....

[21]

J. F. Poyatos, J. I. Cirac, and P. Zoller, Complete Characterization of a Quantum Process: The Two-Bit Quantum Gate, Physical Review Letters 78, 390 (1997)

[22]

M. Sarovar, T. Proctor, K. Rudinger, K. Young, E. Nielsen, and R. Blume-Kohout, Detecting crosstalk errors in quantum information processors, Quantum 4, 321 (2020)

[23]

A. Carignan-Dugas, D. Dahlen, I. Hincks, E. Ospadov, S. J. Beale, S. Ferracin, J. Skanes-Norman, J. Emerson, and J. J. Wallman, The Error Reconstruction and Compiled Calibration of Quantum Computing Cycles (2023), arXiv:2303.17714 [quant-ph]

[24]

M. R. Geller and Z. Zhou, Efficient error models for fault-tolerant architectures and the Pauli twirling approximation, Physical Review A 88, 012314 (2013)

[25]

A. Hashim, R. K. Naik, A. Morvan, J.-L. Ville, B. Mitchell, J. M. Kreikebaum, M. Davis, E. Smith, C. Iancu, K. P. O’Brien, I. Hincks, J. J. Wallman, J. Emerson, and I. Siddiqi, Randomized Compiling for Scalable Quantum Computing on a Noisy Superconducting Quantum Processor, Physical Review X 11, 041039 (2021).

[26]

J. J. Wallman and J. Emerson, Noise tailoring for scalable quantum computation via randomized compiling, Physical Review A 94, 052325 (2016)

[27]

E. Nielsen, J. K. Gamble, K. Rudinger, T. Scholten, K. Young, and R. Blume-Kohout, Gate Set Tomography, Quantum 5, 557 (2021)

[28]

H. P. Nautrup, N. Delfosse, V. Dunjko, H. J. Briegel, and N. Friis, Optimizing Quantum Error Correction Codes with Reinforcement Learning, Quantum 3, 215 (2019)

[29]

V. Sivak, M. Newman, and P. Klimov, Optimization of Decoder Priors for Accurate Quantum Error Correction, Physical Review Letters 133, 150603 (2024)

[30]

S. T. Flammia and J. J. Wallman, Efficient estimation of Pauli channels, ACM Transactions on Quantum Computing 1, 1 (2020), arXiv:1907.12976 [quant-ph]

[31]

B. D’Ambrosio, Inference in Bayesian Networks, AI Magazine 20, 21 (1999)

[32]

E. P. Xing, L. Zhu, and M. Lee, Probabilistic Graphical Models 6: Learning Fully Observed Bayesian Networks

[33]

J. Pearl, Reverend Bayes on inference engines: a distributed hierarchical approach, in Proceedings of the Second AAAI Conference on Artificial Intelligence, AAAI’82 (AAAI Press, Pittsburgh, Pennsylvania, 1982) pp. 133–136

[34]

A. Seif, H. Liao, V. Tripathi, K. Krsulich, M. Malekakhlagh, M. Amico, P. Jurcevic, and A. Javadi-Abhari, Suppressing Correlated Noise in Quantum Computers via Context-Aware Compiling, in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (2024) pp. 310–324

[35]

X. Li, J. Wang, Y.-Y. Jiang, G.-M. Xue, X. Cai, J. Zhou, M. Gong, Z.-F. Liu, S.-Y. Zheng, D.-K. Ma, M. Chen, W.-J. Sun, S. Yang, F. Yan, Y.-R. Jin, S. P. Zhao, X.-F. Ding, and H.-F. Yu, Cosmic-ray-induced correlated errors in superconducting qubit array, Nature Communications 16, 4677 (2025)

[36]

H. P. Binney, H. D. Pinckney, K. Azar, P. M. Harrington, S. Jha, M. Li, J. Yang, F. Contipelli, R. D. Piñero, M. Gingras, B. M. Niedzielski, H. Stickler, M. E. Schwartz, J. A. Grover, M. Hays, K. Serniak, J. A. Formaggio, and W. D. Oliver, Distinguishing types of correlated errors in superconducting qubits (2026), arXiv:2603.16494 [quant-ph]

[37]

R. Acharya, D. A. Abanin, L. Aghababaie-Beni, I. Aleiner, T. I. Andersen, M. Ansmann, F. Arute, K. Arya, A. Asfaw, N. Astrakhantsev, J. Atalaya, R. Babbush, D. Bacon, B. Ballard, J. C. Bardin, J. Bausch, A. Bengtsson, A. Bilmes, S. Blackwell, S. Boixo, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, D. A. Browne, B. Buchea, B. B. Buckley, D. ...

[38]

F. Alam, alam-faisal/qaravan (2026), original-date: 2025-04-01T23:15:23Z

[39]

S. Chen, J. Li, and A. Liu, An Optimal Tradeoff between Entanglement and Copy Complexity for State Tomography, in Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024 (Association for Computing Machinery, New York, NY, USA, 2024) pp. 1331–1342

[40]

S. Chen, W. Gong, and S. Zhou, Instance-optimal high-precision shadow tomography with few-copy measurements: A metrological approach (2026), arXiv:2602.04952 [quant-ph]

[41]

S. Chen, B. Huang, J. Li, A. Liu, and M. Sellke, When Does Adaptivity Help for Quantum State Learning? (2023), arXiv:2206.05265 [quant-ph]

[42]

A. Erhard, J. J. Wallman, L. Postler, M. Meth, R. Stricker, E. A. Martinez, P. Schindler, T. Monz, J. Emerson, and R. Blatt, Characterizing large-scale quantum computers via cycle benchmarking, Nature Communications 10, 5347 (2019)

[43]

H. Cao, D. Feng, C. Ye, and F. Pan, Differentiable Maximum Likelihood Noise Estimation for Quantum Error Correction (2026), arXiv:2602.19722 [quant-ph]

[44]

J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding belief propagation and its generalizations, in Exploring artificial intelligence in the new millennium (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003) pp. 239–269.

[45]

P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen,...

[46]

D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming 45, 503 (1989)

[47]

R. Harper, I. Hincks, C. Ferrie, S. T. Flammia, and J. J. Wallman, Statistical analysis of randomized benchmarking, Physical Review A 99, 052350 (2019)

[48]

K. C. Chang and R. M. Fung, Node Aggregation for Distributed Inference in Bayesian Networks, in Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (International Joint Conferences on Artificial Intelligence, Menlo Park, CA, 1989) pp. 265–270

[49]

E. van den Berg, Z. K. Minev, and K. Temme, Model-free readout-error mitigation for quantum expectation values, Physical Review A 105, 032620 (2022)

[50]

E. Magesan, J. M. Gambetta, and J. Emerson, Characterizing Quantum Gates via Randomized Benchmarking, Physical Review A 85, 042311 (2012), arXiv:1109.6887 [quant-ph]

[51]

J. Helsen, X. Xue, L. M. K. Vandersypen, and S. Wehner, A new class of efficient randomized benchmarking protocols (2019), arXiv:1806.02048 [quant-ph]

[52]

J. C. Napp, Efficient Classical Simulation of Random Shallow 2D Quantum Circuits, Physical Review X 12, 10.1103/PhysRevX.12.021021 (2022)

[53]

G. Torlai, C. J. Wood, A. Acharya, G. Carleo, J. Carrasquilla, and L. Aolita, Quantum process tomography with unsupervised learning and tensor networks, Nature Communications 14, 2858 (2023)

[54]

S. Mangini, M. Cattaneo, D. Cavalcanti, S. Filippov, M. A. C. Rossi, and G. García-Pérez, Tensor network noise characterization for near-term quantum computers, Physical Review Research 6, 033217 (2024)

[55]

L. Govia, S. Majumder, S. Barron, B. Mitchell, A. Seif, Y. Kim, C. Wood, E. Pritchett, S. Merkel, and D. McKay, Bounding the Systematic Error in Quantum Error Mitigation due to Model Violation, PRX Quantum 6, 010354 (2025)

[56]

H. Kwon, S. H. Lie, and L. Jiang, Universal Sample Complexity Bounds in Quantum Learning Theory via Fisher Information Matrix (2026), version Number: 2.

Appendix A: Empirical Pauli Fidelities are not a sufficient statistic

We will show that the empirical Pauli fidelities do not capture all of the important information in a dataset. In other words, they ...

[57]

This data is for a 10-qubit channel with $w_i$ drawn from the uniform distribution over $(0, 10^{-3})$.

seems to scale roughly as the inverse of the average basis error probability, roughly $41 \cdot \left( \frac{1}{\#\,\text{params}} \sum_i w_i \right)^{-1}$. This is shown in Figure 10a. Figure 10b compares this approach against the more standard strategy of distributing shots uniformly over some range of...
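The inverse scaling with error probability can be illustrated in a deliberately simplified setting. The sketch below is not the paper's model: it assumes a single error rate `w`, no measurement error, and a ±1 outcome whose mean decays as $(1 - 2w)^d$ with depth $d$. It then searches for the depth that maximizes the per-shot Fisher information about `w`. The function names and the depth cutoff are hypothetical.

```python
def fisher_info_per_shot(w, depth):
    """Fisher information about w from one +/-1 outcome with mean (1 - 2w)^depth.

    Toy single-parameter model (assumption); no measurement error is included.
    """
    f = 1.0 - 2.0 * w
    mean = f ** depth
    dmean_dw = -2.0 * depth * f ** (depth - 1)
    # A +/-1 variable with mean m carries information 1 / (1 - m^2) about m.
    return dmean_dw ** 2 / (1.0 - mean ** 2)


def best_depth(w, max_depth=5000):
    """Depth in 1..max_depth with the largest per-shot Fisher information."""
    return max(range(1, max_depth + 1), key=lambda d: fisher_info_per_shot(w, d))


d_a = best_depth(1e-3)
d_b = best_depth(5e-4)
```

In this toy model, halving the error rate roughly doubles the most informative depth, which matches the qualitative inverse scaling described above. The numerical prefactor of the toy model should not be read as a statement about the multi-parameter case.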

[58]

turned off

Shot allocation with measurement error

Simultaneous learning of measurement error rates and internal error rates requires data from multiple depths. We may again use the Fisher information to determine how a fixed shot budget ought to be allocated across depths. The natural choice of cost metric here is not so simple. For a single layer of gates, measurem...