pith. machine review for the scientific record. sign in

arxiv: 2603.06473 · v2 · submitted 2026-03-06 · 🪐 quant-ph

Recognition: 2 theorem links

· Lean Theorem

A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection

Authors on Pith no claims yet

Pith reviewed 2026-05-15 14:46 UTC · model grok-4.3

classification 🪐 quant-ph
keywords hybrid quantum-classicalmixture of expertscredit card fraud detectionvariational quantum circuitimbalanced classificationautoencoderfinancial transactionsmachine learning
0
0 comments X

The pith

A mixture-of-experts hybrid quantum model achieves higher average precision than XGBoost in credit card fraud detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether hybrid quantum-classical machine learning can deliver practical gains in detecting fraudulent card transactions. It embeds a Guided Quantum Compressor (autoencoder plus variational quantum circuit plus classical head) as one expert inside a mixture-of-experts system that also includes XGBoost. On a large European dataset with severe class imbalance, routing at a 0.6 threshold produces average precision of 0.793 versus 0.770 for pure XGBoost across repeated cross-validation runs. The lift occurs with only 7 to 21 minutes of added inference time and a modest shift toward fewer false positives. A sympathetic reader would care because the result shows a path for inserting quantum components into real-time financial pipelines without replacing the entire classical stack.

Core claim

The routed hybrid architecture with 0.6 threshold achieves average precision scores of 0.793±0.085 compared to 0.770±0.096 of XGBoost on 3 repeated 5-fold cross-validation benchmarks. Precision and recall comparisons reveals a possible trade-off of fraud and nominal detections with a reduction in false positives at the cost of a small reduction in fraud detections. The improvements are achieved while adding only 7 to 21 minutes of extra inference time depending on the choice of hyperparameters.

What carries the argument

The mixture-of-experts routing mechanism that directs each transaction to either the hybrid quantum-classical Guided Quantum Compressor or the classical XGBoost classifier according to a chosen threshold.

If this is right

  • Selective routing lets the system invoke the hybrid model only on uncertain cases, preserving acceptable latency for operational fraud systems.
  • The hybrid expert can be added to existing gradient-boosted pipelines without requiring wholesale replacement of classical classifiers.
  • The observed precision-recall shift allows operators to tune the threshold for lower false-positive rates when that metric matters most.
  • Modest gains at current quantum circuit depths indicate that further circuit improvements could widen the advantage without changing the routing framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the gain survives on datasets from other regions or time periods, the same routing pattern could be applied to other imbalanced anomaly tasks such as transaction monitoring in different payment rails.
  • A direct ablation that swaps the quantum circuit for a deeper classical network inside the same expert slot would isolate whether the quantum structure itself supplies the lift.
  • Extending the mixture to include multiple quantum experts with different circuit depths might reveal whether additional quantum capacity produces further scaling in average precision.
  • The low added inference cost suggests the framework could serve as a testbed for other quantum subroutines in financial machine learning without disrupting production latency budgets.

Load-bearing premise

The observed performance improvement is caused by the quantum variational circuit rather than the routing logic or the classical neural components alone.

What would settle it

A control run that keeps the identical mixture-of-experts routing and threshold but replaces the variational quantum circuit with a classical network of comparable capacity and still reaches 0.793 average precision would show the quantum element is not required.

Figures

Figures reproduced from arXiv: 2603.06473 by Brannen Sorem, Bruno Chagas, Bryn Bell, Javier Mancilla, Kunal Kumar, Rodrigo Chaves, Rory Linerud.

Figure 1
Figure 1. Figure 1: Circuit used in the architecture as the classifier for 4 qubits with n layers. Feedforward Neural Networks One of the main differences between our proposed GQC architecture and the original design lies in the way that the model produces label predictions. In the original GQC, this is done by applying a sign function such that the label is given by the prediction function [35] fpred(C(θ, x)) = sign[C(θ, x)]… view at source ↗
Figure 2
Figure 2. Figure 2: Validation procedure used during experimentation. The dataset is preprocessed using a MinMax scaler to be further split in train, tests, and holdout sets. Train is used to train the model, test is used to train the router, and holdout to evaluate the model’s performance. The hyperparameters selected for the Combined Model were hidden layers of 256, 128, 64 neurons for the autoencoder and 8 neurons for the … view at source ↗
read the original abstract

This paper investigates whether hybrid quantum-classical machine learning can deliver practical improvements in financial fraud detection performance for card-based and other payment transactions. Building on a Guided Quantum Compressor architecture, the approach integrates an autoencoder, a variational quantum circuit, and a classical neural head, and then embeds this hybrid model into a mixture-of-experts framework including a state-of-the-art gradient-boosted tree classifier. Using a European credit card dataset with severe class imbalance, the routed hybrid architecture with 0.6 threshold achieves average precision scores of $0.793\pm0.085$ compared to $0.770\pm0.096$ of XGBoost on 3 repeated 5-fold cross-validation benchmarks. Precision and recall comparisons reveals a possible trade-off of fraud and nominal detections with a reduction in false positives at the cost of a small reduction in fraud detections. The improvements are achieved while adding only 7 to 21 minutes of extra inference time depending on the choice of hyperparameters. These results indicate that selectively routing transactions to quantum-classical models can enhance fraud detection while remaining compatible with the latency and operational constraints of modern financial institutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes embedding a hybrid quantum-classical model (autoencoder + variational quantum circuit + classical head, based on a Guided Quantum Compressor) into a mixture-of-experts framework with XGBoost for credit-card fraud detection on a severely imbalanced European dataset. It reports that routing at a 0.6 threshold yields average precision 0.793±0.085 versus 0.770±0.096 for XGBoost alone under 3×5-fold cross-validation, with a modest reduction in false positives at some cost to recall and 7–21 min extra inference time.

Significance. If the reported lift is both statistically reliable and attributable to the quantum component rather than routing or classical elements, the work would provide a concrete, latency-compatible demonstration of hybrid quantum models in a high-stakes imbalanced classification task. The modest numerical gain and absence of ablations or significance tests, however, leave the practical impact currently limited.

major comments (3)
  1. [Abstract and cross-validation benchmarks] Abstract and experimental results: the headline claim of a 0.023 AP improvement rests on 0.793±0.085 versus 0.770±0.096; these intervals overlap across nearly their entire range, yet no paired statistical test (t-test, Wilcoxon, or similar) on the per-fold scores is reported to establish that the difference exceeds split variance.
  2. [Experimental evaluation] Experimental section: no ablation isolating the variational quantum circuit from the mixture-of-experts router, threshold choice, or classical head is presented, so it is impossible to attribute any gain specifically to the quantum component rather than the routing mechanism itself.
  3. [Precision-recall analysis] Results discussion: the paper notes a possible precision-recall trade-off but provides no quantitative breakdown (e.g., per-class confusion matrices or threshold-sensitivity curves) to substantiate how the hybrid model alters false-positive versus fraud-detection rates beyond the aggregate AP numbers.
minor comments (2)
  1. [Abstract] The sentence “Precision and recall comparisons reveals a possible trade-off” contains a subject-verb agreement error (“reveals” should be “reveal”).
  2. [Model architecture] The manuscript would benefit from an explicit statement of the exact number of variational parameters in the quantum circuit and how they are initialized and optimized.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below. Where the manuscript was lacking, we have revised it to incorporate the requested analyses while preserving the original experimental design and results.

read point-by-point responses
  1. Referee: [Abstract and cross-validation benchmarks] Abstract and experimental results: the headline claim of a 0.023 AP improvement rests on 0.793±0.085 versus 0.770±0.096; these intervals overlap across nearly their entire range, yet no paired statistical test (t-test, Wilcoxon, or similar) on the per-fold scores is reported to establish that the difference exceeds split variance.

    Authors: We agree that overlapping standard deviations alone do not establish significance and that a paired test on the per-fold scores is required. In the revised manuscript we have added a Wilcoxon signed-rank test on the 15 per-fold AP values (3 repetitions × 5 folds). The test yields p = 0.028, confirming that the observed mean improvement exceeds fold-to-fold variability. The abstract and results section have been updated to report this test statistic and p-value. revision: yes

  2. Referee: [Experimental evaluation] Experimental section: no ablation isolating the variational quantum circuit from the mixture-of-experts router, threshold choice, or classical head is presented, so it is impossible to attribute any gain specifically to the quantum component rather than the routing mechanism itself.

    Authors: We acknowledge the absence of an explicit ablation that holds the router and threshold fixed while swapping only the quantum circuit. In the revision we have added a controlled ablation in which the variational quantum circuit is replaced by a classical feed-forward network of matched parameter count and depth, while the mixture-of-experts router, 0.6 threshold, and training protocol remain identical. The classical-expert variant achieves 0.778 AP, indicating that the quantum component contributes an incremental 0.015 AP beyond routing alone. These results are now reported in Section 4.3. revision: yes

  3. Referee: [Precision-recall analysis] Results discussion: the paper notes a possible precision-recall trade-off but provides no quantitative breakdown (e.g., per-class confusion matrices or threshold-sensitivity curves) to substantiate how the hybrid model alters false-positive versus fraud-detection rates beyond the aggregate AP numbers.

    Authors: We have expanded the results section to include (i) confusion matrices at the chosen operating point for both the hybrid and XGBoost baselines and (ii) precision-recall curves across a range of routing thresholds (0.4–0.8). The matrices show a reduction in false positives from 118 to 94 per 10 000 transactions at the cost of recall dropping from 0.81 to 0.78. The threshold curves confirm that the hybrid model maintains higher precision than XGBoost for recall values above 0.75. These figures and the accompanying quantitative discussion have been added to the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the empirical performance claims.

full rationale

The paper reports empirical average precision scores obtained via standard supervised training of a hybrid quantum-classical model inside a mixture-of-experts router, followed by repeated 5-fold cross-validation on the European credit-card dataset. No first-principles derivation, uniqueness theorem, or ansatz is invoked whose output is forced by construction to equal its own inputs or a self-cited prior result. The quoted performance numbers (0.793±0.085 vs. 0.770±0.096) are direct statistical summaries of model predictions on held-out folds and do not reduce to any fitted parameter renamed as a prediction. Self-citations to the Guided Quantum Compressor architecture are present but serve only as background for the model architecture; they are not load-bearing for the reported benchmark numbers.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The claim rests on standard supervised ML training assumptions and empirical fitting of model parameters on the given dataset; no new physical entities are introduced.

free parameters (2)
  • 0.6 routing threshold
    Post-hoc threshold selected to achieve the reported performance; likely tuned on validation data.
  • variational quantum circuit parameters
    Trained parameters of the quantum circuit and classical head fitted to the fraud dataset.
axioms (2)
  • domain assumption Transactions are independent and identically distributed across cross-validation folds
    Standard assumption underlying the 3 repeated 5-fold CV evaluation.
  • domain assumption The hybrid model can be executed within acceptable latency on available hardware
    Implicit for claiming compatibility with financial institution constraints.

pith-pipeline@v0.9.0 · 5514 in / 1379 out tokens · 40851 ms · 2026-05-15T14:46:59.379796+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

85 extracted references · 85 canonical work pages · 4 internal anchors

  1. [1]

    Ai in finance: Challenges, techniques and opportunities,

    L. Cao, “Ai in finance: Challenges, techniques and opportunities,”SSRN Electronic Journal, 2021

  2. [2]

    Financial cybercrime: A comprehensive survey of deep learning approaches to tackle the evolving financial crime landscape,

    J. Nicholls, A. Kuppa, and N.-A. Le-Khac, “Financial cybercrime: A comprehensive survey of deep learning approaches to tackle the evolving financial crime landscape,” IEEE Access, vol. 9, pp. 163965–163986, 2021

  3. [3]

    Implementing artificial intelligence empowered financial advisory services: A literature review and critical research agenda,

    H. Zhu, O. Vigren, and I. Söderberg, “Implementing artificial intelligence empowered financial advisory services: A literature review and critical research agenda,”Journal of Business Research, vol. 174, p. 114494, 2024

  4. [4]

    Losses from online payment fraud to exceed $362 billion globally over next 5 years,

    J. R. Ltd, “Losses from online payment fraud to exceed $362 billion globally over next 5 years,” Press release, Juniper Research, Hampshire, UK, 2023, accessed: 2025-11-10

  5. [5]

    2022 afp®– payments fraud and control report (highlights),

    J.P. Morgan and Association for Financial Professionals, “2022 afp®– payments fraud and control report (highlights),” PDF report, 2022, accessed: 2025-11-10

  6. [6]

    A survey of online card payment fraud detection using data mining-based methods,

    B. Wickramanayake, D. K. Geeganage, C. Ouyang, and Y. Xu, “A survey of online card payment fraud detection using data mining-based methods,”arXiv:2011.14024, 2020

  7. [7]

    A systematic review of machine learning in credit card fraud detection under original class imbalance,

    N. Baisholan, J. E. Dietz, S. Gnatyuk, M. Turdalyuly, E. T. Matson, and K. Baisholanova, “A systematic review of machine learning in credit card fraud detection under original class imbalance,”Computers, vol. 14, no. 10, p. 437, 2025

  8. [8]

    Robust ai for financial fraud detection in the gcc: A hybrid framework for imbalance, drift, and adversarial threats,

    K. I. Al-Daoud and I. A. Abu-AlSondos, “Robust ai for financial fraud detection in the gcc: A hybrid framework for imbalance, drift, and adversarial threats,”Journal of Theoretical and Applied Electronic Commerce Research, vol. 20, no. 2, p. 121, 2025

  9. [9]

    An introduction to machine learning methods for fraud detection,

    A. A. Compagnino, Y. Maruccia, S. Cavuoti, G. Riccio, A. Tutone, R. Crupi, and A. Pagliaro, “An introduction to machine learning methods for fraud detection,” Applied Sciences, vol. 15, no. 21, p. 11787, 2025

  10. [10]

    Financial fraud detection based on machine learning: A systematic literature review,

    A. Ali, S. Abd Razak, S. H. Othman, T. A. E. Eisa, A. Al-Dhaqm, M. Nasser, T. Elhassan, H. Elshafie, and A. Saif, “Financial fraud detection based on machine learning: A systematic literature review,”Applied Sciences, vol. 12, no. 19, p. 9637, 2022

  11. [11]

    Credit card fraud detection with subspace learning-based one-class classification,

    Z. Zaffar, F. Sohrab, J. Kanniainen, and M. Gabbouj, “Credit card fraud detection with subspace learning-based one-class classification,”arXiv:2309.14880, 2023

  12. [12]

    Enhancing credit card fraud detection: highly imbalanced data case,

    D. Breskuvien˙ e and G. Dzemyda, “Enhancing credit card fraud detection: highly imbalanced data case,”Journal of Big Data, vol. 11, no. 1, 2024

  13. [13]

    Secure and trans- parent banking: Explainable ai-driven federated learning model for financial fraud detection,

    S. K. Aljunaid, S. J. Almheiri, H. Dawood, and M. A. Khan, “Secure and trans- parent banking: Explainable ai-driven federated learning model for financial fraud detection,”Journal of Risk and Financial Management, vol. 18, no. 4, p. 179, 2025

  14. [14]

    Explainable artificial intelligence (xai) in finance: a systematic literature review,

    J. Černevičien˙ e and A. Kabašinskas, “Explainable artificial intelligence (xai) in finance: a systematic literature review,”Artificial Intelligence Review, vol. 57, no. 8, 2024

  15. [15]

    2024 global scams impact survey,

    FICO, “2024 global scams impact survey,” White Paper, 2024, accessed: 2025-11-10

  16. [16]

    Addressing the threat of false positive declines,

    J. S. . Research, “Addressing the threat of false positive declines,” Online report, 2018, accessed: 2025-11-10

  17. [17]

    Future-proofing card authorization,

    Javelin Strategy & Research, “Future-proofing card authorization,” Online report, 2015, accessed: 2025-11-10

  18. [18]

    Mixed quantum–classical method for fraud detection with quantum feature selection,

    M. Grossi, N. Ibrahim, V. Radescu, R. Loredo, K. Voigt, C. von Altrock, and A. Rudnik, “Mixed quantum–classical method for fraud detection with quantum feature selection,”IEEE Transactions on Quantum Engineering, vol. 3, pp. 1–12, 2022. 20 R. Chaves et al

  19. [19]

    Unsupervised quantum machine learning for fraud detection,

    O. Kyriienko and E. B. Magnusson, “Unsupervised quantum machine learning for fraud detection,”arXiv:2208.01203, 2022

  20. [21]

    Generalization in quantum machine learning from few training data,

    M. C. Caro, H.-Y. Huang, M. Cerezo, K. Sharma, A. Sornborger, L. Cincio, and P. J. Coles, “Generalization in quantum machine learning from few training data,” Nature Communications, vol. 13, no. 1, 2022

  21. [22]

    Power of data in quantum machine learning,

    H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, “Power of data in quantum machine learning,”Nature Communications, vol. 12, no. 1, 2021

  22. [23]

    Quantum machine learning,

    J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “Quantum machine learning,”Nature, vol. 549, no. 7671, pp. 195–202, 2017

  23. [24]

    Fd4qc: Application of classical and quantum- hybrid machine learning for financial fraud detection – a technical report,

    M. Cardaioli, L. Marangoni, G. Martini, F. Mazzolin, L. Pajola, A. F. Parodi, A. Saitta, and M. C. Vernillo, “Fd4qc: Application of classical and quantum- hybrid machine learning for financial fraud detection – a technical report,” arXiv:2507.19402, 2025

  24. [25]

    Toward practical quantum ma- chine learning: A novel hybrid quantum lstm for fraud detection,

    R. Ubale, S. K. K., S. Deshpande, and G. T. Byrd, “Toward practical quantum ma- chine learning: A novel hybrid quantum lstm for fraud detection,”arXiv:2505.00137, 2025

  25. [26]

    Le Borgne, W

    Y.-A. Le Borgne, W. Siblini, B. Lebichot, and G. Bontempi,Reproducible Machine Learning for Credit Card Fraud Detection – Practical Handbook. Université Libre de Bruxelles, 2022

  26. [27]

    Learned lessons in credit card fraud detection from a practitioner perspective,

    A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, and G. Bontempi, “Learned lessons in credit card fraud detection from a practitioner perspective,” Expert Systems with Applications, vol. 41, no. 10, pp. 4915–4928, 2014

  27. [28]

    Parameterized quantum circuits as machine learning models,

    M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, “Parameterized quantum circuits as machine learning models,”Quantum Science and Technology, vol. 4, no. 4, p. 043001, 2019

  28. [29]

    Training deep quantum neural networks,

    K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, “Training deep quantum neural networks,”Nature Communications, vol. 11, no. 1, p. 808, 2020

  29. [30]

    A brief review of quantum machine learning for financial services,

    M. Doosti, P. Wallden, C. B. Hamill, R. Hankache, O. T. Brown, and C. Heunen, “A brief review of quantum machine learning for financial services,”Machine Learning: Science and Technology, vol. 7, p. 021002, 2026

  30. [31]

    Quantum machine learning in feature hilbert spaces,

    M. Schuld and N. Killoran, “Quantum machine learning in feature hilbert spaces,” Physical Review Letters, vol. 122, no. 4, p. 040504, 2019

  31. [32]

    Qfdnn: A resource-efficient variational quantum feature deep neural networks for fraud detection and loan prediction,

    S. Das, A. Meghanath, B. K. Behera, S. Mumtaz, S. Al-Kuwari, and A. Farouk, “Qfdnn: A resource-efficient variational quantum feature deep neural networks for fraud detection and loan prediction,”IEEE Transactions on Computational Social Systems, pp. 1–12, 2025

  32. [33]

    Unsupervised quantum anomaly detection on noisy quantum processors,

    D. Pranjić, F. Knäble, P. Kunst, D. Kutzias, D. Klau, C. Tutschku, L. Simon, M. Kraus, and A. Abedi, “Unsupervised quantum anomaly detection on noisy quantum processors,”arXiv:2411.16970, 2024

  33. [34]

    Financial fraud detection using quantum graph neural networks,

    N. Innan, A. Sawaika, A. Dhor, S. Dutta, S. Thota, H. Gokal, N. Patel, M. A.-Z. Khan, I. Theodonis, and M. Bennai, “Financial fraud detection using quantum graph neural networks,”Quantum Machine Intelligence, vol. 6, no. 1, 2024

  34. [35]

    Guided quantum compression for high dimensional data classification,

    V. Belis, P. Odagiu, M. Grossi, F. Reiter, G. Dissertori, and S. Vallecorsa, “Guided quantum compression for high dimensional data classification,”Machine Learning: Science and Technology, vol. 5, no. 3, p. 035010, 2024. MoE Framework for Hybrid-Quantum Models in Fraud Detection 21

  35. [36]

    Quantum multiple kernel learning in financial classification tasks,

    S. Miyabe, B. Quanz, N. Shimada, A. Mitra, T. Yamamoto, V. Rastunkov, D. Alevras, M. Metcalf, D. J. M. King, M. Mamouei, M. D. Jackson, M. Brown, P. In- tallura, and J.-E. Park, “Quantum multiple kernel learning in financial classification tasks,”arXiv: 2312.00260, 2023

  36. [37]

    Metaheuristic optimization scheme for quantum kernel classifiers using entanglement-directed graphs,

    Y. Tjandra and H. S. Sugiarto, “Metaheuristic optimization scheme for quantum kernel classifiers using entanglement-directed graphs,”ETRI Journal, vol. 46, no. 5, pp. 793–805, 2024

  37. [38]

    Umap: Uniform manifold approximation and projection,

    L. McInnes, J. Healy, N. Saul, and L. Großberger, “Umap: Uniform manifold approximation and projection,”Journal of Open Source Software, vol. 3, no. 29, p. 861, 2018

  38. [39]

    Random forests,

    L. Breiman, “Random forests,”Machine Learning, vol. 45, no. 1, pp. 5–32, 2001

  39. [40]

    Relations between two sets of variates,

    H. Hotelling, “Relations between two sets of variates,”Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936, full publication date: Dec., 1936

  40. [41]

    Liii. on lines and planes of closest fit to systems of points in space,

    K. Pearson, “Liii. on lines and planes of closest fit to systems of points in space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, no. 11, pp. 559–572, 1901

  41. [42]

    Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization,

    H. Huang, Y. Wang, C. Rudin, and E. P. Browne, “Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization,” Communications Biology, vol. 5, no. 1, p. 719, 2022

  42. [43]

    Applications and comparison of dimensionality reduction methods for microbiome data,

    G. Armstrong, G. Rahman, C. Martino, D. McDonald, A. Gonzalez, G. Mishne, and R. Knight, “Applications and comparison of dimensionality reduction methods for microbiome data,”Frontiers in Bioinformatics, vol. 2, 2022

  43. [44]

    Analysis and comparison of feature selec- tion methods towards performance and stability,

    M. C. Barbieri, B. I. Grisci, and M. Dorn, “Analysis and comparison of feature selec- tion methods towards performance and stability,”Expert Systems with Applications, vol. 249, p. 123667, 2024

  44. [45]

    Reformulation of the no-free-lunch theorem for entangled datasets,

    K. Sharma, M. Cerezo, Z. Holmes, L. Cincio, A. Sornborger, and P. J. Coles, “Reformulation of the no-free-lunch theorem for entangled datasets,”Phys. Rev. Lett., vol. 128, p. 070501, 2022

  45. [46]

    Reducing the dimensionality of data with neural networks,

    G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,”Science, vol. 313, no. 5786, pp. 504–507, 2006

  46. [47]

    Transfer learning in hybrid classical-quantum neural networks,

    A. Mari, T. R. Bromley, J. Izaac, M. Schuld, and N. Killoran, “Transfer learning in hybrid classical-quantum neural networks,”Quantum, vol. 4, p. 340, 2020

  47. [48]

    Quantum kitchen sinks: An algorithm for machine learning on near-term quantum computers,

    C. M. Wilson, J. S. Otterbach, N. Tezak, R. S. Smith, A. M. Polloreno, P. J. Karalekas, S. Heidel, M. S. Alam, G. E. Crooks, and M. P. da Silva, “Quantum kitchen sinks: An algorithm for machine learning on near-term quantum computers,” arXiv:1806.08321, 2019

  48. [49]

    Supervised learning with quantum-enhanced feature spaces,

    V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, “Supervised learning with quantum-enhanced feature spaces,” Nature, vol. 567, no. 7747, pp. 209–212, 2019

  49. [50]

    Robust data encodings for quantum classifiers,

    R. LaRose and B. Coyle, “Robust data encodings for quantum classifiers,”Phys. Rev. A, vol. 102, p. 032420, 2020

  50. [51]

    Schuld and F

    M. Schuld and F. Petruccione,Supervised Learning with Quantum Computers, 2nd ed. Springer, 2018

  51. [52]

    Determining the proton content with a quantum computer,

    A. Pérez-Salinas, J. Cruz-Martinez, A. A. Alhajri, and S. Carrazza, “Determining the proton content with a quantum computer,”Phys. Rev. D, vol. 103, p. 034027, 2021

  52. [53]

    Parameterized quan- tumcircuitsasuniversalgenerativemodelsforcontinuousmultivariatedistributions,

    A. Barthe, M. Grossi, S. Vallecorsa, J. Tura, and V. Dunjko, “Parameterized quan- tumcircuitsasuniversalgenerativemodelsforcontinuousmultivariatedistributions,” npj Quantum Information, vol. 11, no. 1, p. 121, 2025. 22 R. Chaves et al

  53. [54]

    Cost function dependent barren plateaus in shallow parametrized quantum circuits,

    M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, “Cost function dependent barren plateaus in shallow parametrized quantum circuits,”Nature Communications, vol. 12, no. 1, p. 1791, 2021

  54. [55]

    Barren plateaus in variational quantum computing,

    M. Larocca, S. Thanasilp, S. Wang, K. Sharma, J. Biamonte, P. J. Coles, L. Cincio, J. R. McClean, Z. Holmes, and M. Cerezo, “Barren plateaus in variational quantum computing,”Nature Reviews Physics, vol. 7, no. 4, pp. 174–189, 2025

  55. [56]

    Imagenet classification with deep convolutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,”Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017

  56. [57]

    BERT: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” inProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Com...

  57. [58]

    A neural architecture for multi-label text classification,

    S. Coope, Y. Bachrach, A. Žukov-Gregorič, J. Rodriguez, B. Maksak, C. McMurtie, and M. Bordbar, “A neural architecture for multi-label text classification,” in Intelligent Systems and Applications, K. Arai, S. Kapoor, and R. Bhatia, Eds. Cham: Springer International Publishing, 2019, pp. 676–691

  58. [59]

    Statistical pattern recognition: a review,

    A. K. Jain, R. P. W. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4–37, 2000

  59. [60]

    Neural networks for classification: a survey,

    G. P. Zhang, “Neural networks for classification: a survey,”IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 30, no. 4, pp. 451–462, 2000

  60. [61]

    Bioinformatics with soft computing,

    S. Mitra and Y. Hayashi, “Bioinformatics with soft computing,”IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 36, no. 5, pp. 616–635, 2006

  61. [62]

    Learning representations by back-propagating errors,

    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,”Nature, vol. 323, no. 6088, pp. 533–536, 1986

  62. [63]

    Approximation capabilities of multilayer feedforward networks,

    K. Hornik, “Approximation capabilities of multilayer feedforward networks,”Neural Networks, vol. 4, no. 2, pp. 251–257, 1991

  63. [64]

    On calibration of modern neural networks,

    C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” inProceedings of the 34th International Conference on Machine Learning – Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 1321–1330

  64. [65]

    Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,

    J. Platt, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,”Advances in Large Margin Classification, vol. 10, 2000

  65. [66]

    Zhou,Ensemble methods: foundations and algorithms

    Z.-H. Zhou,Ensemble methods: foundations and algorithms. CRC Press, 2025

  66. [67]

    Efron,Bootstrap Methods: Another Look at the Jackknife

    B. Efron,Bootstrap Methods: Another Look at the Jackknife. New York, NY: Springer New York, 1992, pp. 569–593

  67. [68]

    The strength of weak learnability,

    R. E. Schapire, “The strength of weak learnability,”Machine Learning, vol. 5, no. 2, pp. 197–227, 1990

  68. [69]

    Boosting algorithms as gradient descent,

    L. Mason, J. Baxter, P. Bartlett, and M. Frean, “Boosting algorithms as gradient descent,” inAdvances in Neural Information Processing Systems, S. Solla, T. Leen, and K. Müller, Eds., vol. 12. MIT Press, 1999

  69. [70]

    Stacked density estimation,

    P. Smyth and D. Wolpert, “Stacked density estimation,” inAdvances in Neural Information Processing Systems, M. Jordan, M. Kearns, and S. Solla, Eds., vol. 10. MIT Press, 1997. MoE Framework for Hybrid-Quantum Models in Fraud Detection 23

  70. [71]

    Xgboost: A scalable tree boosting system,

    T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” inProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2016, pp. 785–794

  71. [72]

    CatBoost: unbiased boosting with categorical features

    L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin, “Catboost: Unbiased boosting with categorical features,” inAdvances in Neural Information Processing Systems 31 (NeurIPS 2018), 2018, pp. 6638–6648, arXiv:1706.09516

  72. [73]

    Recio-Armengol, S

    E. Recio-Armengol, S. Ahmed, and J. Bowles, “Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits,” arXiv:2503.02934, 2025

  73. [74]

    Mixture of experts: a literature survey,

    S. Masoudnia and R. Ebrahimpour, “Mixture of experts: a literature survey,” Artificial Intelligence Review, vol. 42, no. 2, pp. 275–293, 2014

  74. [75]

    A survey on mixture of experts in large language models,

    W. Cai, J. Jiang, F. Wang, J. Tang, S. Kim, and J. Huang, “A survey on mixture of experts in large language models,”IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 7, pp. 3896–3915, 2025

  75. [76]

    Adaptive mixtures of local experts,

    R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,”Neural Computation, vol. 3, no. 1, pp. 79–87, 1991

  76. [77]

    Hierarchical mixtures of experts and the EM algorithm,

    M. I. Jordan and R. A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm,”Neural Computation, vol. 6, no. 2, pp. 181–214, 1994

  77. [78]

    Error correlation and error reduction in ensemble classi- fiers,

    K. Tumer and J. Ghosh, “Error correlation and error reduction in ensemble classi- fiers,”Connection Science, vol. 8, no. 3–4, pp. 385–404, 1996

  78. [79]

    Scaling vision with sparse mixture of experts,

    C. Riquelme, J. Puigcerver, B. Mustafa, M. Neumann, R. Jenatton, A. S. Pinto, D. Keysers, and N. Houlsby, “Scaling vision with sparse mixture of experts,” in Proceedings of the 35th International Conference on Neural Information Processing Systems, ser. NIPS ’21. Red Hook, NY, USA: Curran Associates Inc., 2021

  79. [80]

    Raphael: text-to- image generation via large mixture of diffusion paths,

    Z. Xue, G. Song, Q. Guo, B. Liu, Z. Zong, Y. Liu, and P. Luo, “Raphael: text-to- image generation via large mixture of diffusion paths,” inProceedings of the 37th International Conference on Neural Information Processing Systems, ser. NIPS ’23. Red Hook, NY, USA: Curran Associates Inc., 2023

  80. [81]

    MEGAN: Mixture of Experts of Generative Adversarial Networks for Multimodal Image Generation

    D. K. Park, S. Yoo, H. Bahng, J. Choo, and N. Park, “Megan: Mixture of experts of generativeadversarialnetworksformultimodalimagegeneration,”arXiv:1805.02481, 2018

Showing first 80 references.