pith. machine review for the scientific record.

arxiv: 2604.06541 · v1 · submitted 2026-04-08 · ✦ hep-ph · cs.LG · quant-ph

Recognition: 2 theorem links

· Lean Theorem

Quantum-Inspired Tensor Network Autoencoders for Anomaly Detection: A MERA-Based Approach

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:34 UTC · model grok-4.3

classification ✦ hep-ph · cs.LG · quant-ph

keywords tensor network autoencoders · MERA architecture · anomaly detection · collider jets · multiscale compression · quantum-inspired methods · high-energy physics · background-only training

The pith

A MERA-inspired tensor network autoencoder improves jet anomaly detection by matching the multiscale branching structure of jets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether a tensor-network autoencoder modeled on the multiscale entanglement renormalization ansatz can exploit the hierarchical, locality-preserving structure of collider jets for reconstruction-based anomaly detection. Jets arise from branching cascades, so the architecture first applies disentangling layers to short-range correlations among ordered constituents before performing hierarchical compression. Benchmarks against dense autoencoders and tree-tensor networks, together with a local-compressibility diagnostic and disentangler ablation, show that the full MERA structure outperforms simpler alternatives precisely when the compression bottleneck is tightest. The method trains exclusively on background events and treats reconstruction error as the anomaly score, without modeling any signal. This supports the view that embedding the physics of QCD cascades as an inductive bias can flag potential deviations from the standard model.

Core claim

The authors establish that the locality-preserving multiscale structure of a MERA-inspired autoencoder is well matched to jet data, and that its disentangling layers contribute most when the information bottleneck is strongest. This follows from direct comparisons to dense autoencoders and tree-tensor-network limits within a background-only reconstruction framework, reinforced by a training-free compressibility diagnostic and an identity-disentangler ablation.

What carries the argument

MERA-inspired autoencoder: a tensor network that applies unitary disentanglers to reorganize short-range correlations in ordered jet constituents before coarse-graining them with isometries.
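The layer structure described above can be sketched in a few lines of NumPy. This is a minimal illustration of the disentangler-then-isometry pattern, not the paper's implementation: the feature dimension, the random orthogonal initialisation, and the pairing convention are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

def mera_layer(x, d):
    """One MERA-style layer on a sequence of site vectors.

    x: (n_sites, d) array of ordered constituent features; n_sites even.
    Disentanglers act on neighbouring pairs offset by one site, then
    isometries coarse-grain each aligned pair of sites into one site.
    """
    n, _ = x.shape
    y = x.copy()
    # Disentangler: orthogonal map on the concatenated features of sites
    # (1,2), (3,4), ... — reorganises short-range correlations first.
    u = random_orthogonal(2 * d)
    for i in range(1, n - 1, 2):
        pair = u @ np.concatenate([y[i], y[i + 1]])
        y[i], y[i + 1] = pair[:d], pair[d:]
    # Isometry: project each pair (0,1), (2,3), ... from 2d down to d dims.
    w = random_orthogonal(2 * d)[:d]      # d x 2d, orthonormal rows
    coarse = np.stack([w @ np.concatenate([y[i], y[i + 1]])
                       for i in range(0, n, 2)])
    return coarse

jets = rng.normal(size=(8, 4))            # 8 ordered constituents, 4 features
out = mera_layer(jets, 4)
print(out.shape)                          # (4, 4): half as many sites
```

Stacking such layers halves the number of sites each time, which is what produces the hierarchical compression; setting `u` to the identity recovers the tree-tensor-network limit used as a baseline.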

Load-bearing premise

Reconstruction error after training on background-only jets serves as a reliable anomaly score without labeled signal examples or explicit signal modeling.
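This background-only scoring protocol is generic and can be sketched independently of the architecture. The example below uses PCA as a stand-in linear autoencoder on synthetic data; the data model and bottleneck size are our assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: "background" events cluster near a low-dimensional
# subspace, "signal" events do not (illustration only, not jet data).
basis = rng.normal(size=(3, 16))
background = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 16))
signal = rng.normal(size=(50, 16))

# Train a linear autoencoder (PCA) on background only — no signal labels.
mean = background.mean(axis=0)
_, _, vt = np.linalg.svd(background - mean, full_matrices=False)
encoder = vt[:3]                          # 3-dimensional bottleneck

def anomaly_score(x):
    # Reconstruction error: distance to the learned background subspace.
    z = (x - mean) @ encoder.T
    recon = z @ encoder + mean
    return np.linalg.norm(x - recon, axis=1)

# Events the model reconstructs poorly are flagged as anomalous.
print(anomaly_score(background).mean() < anomaly_score(signal).mean())  # True
```

The premise is that whatever the bottleneck learns from background alone fails to reconstruct out-of-distribution events, so reconstruction error orders signal above background without any signal model.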

What would settle it

Finding that a dense autoencoder achieves lower background reconstruction error and better signal-background separation than the MERA version at strong compression would falsify the claimed advantage of the multiscale structure.

read the original abstract

We investigate whether a multiscale tensor-network architecture can provide a useful inductive bias for reconstruction-based anomaly detection in collider jets. Jets are produced by a branching cascade, so their internal structure is naturally organised across angular and momentum scales. This motivates an autoencoder that compresses information hierarchically and can reorganise short-range correlations before coarse-graining. Guided by this picture, we formulate a MERA-inspired autoencoder acting directly on ordered jet constituents. To the best of our knowledge, a MERA-inspired autoencoder has not previously been proposed, and this architecture has not been explored in collider anomaly detection. We compare this architecture to a dense autoencoder, the corresponding tree-tensor-network limit, and standard classical baselines within a common background-only reconstruction framework. The paper is organised around two main questions: whether locality-aware hierarchical compression is genuinely supported by the data, and whether the disentangling layers of MERA contribute beyond a simpler tree hierarchy. To address these questions, we combine benchmark comparisons with a training-free local-compressibility diagnostic and a direct identity-disentangler ablation. The resulting picture is that the locality-preserving multiscale structure is well matched to jet data, and that the MERA disentanglers become beneficial precisely when the compression bottleneck is strongest. Overall, the study supports locality-aware hierarchical compression as a useful inductive bias for jet anomaly detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a MERA-inspired tensor-network autoencoder for reconstruction-based anomaly detection on collider jets. It posits that the multiscale, locality-preserving structure of MERA provides a suitable inductive bias for the hierarchical branching structure of jets. The work compares this architecture against dense autoencoders, the corresponding tree-tensor-network limit, and classical baselines in a background-only training setting. Evidence is drawn from benchmark performance comparisons, a training-free local-compressibility diagnostic, and an identity-disentangler ablation. The central conclusions are that the hierarchical compression matches jet data and that the disentangling layers become beneficial precisely when the compression bottleneck is strongest.

Significance. If the quantitative results hold, the manuscript introduces a new class of locality-aware hierarchical inductive biases into jet anomaly detection, potentially improving reconstruction-based scores in background-only regimes. The training-free local-compressibility diagnostic and the direct disentangler ablation are constructive elements that help separate architectural contributions from training artifacts. These features could be reusable in other multiscale HEP datasets. The overall significance is moderate because the claims rest on direct empirical comparisons rather than parameter-free derivations or machine-checked proofs.

major comments (1)
  1. The claim that MERA disentanglers become beneficial 'precisely when the compression bottleneck is strongest' is load-bearing for the second main question. The ablation must demonstrate that bottleneck strength (latent dimension or number of coarse-graining layers) was varied while holding bond dimension, total depth, and training protocol fixed, and that the performance gap versus the tree-TN baseline grows monotonically. If multiple architectural factors were changed simultaneously or only isolated operating points are shown, the qualifier 'precisely when' is not secured by the reported experiments.
minor comments (2)
  1. Ensure all benchmark tables report error bars, data-split details, and the exact definition of the anomaly score (reconstruction error on which observables).
  2. The local-compressibility diagnostic is described as training-free and independent; a short appendix deriving its explicit formula and confirming its independence from the fitted model parameters would strengthen reproducibility.
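One plausible form such a training-free diagnostic could take, sketched here as an assumption rather than the paper's actual formula, is the singular-value spectrum of local data blocks: if neighbouring constituents are strongly correlated, sliding windows are compressible to a few components before any model is fitted.

```python
import numpy as np

def local_compressibility(jets, window=4, keep=2):
    """Training-free diagnostic: mean fraction of variance captured when
    each window of neighbouring constituents is truncated to `keep`
    singular values. Depends only on the data, not on any fitted model.

    jets: (n_jets, n_constituents, n_features)
    """
    n_jets, n_const, _ = jets.shape
    fractions = []
    for start in range(0, n_const - window + 1, window):
        # Flatten each window into one vector per jet, inspect its spectrum.
        block = jets[:, start:start + window].reshape(n_jets, -1)
        s = np.linalg.svd(block - block.mean(axis=0), compute_uv=False)
        fractions.append((s[:keep] ** 2).sum() / (s ** 2).sum())
    return float(np.mean(fractions))

rng = np.random.default_rng(2)
# Toy data: locally correlated "jets" vs. unstructured noise.
latent = rng.normal(size=(200, 4, 1))
corr = np.repeat(latent, 4, axis=1) + 0.1 * rng.normal(size=(200, 16, 3))
noise = rng.normal(size=(200, 16, 3))
print(local_compressibility(corr) > local_compressibility(noise))  # True
```

A diagnostic of this shape is manifestly independent of the trained model, which is the property the minor comment asks the authors to confirm for their own version.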

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our work. We are pleased that the referee recognizes the potential of the MERA-based approach for providing a locality-aware hierarchical inductive bias in jet anomaly detection. We address the major comment in detail below, and we will incorporate revisions to strengthen the manuscript accordingly.

read point-by-point responses
  1. Referee: The claim that MERA disentanglers become beneficial 'precisely when the compression bottleneck is strongest' is load-bearing for the second main question. The ablation must demonstrate that bottleneck strength (latent dimension or number of coarse-graining layers) was varied while holding bond dimension, total depth, and training protocol fixed, and that the performance gap versus the tree-TN baseline grows monotonically. If multiple architectural factors were changed simultaneously or only isolated operating points are shown, the qualifier 'precisely when' is not secured by the reported experiments.

    Authors: We appreciate the referee's emphasis on rigorously supporting the 'precisely when' claim. In the original experiments, the ablation was performed by varying the latent dimension (which controls the bottleneck strength) while keeping the bond dimension, total network depth, and training protocol fixed. The results show that the performance gap between the MERA autoencoder and the tree-TN baseline widens as the latent dimension is reduced. To further secure the monotonic aspect and address the concern about isolated points, we will include an expanded figure in the revised manuscript that plots the anomaly detection performance gap explicitly as a function of bottleneck strength across a range of latent dimensions. This will demonstrate the trend more clearly without altering other architectural factors. revision: yes
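The ablation protocol the rebuttal commits to can be written down as a small skeleton: vary only the latent dimension, hold everything else fixed, and check that the MERA-vs-tree gap grows as the bottleneck tightens. `train_and_auc` below is a hypothetical stand-in for the paper's pipeline, and the toy scores are fabricated solely to illustrate the check.

```python
def run_ablation(latent_dims, train_and_auc):
    # Vary only the bottleneck; bond dimension, depth, and training
    # protocol are assumed fixed inside train_and_auc.
    gaps = {}
    for d in latent_dims:
        auc_mera = train_and_auc(arch="mera", latent_dim=d)
        auc_tree = train_and_auc(arch="tree", latent_dim=d)  # identity disentanglers
        gaps[d] = auc_mera - auc_tree
    return gaps

def is_monotone_in_compression(gaps):
    # The gap should grow as the latent dimension shrinks (tighter bottleneck).
    dims = sorted(gaps)                    # ascending latent dimension
    vals = [gaps[d] for d in dims]
    return all(a >= b for a, b in zip(vals, vals[1:]))

# Toy stand-in consistent with the claimed trend, for illustration only.
toy = lambda arch, latent_dim: 0.9 if arch == "mera" else 0.9 - 0.2 / latent_dim
gaps = run_ablation([2, 4, 8, 16], toy)
print(is_monotone_in_compression(gaps))    # True
```

A single-factor sweep of this shape is exactly what would secure the "precisely when" qualifier the referee questions: if `is_monotone_in_compression` fails on the real pipeline, the claim needs weakening.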

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent empirical comparisons and diagnostics

full rationale

The paper advances an empirical architecture proposal and validates it through direct benchmark comparisons (MERA autoencoder vs. dense AE, tree-TN limit, and classical baselines) within a background-only reconstruction framework. It further employs a training-free local-compressibility diagnostic and an identity-disentangler ablation. None of these reduce by construction to fitted parameters, self-definitions, or self-citation chains; the local diagnostic is explicitly independent of the trained model. The claim that disentanglers help precisely when the bottleneck is strongest is presented as an outcome of systematic variation rather than a definitional or statistical tautology. The reconstruction-error anomaly score follows standard practice and does not create circularity. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard autoencoder assumptions plus domain knowledge about jet branching; no new physical entities are postulated.

free parameters (1)
  • MERA bond dimension and layer count
    Hyperparameters controlling network capacity and depth, chosen to achieve the desired compression bottleneck.
axioms (2)
  • domain assumption Jet constituents can be ordered such that short-range angular and momentum correlations dominate before coarse-graining
    Invoked to justify the locality-preserving hierarchical compression.
  • domain assumption Reconstruction error on background-only training data is a valid proxy for anomaly scoring
    Standard assumption in unsupervised anomaly detection frameworks.
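The bond dimension and layer count listed as the free parameter jointly determine network capacity. A rough counting sketch, with per-layer tensor shapes assumed for illustration (actual counts depend on weight sharing and boundary handling in the paper's implementation):

```python
def mera_param_count(d, n_layers):
    # Per layer (assumed shapes): one 2d x 2d disentangler acting on a
    # pair of d-dimensional sites, plus one d x 2d isometry that
    # coarse-grains the pair down to a single d-dimensional site.
    return n_layers * ((2 * d) ** 2 + d * (2 * d))

print(mera_param_count(4, 3))  # 3 * (64 + 32) = 288
```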

pith-pipeline@v0.9.0 · 5549 in / 1467 out tokens · 88265 ms · 2026-05-10T18:34:38.245830+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    Machine learning for anomaly detection in particle physics,

    V. Belis, P. Odagiu, and T. K. Aarrestad, “Machine learning for anomaly detection in particle physics,” Rev. Phys. 12 (2024) 100091, [arXiv:2312.14190 [physics.data-an]]

  2. [2]

    The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics,

    G. Kasieczka et al., “The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics,” Rept. Prog. Phys. 84 no. 12, (2021) 124201, [arXiv:2101.08320 [hep-ph]]

  3. [3]

    Challenges for unsupervised anomaly detection in particle physics,

    K. Fraser, S. Homiller, R. K. Mishra, B. Ostdiek, and M. D. Schwartz, “Challenges for unsupervised anomaly detection in particle physics,” JHEP 03 (2022) 066, [arXiv:2110.06948 [hep-ph]]

  4. [4]

    QCD or What?,

    T. Heimel, G. Kasieczka, T. Plehn, and J. M. Thompson, “QCD or What?,” SciPost Phys. 6 no. 3, (2019) 030, [arXiv:1808.08979 [hep-ph]]

  5. [5]

    Searching for New Physics with Deep Autoencoders,

    M. Farina, Y. Nakai, and D. Shih, “Searching for New Physics with Deep Autoencoders,” Phys. Rev. D 101 no. 7, (2020) 075021, [arXiv:1808.08992 [hep-ph]]

  6. [6]

    Comparing weak- and unsupervised methods for resonant anomaly detection,

    J. H. Collins, P. Martin-Ramiro, B. Nachman, and D. Shih, “Comparing weak- and unsupervised methods for resonant anomaly detection,” Eur. Phys. J. C 81 (2021) 617

  7. [7]

    Unsupervised hadronic SUEP at the LHC,

    J. Barron, D. Curtin, G. Kasieczka, T. Plehn, and A. Spourdalakis, “Unsupervised hadronic SUEP at the LHC,” JHEP 12 (2021) 129

  8. [8]

    Adversarially-trained autoencoders for robust unsupervised new physics searches,

    A. Blance, M. Spannowsky, and P. Waite, “Adversarially-trained autoencoders for robust unsupervised new physics searches,” JHEP 10 (2019) 047

  9. [9]

    A normalized autoencoder for LHC triggers,

    B. M. Dillon, L. Favaro, T. Plehn, P. Sorrenson, and M. Krämer, “A normalized autoencoder for LHC triggers,” SciPost Phys. Core 6 (2023) 074

  10. [10]

    IRC-Safe Graph Autoencoder for Unsupervised Anomaly Detection,

    O. Atkinson, A. Bhardwaj, C. Englert, P. Konar, V. S. Ngairangbam, and M. Spannowsky, “IRC-Safe Graph Autoencoder for Unsupervised Anomaly Detection,” Front. Artif. Intell. 5 (2022) 943135

  11. [11]

    Autoencoders for unsupervised anomaly detection in high energy physics,

    T. Finke, M. Krämer, A. Morandini, A. Mück, and I. Oleksiyuk, “Autoencoders for unsupervised anomaly detection in high energy physics,” JHEP 06 (2021) 161, [arXiv:2104.09051 [hep-ph]]

  12. [12]

    Tree-based algorithms for weakly supervised anomaly detection,

    T. Finke, M. Hein, G. Kasieczka, M. Krämer, A. Mück, P. Prangchaikul, T. Quadfasel, D. Shih, and M. Sommerhalder, “Tree-based algorithms for weakly supervised anomaly detection,” Phys. Rev. D 109 (2024) 034033

  13. [13]

    Jet substructure at the Large Hadron Collider: a review of recent advances in theory and machine learning,

    A. J. Larkoski, I. Moult, and B. Nachman, “Jet substructure at the Large Hadron Collider: a review of recent advances in theory and machine learning,” Phys. Rept. 841 (2020) 1–63, [arXiv:1709.04464 [hep-ph]]

  14. [14]

    Entanglement renormalization,

    G. Vidal, “Entanglement renormalization,” Phys. Rev. Lett. 99 no. 22, (2007) 220405

  15. [15]

    Class of quantum many-body states that can be efficiently simulated,

    G. Vidal, “Class of quantum many-body states that can be efficiently simulated,” Phys. Rev. Lett. 101 no. 11, (2008) 110501

  16. [16]

    Supervised Learning with Tensor Networks,

    E. M. Stoudenmire and D. J. Schwab, “Supervised Learning with Tensor Networks,” Adv. Neural Inf. Process. Syst. 29 (2016) 4799–4807, [arXiv:1605.05775 [cs.LG]]

  17. [17]

    Quantum-inspired event reconstruction with Tensor Networks: Matrix Product States,

    J. Y. Araz and M. Spannowsky, “Quantum-inspired event reconstruction with Tensor Networks: Matrix Product States,” JHEP 08 (2021) 112, [arXiv:2106.08334 [hep-ph]]

  18. [18]

    Classical versus quantum: Comparing tensor-network-based quantum circuits on Large Hadron Collider data,

    J. Y. Araz and M. Spannowsky, “Classical versus quantum: Comparing tensor-network-based quantum circuits on Large Hadron Collider data,” Phys. Rev. A 106 no. 6, (2022) 062423, [arXiv:2202.10471 [quant-ph]]

  19. [19]

    Quantum-probabilistic Hamiltonian learning for generative modeling and anomaly detection,

    J. Y. Araz and M. Spannowsky, “Quantum-probabilistic Hamiltonian learning for generative modeling and anomaly detection,” Phys. Rev. A 108 no. 6, (2023) 062422, [arXiv:2211.03803 [quant-ph]]

  20. [20]

    Anomaly detection in high-energy physics using a quantum autoencoder,

    V. S. Ngairangbam, M. Spannowsky, and M. Takeuchi, “Anomaly detection in high-energy physics using a quantum autoencoder,” Phys. Rev. D 105 no. 9, (2022) 095004

  21. [21]

    Tensor Network for Anomaly Detection in the Latent Space of Proton Collision Events at the LHC,

    E. Puljak, M. Pierini, and A. Garcia-Saez, “Tensor Network for Anomaly Detection in the Latent Space of Proton Collision Events at the LHC,” Mach. Learn. Sci. Technol. 6 no. 4, (2025) 045001, [arXiv:2506.00102 [stat.ML]]

  22. [22]

    The Machine Learning landscape of top taggers,

    G. Kasieczka et al., “The Machine Learning landscape of top taggers,” SciPost Phys. 7 no. 1, (2019) 014, [arXiv:1902.09914 [hep-ph]]

  23. [23]

    A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States,

    R. Orús, “A practical introduction to tensor networks: Matrix product states and projected entangled pair states,” Annals Phys. 349 (2014) 117–158, [arXiv:1306.2164 [cond-mat.str-el]]

  24. [24]

    Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 2 Applications and Future Perspectives,

    A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, I. Oseledets, M. Sugiyama, and D. P. Mandic, “Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 2 Applications and Future Perspectives,” Found. Trends Mach. Learn. 9 no. 4-5, (2017) 431–673

  25. [25]

    Learning relevant features of data with multi-scale tensor networks,

    E. M. Stoudenmire, “Learning relevant features of data with multi-scale tensor networks,” Quantum Sci. Technol. 3 no. 3, (2018) 034003

  26. [26]

    Multi-scale tensor network architecture for machine learning,

    J. A. Reyes and E. M. Stoudenmire, “Multi-scale tensor network architecture for machine learning,” Mach. Learn. Sci. Technol. 2 no. 3, (2021) 035036

  27. [27]

    Machine learning by unitary tensor network of hierarchical tree structure,

    D. Liu, S.-J. Ran, P. Wittek, C. Peng, R. Blázquez García, G. Su, and M. Lewenstein, “Machine learning by unitary tensor network of hierarchical tree structure,” New J. Phys. 21 no. 7, (2019) 073059, [arXiv:1710.04833 [stat.ML]]

  28. [28]

    Unsupervised Generative Modeling Using Matrix Product States,

    Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang, “Unsupervised Generative Modeling Using Matrix Product States,” Phys. Rev. X 8 no. 3, (2018) 031012

  29. [29]

    Tree tensor networks for generative modeling,

    S. Cheng, L. Wang, T. Xiang, and P. Zhang, “Tree tensor networks for generative modeling,” Phys. Rev. B 99 no. 15, (2019) 155131, [arXiv:1901.02217 [cond-mat.str-el]]

  30. [30]

    The Geometry of Algorithms with Orthogonality Constraints,

    A. Edelman, T. A. Arias, and S. T. Smith, “The Geometry of Algorithms with Orthogonality Constraints,” SIAM J. Matrix Anal. Appl. 20 no. 2, (1998) 303–353

  31. [31]

    Optimization Algorithms on Matrix Manifolds,

    P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, 2008

  32. [32]

    A Multilinear Singular Value Decomposition,

    L. De Lathauwer, B. De Moor, and J. Vandewalle, “A Multilinear Singular Value Decomposition,” SIAM J. Matrix Anal. Appl. 21 no. 4, (2000) 1253–1278

  33. [33]

    A Generalized Solution of the Orthogonal Procrustes Problem,

    P. H. Schönemann, “A Generalized Solution of the Orthogonal Procrustes Problem,” Psychometrika 31 no. 1, (1966) 1–10

  34. [34]

    MERACLE: Constructive Layer-Wise Conversion of a Tensor Train into a MERA,

    K. Batselier, A. Cichocki, and N. Wong, “MERACLE: Constructive Layer-Wise Conversion of a Tensor Train into a MERA,” Commun. Appl. Math. Comput. 3 no. 2, (2021) 257–279

  35. [35]

    Top Quark Tagging Reference Dataset,

    G. Kasieczka, T. Plehn, J. Thompson, and M. Russel, “Top Quark Tagging Reference Dataset,” 3, 2019. https://doi.org/10.5281/zenodo.2603256. Version v0 (2018-03-27)

  36. [36]

    An Introduction to PYTHIA 8.2,

    T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen, and P. Z. Skands, “An Introduction to PYTHIA 8.2,” Comput. Phys. Commun. 191 (2015) 159–177, [arXiv:1410.3012 [hep-ph]]. [37] DELPHES 3 Collaboration, J. de Favereau et al., “DELPHES 3, A modular framework for fast simulation of a generic collider e...

  37. [37]

    The anti-k_t jet clustering algorithm,

    M. Cacciari, G. P. Salam, and G. Soyez, “The anti-k_t jet clustering algorithm,” JHEP 04 (2008) 063, [arXiv:0802.1189 [hep-ph]]

  38. [38]

    Reducing the Dimensionality of Data with Neural Networks,

    G. E. Hinton and R. R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science 313 no. 5786, (2006) 504–507

  39. [39]

    LIII. On lines and planes of closest fit to systems of points in space,

    K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” Philos. Mag. 2 no. 11, (1901) 559–572

  40. [40]

    Analysis of a complex of statistical variables into principal components,

    H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol. 24 no. 6, (1933) 417–441

  41. [41]

    On the generalised distance in statistics,

    P. C. Mahalanobis, “On the generalised distance in statistics,” Proc. Natl. Inst. Sci. India 2 (1936) 49–55

  42. [42]

    Isolation Forest,

    F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation Forest,” in 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422. 2008