pith. machine review for the scientific record.

arxiv: 2604.06541 · v1 · submitted 2026-04-08 · ✦ hep-ph · cs.LG · quant-ph

Recognition: 2 theorem links

· Lean Theorem

Quantum-Inspired Tensor Network Autoencoders for Anomaly Detection: A MERA-Based Approach

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:34 UTC · model grok-4.3

classification ✦ hep-ph · cs.LG · quant-ph

keywords tensor network autoencoders · MERA architecture · anomaly detection · collider jets · multiscale compression · quantum-inspired methods · high-energy physics · background-only training

The pith

A MERA-inspired tensor network autoencoder improves jet anomaly detection by matching the multiscale branching structure of jets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether a tensor-network autoencoder modeled on the multiscale entanglement renormalization ansatz can exploit the hierarchical, locality-preserving structure of collider jets for reconstruction-based anomaly detection. Jets arise from branching cascades, so the architecture first applies disentangling layers to short-range correlations among ordered constituents before performing hierarchical compression. Benchmarks against dense autoencoders and tree-tensor networks, together with a local-compressibility diagnostic and disentangler ablation, show that the full MERA structure outperforms simpler alternatives precisely when the compression bottleneck is tightest. The method trains exclusively on background events and treats reconstruction error as the anomaly score, without modeling any signal. This supports the view that embedding the physics of QCD cascades as an inductive bias can flag potential deviations from the standard model.

Core claim

The authors establish that the locality-preserving multiscale structure of a MERA-inspired autoencoder is well matched to jet data, and that its disentangling layers contribute most when the information bottleneck is strongest. This follows from direct comparisons to dense autoencoders and tree-tensor-network limits within a background-only reconstruction framework, reinforced by a training-free compressibility diagnostic and an identity-disentangler ablation.

What carries the argument

MERA-inspired autoencoder: a tensor network that applies unitary disentanglers to reorganize short-range correlations in ordered jet constituents before coarse-graining them with isometries.
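The layer structure described above can be sketched in a few lines of NumPy. This is a minimal illustration of the disentangler-then-isometry pattern, not the paper's implementation: the feature dimension, the random orthogonal initialisation, and the pairing convention are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

def mera_layer(x, d):
    """One MERA-style layer on a sequence of site vectors.

    x: (n_sites, d) array of ordered constituent features; n_sites even.
    Disentanglers act on neighbouring pairs offset by one site, then
    isometries coarse-grain each aligned pair of sites into one site.
    """
    n, _ = x.shape
    y = x.copy()
    # Disentangler: orthogonal map on the concatenated features of sites
    # (1,2), (3,4), ... — reorganises short-range correlations first.
    u = random_orthogonal(2 * d)
    for i in range(1, n - 1, 2):
        pair = u @ np.concatenate([y[i], y[i + 1]])
        y[i], y[i + 1] = pair[:d], pair[d:]
    # Isometry: project each pair (0,1), (2,3), ... from 2d down to d dims.
    w = random_orthogonal(2 * d)[:d]      # d x 2d, orthonormal rows
    coarse = np.stack([w @ np.concatenate([y[i], y[i + 1]])
                       for i in range(0, n, 2)])
    return coarse

jets = rng.normal(size=(8, 4))            # 8 ordered constituents, 4 features
out = mera_layer(jets, 4)
print(out.shape)                          # (4, 4): half as many sites
```

Stacking such layers halves the number of sites each time, which is what produces the hierarchical compression; setting `u` to the identity recovers the tree-tensor-network limit used as a baseline.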

Load-bearing premise

Reconstruction error after training on background-only jets serves as a reliable anomaly score without labeled signal examples or explicit signal modeling.
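This background-only scoring protocol is generic and can be sketched independently of the architecture. The example below uses PCA as a stand-in linear autoencoder on synthetic data; the data model and bottleneck size are our assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: "background" events cluster near a low-dimensional
# subspace, "signal" events do not (illustration only, not jet data).
basis = rng.normal(size=(3, 16))
background = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 16))
signal = rng.normal(size=(50, 16))

# Train a linear autoencoder (PCA) on background only — no signal labels.
mean = background.mean(axis=0)
_, _, vt = np.linalg.svd(background - mean, full_matrices=False)
encoder = vt[:3]                          # 3-dimensional bottleneck

def anomaly_score(x):
    # Reconstruction error: distance to the learned background subspace.
    z = (x - mean) @ encoder.T
    recon = z @ encoder + mean
    return np.linalg.norm(x - recon, axis=1)

# Events the model reconstructs poorly are flagged as anomalous.
print(anomaly_score(background).mean() < anomaly_score(signal).mean())  # True
```

The premise is that whatever the bottleneck learns from background alone fails to reconstruct out-of-distribution events, so reconstruction error orders signal above background without any signal model.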

What would settle it

Finding that a dense autoencoder achieves lower background reconstruction error and better signal-background separation than the MERA version at strong compression would falsify the claimed advantage of the multiscale structure.

read the original abstract

We investigate whether a multiscale tensor-network architecture can provide a useful inductive bias for reconstruction-based anomaly detection in collider jets. Jets are produced by a branching cascade, so their internal structure is naturally organised across angular and momentum scales. This motivates an autoencoder that compresses information hierarchically and can reorganise short-range correlations before coarse-graining. Guided by this picture, we formulate a MERA-inspired autoencoder acting directly on ordered jet constituents. To the best of our knowledge, a MERA-inspired autoencoder has not previously been proposed, and this architecture has not been explored in collider anomaly detection. We compare this architecture to a dense autoencoder, the corresponding tree-tensor-network limit, and standard classical baselines within a common background-only reconstruction framework. The paper is organised around two main questions: whether locality-aware hierarchical compression is genuinely supported by the data, and whether the disentangling layers of MERA contribute beyond a simpler tree hierarchy. To address these questions, we combine benchmark comparisons with a training-free local-compressibility diagnostic and a direct identity-disentangler ablation. The resulting picture is that the locality-preserving multiscale structure is well matched to jet data, and that the MERA disentanglers become beneficial precisely when the compression bottleneck is strongest. Overall, the study supports locality-aware hierarchical compression as a useful inductive bias for jet anomaly detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a MERA-inspired tensor-network autoencoder for reconstruction-based anomaly detection on collider jets. It posits that the multiscale, locality-preserving structure of MERA provides a suitable inductive bias for the hierarchical branching structure of jets. The work compares this architecture against dense autoencoders, the corresponding tree-tensor-network limit, and classical baselines in a background-only training setting. Evidence is drawn from benchmark performance comparisons, a training-free local-compressibility diagnostic, and an identity-disentangler ablation. The central conclusions are that the hierarchical compression matches jet data and that the disentangling layers become beneficial precisely when the compression bottleneck is strongest.

Significance. If the quantitative results hold, the manuscript introduces a new class of locality-aware hierarchical inductive biases into jet anomaly detection, potentially improving reconstruction-based scores in background-only regimes. The training-free local-compressibility diagnostic and the direct disentangler ablation are constructive elements that help separate architectural contributions from training artifacts. These features could be reusable in other multiscale HEP datasets. The overall significance is moderate because the claims rest on direct empirical comparisons rather than parameter-free derivations or machine-checked proofs.

major comments (1)
  1. The claim that MERA disentanglers become beneficial 'precisely when the compression bottleneck is strongest' is load-bearing for the second main question. The ablation must demonstrate that bottleneck strength (latent dimension or number of coarse-graining layers) was varied while holding bond dimension, total depth, and training protocol fixed, and that the performance gap versus the tree-TN baseline grows monotonically. If multiple architectural factors were changed simultaneously or only isolated operating points are shown, the qualifier 'precisely when' is not secured by the reported experiments.
minor comments (2)
  1. Ensure all benchmark tables report error bars, data-split details, and the exact definition of the anomaly score (reconstruction error on which observables).
  2. The local-compressibility diagnostic is described as training-free and independent; a short appendix deriving its explicit formula and confirming its independence from the fitted model parameters would strengthen reproducibility.
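One plausible form such a training-free diagnostic could take, sketched here as an assumption rather than the paper's actual formula, is the singular-value spectrum of local data blocks: if neighbouring constituents are strongly correlated, sliding windows are compressible to a few components before any model is fitted.

```python
import numpy as np

def local_compressibility(jets, window=4, keep=2):
    """Training-free diagnostic: mean fraction of variance captured when
    each window of neighbouring constituents is truncated to `keep`
    singular values. Depends only on the data, not on any fitted model.

    jets: (n_jets, n_constituents, n_features)
    """
    n_jets, n_const, _ = jets.shape
    fractions = []
    for start in range(0, n_const - window + 1, window):
        # Flatten each window into one vector per jet, inspect its spectrum.
        block = jets[:, start:start + window].reshape(n_jets, -1)
        s = np.linalg.svd(block - block.mean(axis=0), compute_uv=False)
        fractions.append((s[:keep] ** 2).sum() / (s ** 2).sum())
    return float(np.mean(fractions))

rng = np.random.default_rng(2)
# Toy data: locally correlated "jets" vs. unstructured noise.
latent = rng.normal(size=(200, 4, 1))
corr = np.repeat(latent, 4, axis=1) + 0.1 * rng.normal(size=(200, 16, 3))
noise = rng.normal(size=(200, 16, 3))
print(local_compressibility(corr) > local_compressibility(noise))  # True
```

A diagnostic of this shape is manifestly independent of the trained model, which is the property the minor comment asks the authors to confirm for their own version.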

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our work. We are pleased that the referee recognizes the potential of the MERA-based approach for providing a locality-aware hierarchical inductive bias in jet anomaly detection. We address the major comment in detail below, and we will incorporate revisions to strengthen the manuscript accordingly.

read point-by-point responses
  1. Referee: The claim that MERA disentanglers become beneficial 'precisely when the compression bottleneck is strongest' is load-bearing for the second main question. The ablation must demonstrate that bottleneck strength (latent dimension or number of coarse-graining layers) was varied while holding bond dimension, total depth, and training protocol fixed, and that the performance gap versus the tree-TN baseline grows monotonically. If multiple architectural factors were changed simultaneously or only isolated operating points are shown, the qualifier 'precisely when' is not secured by the reported experiments.

    Authors: We appreciate the referee's emphasis on rigorously supporting the 'precisely when' claim. In the original experiments, the ablation was performed by varying the latent dimension (which controls the bottleneck strength) while keeping the bond dimension, total network depth, and training protocol fixed. The results show that the performance gap between the MERA autoencoder and the tree-TN baseline widens as the latent dimension is reduced. To further secure the monotonic aspect and address the concern about isolated points, we will include an expanded figure in the revised manuscript that plots the anomaly detection performance gap explicitly as a function of bottleneck strength across a range of latent dimensions. This will demonstrate the trend more clearly without altering other architectural factors. revision: yes
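The ablation protocol the rebuttal commits to can be written down as a small skeleton: vary only the latent dimension, hold everything else fixed, and check that the MERA-vs-tree gap grows as the bottleneck tightens. `train_and_auc` below is a hypothetical stand-in for the paper's pipeline, and the toy scores are fabricated solely to illustrate the check.

```python
def run_ablation(latent_dims, train_and_auc):
    # Vary only the bottleneck; bond dimension, depth, and training
    # protocol are assumed fixed inside train_and_auc.
    gaps = {}
    for d in latent_dims:
        auc_mera = train_and_auc(arch="mera", latent_dim=d)
        auc_tree = train_and_auc(arch="tree", latent_dim=d)  # identity disentanglers
        gaps[d] = auc_mera - auc_tree
    return gaps

def is_monotone_in_compression(gaps):
    # The gap should grow as the latent dimension shrinks (tighter bottleneck).
    dims = sorted(gaps)                    # ascending latent dimension
    vals = [gaps[d] for d in dims]
    return all(a >= b for a, b in zip(vals, vals[1:]))

# Toy stand-in consistent with the claimed trend, for illustration only.
toy = lambda arch, latent_dim: 0.9 if arch == "mera" else 0.9 - 0.2 / latent_dim
gaps = run_ablation([2, 4, 8, 16], toy)
print(is_monotone_in_compression(gaps))    # True
```

A single-factor sweep of this shape is exactly what would secure the "precisely when" qualifier the referee questions: if `is_monotone_in_compression` fails on the real pipeline, the claim needs weakening.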

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent empirical comparisons and diagnostics

full rationale

The paper advances an empirical architecture proposal and validates it through direct benchmark comparisons (MERA autoencoder vs. dense AE, tree-TN limit, and classical baselines) within a background-only reconstruction framework. It further employs a training-free local-compressibility diagnostic and an identity-disentangler ablation. None of these reduce by construction to fitted parameters, self-definitions, or self-citation chains; the local diagnostic is explicitly independent of the trained model. The claim that disentanglers help precisely when the bottleneck is strongest is presented as an outcome of systematic variation rather than a definitional or statistical tautology. The reconstruction-error anomaly score follows standard practice and does not create circularity. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard autoencoder assumptions plus domain knowledge about jet branching; no new physical entities are postulated.

free parameters (1)
  • MERA bond dimension and layer count
    Hyperparameters controlling network capacity and depth, chosen to achieve the desired compression bottleneck.
axioms (2)
  • domain assumption Jet constituents can be ordered such that short-range angular and momentum correlations dominate before coarse-graining
    Invoked to justify the locality-preserving hierarchical compression.
  • domain assumption Reconstruction error on background-only training data is a valid proxy for anomaly scoring
    Standard assumption in unsupervised anomaly detection frameworks.
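The bond dimension and layer count listed as the free parameter jointly determine network capacity. A rough counting sketch, with per-layer tensor shapes assumed for illustration (actual counts depend on weight sharing and boundary handling in the paper's implementation):

```python
def mera_param_count(d, n_layers):
    # Per layer (assumed shapes): one 2d x 2d disentangler acting on a
    # pair of d-dimensional sites, plus one d x 2d isometry that
    # coarse-grains the pair down to a single d-dimensional site.
    return n_layers * ((2 * d) ** 2 + d * (2 * d))

print(mera_param_count(4, 3))  # 3 * (64 + 32) = 288
```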

pith-pipeline@v0.9.0 · 5549 in / 1467 out tokens · 88265 ms · 2026-05-10T18:34:38.245830+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    Machine learning for anomaly detection in particle physics,

    V. Belis, P. Odagiu, and T. K. Aarrestad, “Machine learning for anomaly detection in particle physics,” Rev. Phys. 12 (2024) 100091, [arXiv:2312.14190 [physics.data-an]]

  2. [2]

    The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics,

    G. Kasieczka et al., “The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics,” Rept. Prog. Phys. 84 no. 12, (2021) 124201, [arXiv:2101.08320 [hep-ph]]

  3. [3]

    Challenges for unsupervised anomaly detection in particle physics,

    K. Fraser, S. Homiller, R. K. Mishra, B. Ostdiek, and M. D. Schwartz, “Challenges for unsupervised anomaly detection in particle physics,” JHEP 03 (2022) 066, [arXiv:2110.06948 [hep-ph]]

  4. [4]

    QCD or What?,

    T. Heimel, G. Kasieczka, T. Plehn, and J. M. Thompson, “QCD or What?,” SciPost Phys. 6 no. 3, (2019) 030, [arXiv:1808.08979 [hep-ph]]

  5. [5]

    Searching for New Physics with Deep Autoencoders,

    M. Farina, Y. Nakai, and D. Shih, “Searching for New Physics with Deep Autoencoders,” Phys. Rev. D 101 no. 7, (2020) 075021, [arXiv:1808.08992 [hep-ph]]

  6. [6]

    Comparing weak- and unsupervised methods for resonant anomaly detection,

    J. H. Collins, P. Martin-Ramiro, B. Nachman, and D. Shih, “Comparing weak- and unsupervised methods for resonant anomaly detection,” Eur. Phys. J. C 81 (2021) 617

  7. [7]

    Unsupervised hadronic SUEP at the LHC,

    J. Barron, D. Curtin, G. Kasieczka, T. Plehn, and A. Spourdalakis, “Unsupervised hadronic SUEP at the LHC,” JHEP 12 (2021) 129

  8. [8]

    Adversarially-trained autoencoders for robust unsupervised new physics searches,

    A. Blance, M. Spannowsky, and P. Waite, “Adversarially-trained autoencoders for robust unsupervised new physics searches,” JHEP 10 (2019) 047

  9. [9]

    A normalized autoencoder for LHC triggers,

    B. M. Dillon, L. Favaro, T. Plehn, P. Sorrenson, and M. Krämer, “A normalized autoencoder for LHC triggers,” SciPost Phys. Core 6 (2023) 074

  10. [10]

    IRC-Safe Graph Autoencoder for Unsupervised Anomaly Detection,

    O. Atkinson, A. Bhardwaj, C. Englert, P. Konar, V. S. Ngairangbam, and M. Spannowsky, “IRC-Safe Graph Autoencoder for Unsupervised Anomaly Detection,” Front. Artif. Intell. 5 (2022) 943135

  11. [11]

    Autoencoders for unsupervised anomaly detection in high energy physics,

    T. Finke, M. Krämer, A. Morandini, A. Mück, and I. Oleksiyuk, “Autoencoders for unsupervised anomaly detection in high energy physics,” JHEP 06 (2021) 161, [arXiv:2104.09051 [hep-ph]]

  12. [12]

    Tree-based algorithms for weakly supervised anomaly detection,

    T. Finke, M. Hein, G. Kasieczka, M. Krämer, A. Mück, P. Prangchaikul, T. Quadfasel, D. Shih, and M. Sommerhalder, “Tree-based algorithms for weakly supervised anomaly detection,” Phys. Rev. D 109 (2024) 034033

  13. [13]

    Jet substructure at the Large Hadron Collider: a review of recent advances in theory and machine learning,

    A. J. Larkoski, I. Moult, and B. Nachman, “Jet substructure at the Large Hadron Collider: a review of recent advances in theory and machine learning,” Phys. Rept. 841 (2020) 1–63, [arXiv:1709.04464 [hep-ph]]

  14. [14]

    Entanglement renormalization,

    G. Vidal, “Entanglement renormalization,” Phys. Rev. Lett. 99 no. 22, (2007) 220405

  15. [15]

    Class of quantum many-body states that can be efficiently simulated,

    G. Vidal, “Class of quantum many-body states that can be efficiently simulated,” Phys. Rev. Lett. 101 no. 11, (2008) 110501

  16. [16]

    Supervised Learning with Tensor Networks,

    E. M. Stoudenmire and D. J. Schwab, “Supervised Learning with Tensor Networks,” Adv. Neural Inf. Process. Syst. 29 (2016) 4799–4807, [arXiv:1605.05775 [cs.LG]]

  17. [17]

    Quantum-inspired event reconstruction with Tensor Networks: Matrix Product States,

    J. Y. Araz and M. Spannowsky, “Quantum-inspired event reconstruction with Tensor Networks: Matrix Product States,” JHEP 08 (2021) 112, [arXiv:2106.08334 [hep-ph]]

  18. [18]

    Classical versus quantum: Comparing tensor-network-based quantum circuits on Large Hadron Collider data,

    J. Y. Araz and M. Spannowsky, “Classical versus quantum: Comparing tensor-network-based quantum circuits on Large Hadron Collider data,” Phys. Rev. A 106 no. 6, (2022) 062423, [arXiv:2202.10471 [quant-ph]]

  19. [19]

    Quantum-probabilistic Hamiltonian learning for generative modeling and anomaly detection,

    J. Y. Araz and M. Spannowsky, “Quantum-probabilistic Hamiltonian learning for generative modeling and anomaly detection,” Phys. Rev. A 108 no. 6, (2023) 062422, [arXiv:2211.03803 [quant-ph]]

  20. [20]

    Anomaly detection in high-energy physics using a quantum autoencoder,

    V. S. Ngairangbam, M. Spannowsky, and M. Takeuchi, “Anomaly detection in high-energy physics using a quantum autoencoder,” Phys. Rev. D 105 no. 9, (2022) 095004

  21. [21]

    Tensor Network for Anomaly Detection in the Latent Space of Proton Collision Events at the LHC,

    E. Puljak, M. Pierini, and A. Garcia-Saez, “Tensor Network for Anomaly Detection in the Latent Space of Proton Collision Events at the LHC,” Mach. Learn. Sci. Technol. 6 no. 4, (2025) 045001, [arXiv:2506.00102 [stat.ML]]

  22. [22]

    The Machine Learning landscape of top taggers,

    G. Kasieczka et al., “The Machine Learning landscape of top taggers,” SciPost Phys. 7 no. 1, (2019) 014, [arXiv:1902.09914 [hep-ph]]

  23. [23]

    A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States,

    R. Orús, “A practical introduction to tensor networks: Matrix product states and projected entangled pair states,” Annals Phys. 349 (2014) 117–158, [arXiv:1306.2164 [cond-mat.str-el]]

  24. [24]

    Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 2 Applications and Future Perspectives,

    A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, I. Oseledets, M. Sugiyama, and D. P. Mandic, “Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 2 Applications and Future Perspectives,” Found. Trends Mach. Learn. 9 no. 4-5, (2017) 431–673

  25. [25]

    Learning relevant features of data with multi-scale tensor networks,

    E. M. Stoudenmire, “Learning relevant features of data with multi-scale tensor networks,” Quantum Sci. Technol. 3 no. 3, (2018) 034003

  26. [26]

    Multi-scale tensor network architecture for machine learning,

    J. A. Reyes and E. M. Stoudenmire, “Multi-scale tensor network architecture for machine learning,” Mach. Learn. Sci. Technol. 2 no. 3, (2021) 035036

  27. [27]

    Machine learning by unitary tensor network of hierarchical tree structure,

    D. Liu, S.-J. Ran, P. Wittek, C. Peng, R. Blázquez García, G. Su, and M. Lewenstein, “Machine learning by unitary tensor network of hierarchical tree structure,” New J. Phys. 21 no. 7, (2019) 073059, [arXiv:1710.04833 [stat.ML]]

  28. [28]

    Unsupervised Generative Modeling Using Matrix Product States,

    Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang, “Unsupervised Generative Modeling Using Matrix Product States,” Phys. Rev. X 8 no. 3, (2018) 031012

  29. [29]

    Tree tensor networks for generative modeling,

    S. Cheng, L. Wang, T. Xiang, and P. Zhang, “Tree tensor networks for generative modeling,” Phys. Rev. B 99 no. 15, (2019) 155131, [arXiv:1901.02217 [cond-mat.str-el]]

  30. [30]

    The Geometry of Algorithms with Orthogonality Constraints,

    A. Edelman, T. A. Arias, and S. T. Smith, “The Geometry of Algorithms with Orthogonality Constraints,” SIAM J. Matrix Anal. Appl. 20 no. 2, (1998) 303–353

  31. [31]

    Optimization Algorithms on Matrix Manifolds,

    P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, 2008

  32. [32]

    A Multilinear Singular Value Decomposition,

    L. De Lathauwer, B. De Moor, and J. Vandewalle, “A Multilinear Singular Value Decomposition,” SIAM J. Matrix Anal. Appl. 21 no. 4, (2000) 1253–1278

  33. [33]

    A Generalized Solution of the Orthogonal Procrustes Problem,

    P. H. Schönemann, “A Generalized Solution of the Orthogonal Procrustes Problem,” Psychometrika 31 no. 1, (1966) 1–10

  34. [34]

    MERACLE: Constructive Layer-Wise Conversion of a Tensor Train into a MERA,

    K. Batselier, A. Cichocki, and N. Wong, “MERACLE: Constructive Layer-Wise Conversion of a Tensor Train into a MERA,” Commun. Appl. Math. Comput. 3 no. 2, (2021) 257–279

  35. [35]

    Top Quark Tagging Reference Dataset,

    G. Kasieczka, T. Plehn, J. Thompson, and M. Russel, “Top Quark Tagging Reference Dataset,” 3, 2019. https://doi.org/10.5281/zenodo.2603256. Version v0 (2018-03-27)

  36. [36]

    An Introduction to PYTHIA 8.2,

    T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen, and P. Z. Skands, “An Introduction to PYTHIA 8.2,” Comput. Phys. Commun. 191 (2015) 159–177, [arXiv:1410.3012 [hep-ph]]. [37] DELPHES 3 Collaboration, J. de Favereau et al., “DELPHES 3, A modular framework for fast simulation of a generic collider e...

  37. [37]

    The anti-k_t jet clustering algorithm,

    M. Cacciari, G. P. Salam, and G. Soyez, “The anti-k_t jet clustering algorithm,” JHEP 04 (2008) 063, [arXiv:0802.1189 [hep-ph]]

  38. [38]

    Reducing the Dimensionality of Data with Neural Networks,

    G. E. Hinton and R. R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science 313 no. 5786, (2006) 504–507

  39. [39]

    LIII. On lines and planes of closest fit to systems of points in space,

    K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” Philos. Mag. 2 no. 11, (1901) 559–572

  40. [40]

    Analysis of a complex of statistical variables into principal components,

    H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol. 24 no. 6, (1933) 417–441

  41. [41]

    On the generalised distance in statistics,

    P. C. Mahalanobis, “On the generalised distance in statistics,” Proc. Natl. Inst. Sci. India 2 (1936) 49–55

  42. [42]

    Isolation Forest,

    F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation Forest,” in 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422. 2008