Recognition: unknown
Quasi-Equivariant Metanetworks
Pith reviewed 2026-05-08 06:26 UTC · model grok-4.3
The pith
Quasi-equivariance relaxes strict symmetry constraints on metanetworks while still preserving the functional identity of the underlying neural weights.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By introducing quasi-equivariance, metanetworks can incorporate symmetry principles to respect architectural symmetries without the rigidity that makes strictly equivariant models sparse and less expressive, thereby supplying a principled and broadly applicable framework for weight-space learning across feedforward, convolutional, and transformer networks.
What carries the argument
Quasi-equivariance: a relaxed equivariance relation on weight-space operators that preserves functional identity of the represented network while avoiding the rigid group-action constraints of strict equivariance.
If this is right
- Quasi-equivariant metanetworks apply directly to feedforward, convolutional, and transformer architectures.
- They produce measurable improvements in the symmetry-expressivity trade-off compared with strict equivariance.
- The framework supplies a theoretical basis for designing metanetworks that reason about functional rather than parametric identity.
- Empirical results indicate that the relaxation yields usable models without sacrificing the core advantages of symmetry awareness.
Where Pith is reading between the lines
- The same quasi-equivariance idea could be tested on recurrent or graph architectures where functional equivalences are also common.
- If the definition proves stable, it might simplify downstream tasks such as weight-space model merging or architecture search by automatically identifying equivalent representations.
- Scaling the method to very large pretrained models would test whether the relaxed constraints remain computationally tractable.
Load-bearing premise
A relaxed notion of equivariance can be defined so that it still guarantees preservation of functional identity without creating new inconsistencies or losing the original symmetry benefits.
What would settle it
An experiment in which a quasi-equivariant metanetwork fails to assign the same output to two weight sets that realize identical input-output functions, or in which its performance collapses to that of a non-equivariant baseline on a functional-identity task.
Figures
read the original abstract
Metanetworks are neural architectures designed to operate directly on pretrained weights to perform downstream tasks. However, the parameter space serves only as a proxy for the underlying function class, and the parameter-function mapping is inherently non-injective: distinct parameter configurations may yield identical input-output behaviors. As a result, metanetworks that rely solely on raw parameters risk overlooking the intrinsic symmetries of the architecture. Reasoning about functional identity is therefore essential for effective metanetwork design, motivating the development of equivariant metanetworks, which incorporate equivariance principles to respect architectural symmetries. Existing approaches, however, typically enforce strict equivariance, which imposes rigid constraints and often leads to sparse and less expressive models. To address this limitation, we introduce the novel concept of quasi-equivariance, which allows metanetworks to move beyond the rigidity of strict equivariance while still preserving functional identity. We lay down a principled basis for this framework and demonstrate its broad applicability across diverse neural architectures, including feedforward, convolutional, and transformer networks. Through empirical evaluation, we show that quasi-equivariant metanetworks achieve good trade-offs between symmetry preservation and representational expressivity. These findings advance the theoretical understanding of weight-space learning and provide a principled foundation for the design of more expressive and functionally robust metanetworks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces quasi-equivariance as a relaxation of strict equivariance for metanetworks operating on pretrained weights. It argues that the non-injective parameter-function map requires symmetry awareness beyond raw parameters, proposes a principled quasi-equivariant framework that trades rigidity for expressivity while preserving functional identity, and demonstrates applicability to feedforward, convolutional, and transformer architectures with empirical results showing favorable trade-offs.
Significance. If the definition of quasi-equivariance provably respects functional identity under the non-injective mapping and the empirical gains are robust, the work would meaningfully advance weight-space learning by offering a flexible middle ground between strict equivariant metanetworks and unconstrained baselines.
major comments (2)
- [§3.2] §3.2, Definition 2 (quasi-equivariance): the relaxation via tolerance parameter ε is introduced without a proof that it preserves functional identity for all reparameterizations leaving the input-output map unchanged. For transformers, the interaction with attention and positional symmetries is not shown to avoid mapping functionally identical weights to distinct features, directly engaging the central claim.
- [§5.3] §5.3, Table 2 (transformer experiments): the reported accuracy improvements over raw-parameter baselines are 1.8% on average, but no ablation isolates the contribution of the quasi-equivariant layers versus the choice of ε; without this, the claim of a 'good trade-off' between symmetry preservation and expressivity cannot be evaluated.
minor comments (2)
- [§2] Notation for the group action on weights is introduced in §2 but reused inconsistently in §4; a single consolidated definition would improve readability.
- [Abstract] The abstract claims 'broad applicability' but the experiments cover only three architectures; adding a brief discussion of limitations for other families (e.g., RNNs) would be helpful.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the work.
read point-by-point responses
-
Referee: [§3.2] §3.2, Definition 2 (quasi-equivariance): the relaxation via tolerance parameter ε is introduced without a proof that it preserves functional identity for all reparameterizations leaving the input-output map unchanged. For transformers, the interaction with attention and positional symmetries is not shown to avoid mapping functionally identical weights to distinct features, directly engaging the central claim.
Authors: We appreciate the referee pointing out this gap. Definition 2 introduces quasi-equivariance via the tolerance parameter ε precisely to relax strict equivariance while addressing the non-injective parameter-function mapping. The manuscript motivates this choice by construction and provides supporting empirical evidence across architectures, but we acknowledge that a complete formal proof—showing preservation of functional identity for arbitrary reparameterizations that leave the input-output map invariant, and specifically analyzing attention and positional symmetries in transformers to ensure functionally identical weights are not mapped to distinct features beyond ε—is not included. In the revised version we will add a formal proof sketch together with a dedicated transformer analysis demonstrating that the quasi-equivariant layers respect functional identity within the stated tolerance. revision: yes
-
Referee: [§5.3] §5.3, Table 2 (transformer experiments): the reported accuracy improvements over raw-parameter baselines are 1.8% on average, but no ablation isolates the contribution of the quasi-equivariant layers versus the choice of ε; without this, the claim of a 'good trade-off' between symmetry preservation and expressivity cannot be evaluated.
Authors: We agree that the current experiments do not fully isolate the contributions. The reported 1.8% average improvement over raw-parameter baselines is presented as evidence of a favorable trade-off, yet without ablations separating the quasi-equivariant layers from the specific choice of ε the claim cannot be rigorously evaluated. In the revision we will add ablation studies on the transformer experiments, including variants that disable the quasi-equivariant components and sweeps over different ε values, to quantify their individual effects and better substantiate the symmetry-expressivity trade-off. revision: yes
Circularity Check
No circularity: quasi-equivariance introduced as independent relaxation
full rationale
The paper defines quasi-equivariance as a novel relaxation of strict equivariance that preserves functional identity while improving expressivity for metanetworks on feedforward, convolutional, and transformer architectures. The abstract and high-level description present this as a new principled framework with empirical trade-offs, without any quoted equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claim to its own inputs by construction. The derivation chain remains self-contained against external equivariance literature and does not exhibit the enumerated circular patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The parameter-function mapping is inherently non-injective
invented entities (1)
-
quasi-equivariance
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Classifying the classifier: Dissecting the weight space of neural networks
PMLR, 2019. David Steven Dummit and Richard M Foote.Abstract algebra, volume 3. Wiley Hoboken, 2004. Emilien Dupont, Hyunjik Kim, S. M. Ali Eslami, Danilo Jimenez Rezende, and Dan Rosenbaum. From data to functa: Your data point is a function and you can treat it like one. InInternational Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimor...
-
[2]
Alex Krizhevsky and Geoffrey Hinton
URLhttps://openreview.net/forum?id=oO6FsMyDBt. Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical Report 0, University of Toronto, Toronto, Ontario, 2009. URLhttps://www.cs. toronto.edu/˜kriz/learning-features-2009-TR.pdf. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification wi...
-
[3]
Andreas M¨uller, Carlo Curino, and Raghu Ramakrishnan
URLhttps://openreview.net/forum?id=0DcZxeWfOPt. Andreas M¨uller, Carlo Curino, and Raghu Ramakrishnan. Mothernet: A foundational hypernetwork for tabular classification.arXiv preprint arXiv:2312.08598, 2023. Aviv Navon, Aviv Shamsian, Idan Achituve, Ethan Fetaya, Gal Chechik, and Haggai Maron. Equiv- ariant architectures for learning in deep weight spaces...
-
[4]
14 Published as a conference paper at ICLR 2026 Duy-Tung Pham, Viet-Hoang Tran, Thieu V o, and Tan Minh Nguyen
URLhttps://proceedings.neurips.cc/paper_files/paper/2024/ file/98082e6b4b97ab7d3af1134a5013304e-Paper-Conference.pdf. 14 Published as a conference paper at ICLR 2026 Duy-Tung Pham, Viet-Hoang Tran, Thieu V o, and Tan Minh Nguyen. Mixed-curvature tree-sliced wasserstein distance. InThe Fourteenth International Conference on Learning Representations,
2024
-
[5]
URLhttps://openreview.net/forum?id=e439wJl5sT. M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equa- tions.Journal of Computational Physics, 378:686–707, 2019. ISSN 0021-9991. doi: https://doi. org/10.1016/j.jcp.201...
-
[6]
For every choice of a sectionS⊂Θmeeting eachG-orbit exactly once, and for every seed maps:S/∫hortrightarrowXsatisfying the stabilizer constraint (see Proposition A.2), there exists a unique mapF: Θ/∫hortrightarrowXobeying the quasi–equivariance relation F(gθ) =α(g, θ)·F(θ),for allg∈G, θ∈Θ,(13) such thatFis independent of the choice of representative of a ...
-
[7]
Necessity.Fixg 1, g2, θ
The functionα:G×Θ/∫hortrightarrowHsatisfies, for allg1, g2 ∈Gandθ∈Θ, α(e, θ) =e H ,(14) α(g1g2, θ) =α(g 1, g 2θ)α(g 2, θ).(15) Proof.The proof proceeds as follows. Necessity.Fixg 1, g2, θ. Using Equation (13) twice, F(g 1g2θ) =α(g 1, g 2θ)·F(g 2θ) =α(g 1, g 2θ)α(g 2, θ)·F(θ).(16) On the other hand, applying Equation (13) once withg 1g2 gives F(g 1g2θ) =α(...
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.