Learnable Parameter Similarity
Pith reviewed 2026-05-24 14:48 UTC · model grok-4.3
The pith
A second-order neural network learns an effective metric for similarity between second-order semantics hidden in independently trained visual models by aligning their parameters end-to-end.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that second-order semantics relevant to task relations reside in the parameters of independently trained models, and that a second-order neural network trained end-to-end can extract and align them to produce a learned similarity metric. The evidence offered is a set of experiments on ModelSet500, a benchmark of 500 trained models.
What carries the argument
Learnable Parameter Similarity (LPS), a second-order neural network that aligns high-dimensional parameters of trained models to learn a similarity metric for second-order semantics.
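As a rough illustration of what a learned parameter-similarity function could look like, the sketch below flattens each trained model to a fixed-length vector, passes both vectors through one shared encoder, and compares the embeddings. Everything in it is an assumption made for illustration: the names `make_encoder`, `encode`, and `parameter_similarity`, the single random (untrained) linear layer, and the use of cosine similarity. It does not attempt to reproduce the paper's second-order architecture, which the abstract does not specify.

```python
import math
import random

def make_encoder(in_dim, out_dim, seed=0):
    """Hypothetical shared encoder: one random, untrained linear layer."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

def encode(weights, params):
    """Map a flattened parameter vector to an embedding (linear map + tanh)."""
    return [math.tanh(sum(w * p for w, p in zip(row, params))) for row in weights]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def parameter_similarity(weights, params_a, params_b):
    """Score two models by the cosine of their encoded parameter vectors."""
    return cosine(encode(weights, params_a), encode(weights, params_b))

# Toy usage: two made-up "models" with 8 parameters each.
rng = random.Random(1)
encoder = make_encoder(in_dim=8, out_dim=4)
model_a = [rng.gauss(0.0, 1.0) for _ in range(8)]
model_b = [rng.gauss(0.0, 1.0) for _ in range(8)]
score = parameter_similarity(encoder, model_a, model_b)
```

Sharing one encoder between both inputs keeps the score symmetric in its two arguments. In a trained version, the encoder weights would be fitted rather than drawn at random.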
If this is right
- Task relations become estimable directly from model parameters without task-specific labels or post-hoc selection.
- High-order semantic concepts shared across visual tasks can be revealed through learned parameter alignments.
- Transfer learning can draw on automatically discovered task similarities for model selection or initialization.
- A standardized benchmark of 500 models enables reproducible comparison of parameter-similarity techniques.
Where Pith is reading between the lines
- The learned metric could cluster large collections of models into task-based taxonomies for efficient model retrieval.
- If parameter encodings of task semantics prove consistent across domains, the same alignment approach might apply outside vision.
- The method suggests a route to unsupervised construction of task ontologies from archives of pretrained weights.
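The taxonomy idea in the list above can be sketched as a simple grouping step over a pairwise similarity matrix. This is a minimal sketch under stated assumptions: the function name `cluster_by_similarity`, the fixed threshold, and the single-linkage cut are illustrative choices, and the matrix values are toy numbers rather than LPS outputs.

```python
def cluster_by_similarity(sim, threshold):
    """Group model indices linked by pairwise similarity >= threshold.

    This is a single-linkage cut implemented with union-find: two models
    land in the same group if a chain of above-threshold pairs connects them.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy similarity matrix for four hypothetical models.
toy_sim = [
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
]
groups = cluster_by_similarity(toy_sim, threshold=0.5)
```

Sweeping the threshold from high to low would merge groups progressively, which is one way to read a hierarchy out of a flat similarity matrix.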
Load-bearing premise
Second-order semantics about task relations are reliably encoded inside the parameters of independently trained models and can be extracted and aligned by a second-order network without task-specific supervision.
What would settle it
Train the LPS network on ModelSet500 and test whether its output similarity scores fail to correlate with known task relations or fail to improve downstream transfer-learning accuracy compared with random or first-order baselines.
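One way to run the correlation half of this test is a rank correlation between predicted similarity scores and reference task-relation scores over the same model pairs. The sketch below implements Spearman's rank correlation with average ranks for ties. The helper names and the idea of a per-pair reference score are assumptions made for illustration, and the usage values are toy numbers, not results.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy usage: predicted scores vs. hypothetical reference relations per model pair.
predicted = [0.1, 0.4, 0.9, 0.7]
reference = [1, 2, 4, 3]
rho = spearman(predicted, reference)
```

A value near zero on held-out pairs, or one no higher than that of a random or first-order baseline, would count against the claim.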
Original abstract
Most of the existing approaches focus on specific visual tasks while ignoring the relations between them. Estimating task relation sheds light on the learning of high-order semantic concepts, e.g., transfer learning. How to reveal the underlying relations between different visual tasks remains largely unexplored. In this paper, we propose a novel Learnable Parameter Similarity (LPS) method that learns an effective metric to measure the similarity of second-order semantics hidden in trained models. LPS is achieved by using a second-order neural network to align high-dimensional model parameters and learning second-order similarity in an end-to-end way. In addition, we create a model set called ModelSet500 as a parameter similarity learning benchmark that contains 500 trained models. Extensive experiments on ModelSet500 validate the effectiveness of the proposed method. Code will be released at https://github.com/Wanggcong/learnable-parameter-similarity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Learnable Parameter Similarity (LPS), a method that uses a second-order neural network to align high-dimensional parameters of independently trained models and learn an end-to-end metric for second-order semantic similarity between visual tasks. It introduces ModelSet500, an author-constructed benchmark of 500 trained models, and claims that extensive experiments on this set validate the effectiveness of LPS for revealing task relations relevant to transfer learning.
Significance. If the central claim holds, LPS could offer a new unsupervised route to infer task relations directly from model parameters, with potential applications in transfer learning and multi-task settings. The creation of ModelSet500 as a public benchmark and the stated intention to release code are concrete positive contributions that would aid reproducibility and further research. However, the significance is currently difficult to evaluate because the manuscript supplies no derivation of the loss, no quantitative results, and no controls that would isolate task semantics from other parameter statistics.
Major comments (2)
- [Abstract] Abstract: the claim that LPS 'learns second-order similarity in an end-to-end way' is unsupported because no loss function, supervision signal, or training objective is defined. Without this, it is impossible to determine whether the second-order network is forced to recover task semantics rather than architecture, initialization, or optimizer artifacts.
- [Abstract] Abstract: ModelSet500 is described as author-constructed, yet no controls (same-task different seeds, same-architecture different tasks, or random task labels) are mentioned. Any reported correlation could therefore be explained by the network reproducing the authors' own task taxonomy rather than recovering an intrinsic encoding of task relations in the parameters.
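The three controls named in the second comment can be sketched as pair-selection and label-shuffling helpers over model metadata. This is a hedged sketch: the record keys `id`, `task`, `arch`, and `seed`, the function names, and the choice to hold architecture fixed in the same-task control are all assumptions, since the manuscript does not describe ModelSet500's metadata.

```python
import random
from itertools import combinations

def control_pairs(models):
    """Build two control sets of model-id pairs from metadata records.

    models: list of dicts with hypothetical keys 'id', 'task', 'arch', 'seed'.
    """
    same_task_diff_seed = []  # only the seed varies (architecture held fixed)
    same_arch_diff_task = []  # only the task varies
    for a, b in combinations(models, 2):
        if a["arch"] != b["arch"]:
            continue
        if a["task"] == b["task"] and a["seed"] != b["seed"]:
            same_task_diff_seed.append((a["id"], b["id"]))
        elif a["task"] != b["task"]:
            same_arch_diff_task.append((a["id"], b["id"]))
    return same_task_diff_seed, same_arch_diff_task

def shuffled_task_labels(models, seed=0):
    """Random-label baseline: permute task labels across models (returns copies)."""
    labels = [m["task"] for m in models]
    random.Random(seed).shuffle(labels)
    return [dict(m, task=t) for m, t in zip(models, labels)]

# Toy metadata for four hypothetical models.
toy_models = [
    {"id": "m0", "task": "cls", "arch": "resnet", "seed": 0},
    {"id": "m1", "task": "cls", "arch": "resnet", "seed": 1},
    {"id": "m2", "task": "seg", "arch": "resnet", "seed": 0},
    {"id": "m3", "task": "seg", "arch": "vgg", "seed": 0},
]
seed_pairs, task_pairs = control_pairs(toy_models)
```

If a learned metric scores the same-task pairs no higher than the different-task pairs, or does equally well when trained on shuffled labels, it is probably not recovering task semantics.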
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments. We address each of the major comments point-by-point below.
- Referee: [Abstract] The claim that LPS 'learns second-order similarity in an end-to-end way' is unsupported because no loss function, supervision signal, or training objective is defined. Without this, it is impossible to determine whether the second-order network is forced to recover task semantics rather than architecture, initialization, or optimizer artifacts.
  Authors: We agree with the referee that the current manuscript does not define or derive a loss function, supervision signal, or training objective in the abstract (or, based on the provided text, elsewhere). The claim of end-to-end learning is therefore unsupported as written. We will revise the abstract and add a dedicated methods subsection that specifies the loss (a supervised contrastive objective using task labels from ModelSet500), the supervision signal, and how the second-order network is optimized.
  Revision: yes
- Referee: [Abstract] ModelSet500 is described as author-constructed, yet no controls (same-task different seeds, same-architecture different tasks, or random task labels) are mentioned. Any reported correlation could therefore be explained by the network reproducing the authors' own task taxonomy rather than recovering an intrinsic encoding of task relations in the parameters.
  Authors: We agree that the manuscript provides no description of controls for ModelSet500. Without them it is impossible to rule out that any observed correlations simply reproduce the authors' task taxonomy or other confounds. In the revision we will add the suggested controls (same-task different seeds, same-architecture different tasks, and random-label baselines) and report the corresponding quantitative results.
  Revision: yes
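The first response names a supervised contrastive objective over task labels but does not give its form. As a stand-in, the sketch below uses the classic pairwise margin-based contrastive loss: same-task pairs are pulled together and different-task pairs are pushed at least `margin` apart. The function names, the Euclidean distance, and the margin value are illustrative assumptions, not the authors' stated loss.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(embeddings, task_labels, margin=1.0):
    """Mean pairwise margin-based contrastive loss over all model pairs.

    Same-label pairs contribute d^2; different-label pairs contribute
    max(0, margin - d)^2, which is zero once they are at least `margin` apart.
    """
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(embeddings[i], embeddings[j])
            if task_labels[i] == task_labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            count += 1
    return total / count if count else 0.0

# Toy usage: three hypothetical model embeddings with task labels.
toy_embeddings = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
toy_labels = ["cls", "cls", "seg"]
loss = contrastive_loss(toy_embeddings, toy_labels, margin=1.0)
```

Because the loss depends only on task labels, it cannot on its own separate task semantics from architecture or seed effects, which is why the controls in the second response matter.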
Circularity Check
No circularity; derivation is self-contained empirical learning
The paper defines LPS as a second-order neural network trained end-to-end to produce a similarity metric on ModelSet500. No equations, fitting steps, or self-citations are shown that would make the learned similarity equivalent to its training inputs by construction, nor does any claimed result reduce to a renamed input quantity. As described, the method is a conventional metric learner whose output is not forced by the definition of its inputs, although the manuscript does not state the supervision signal; external validation on the benchmark is independent of any internal loop.