Learnable Parameter Similarity

Guangcong Wang; Guangrun Wang; Jianhuang Lai; Wenqi Liang

arXiv: 1907.11943 · v1 · pith:XKHLXEJ3 · submitted 2019-07-27 · 💻 cs.LG · cs.CV · stat.ML


Pith reviewed 2026-05-24 14:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords: learnable parameter similarity, second-order semantics, task relations, model parameters, transfer learning, ModelSet500, visual tasks, parameter alignment


The pith

A second-order neural network learns an effective metric for similarity between second-order semantics hidden in independently trained visual models by aligning their parameters end-to-end.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Learnable Parameter Similarity (LPS) as a way to estimate relations between visual tasks by measuring similarities encoded in the parameters of models trained on those tasks. Most prior work addresses tasks in isolation and does not exploit the underlying connections that could support transfer learning or discovery of higher-order concepts. LPS trains a second-order neural network to align high-dimensional parameters from separate models and optimizes the similarity measure directly in an end-to-end fashion. The authors also introduce ModelSet500, a collection of 500 trained models, to serve as a benchmark for evaluating such parameter-similarity methods. If the approach holds, task relations become recoverable from existing model weights alone without additional task labels or manual selection.

Core claim

LPS demonstrates that second-order semantics relevant to task relations reside in the parameters of independently trained models and can be extracted and aligned by a second-order neural network trained end-to-end to produce a learned similarity metric, with effectiveness shown through experiments on the ModelSet500 benchmark containing 500 trained models.

What carries the argument

Learnable Parameter Similarity (LPS), a second-order neural network that aligns high-dimensional parameters of trained models to learn a similarity metric for second-order semantics.
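A minimal sketch of the parameters-in, similarity-out idea follows. It rests on three assumptions: a two-layer MLP plays the role of the trained model, sorting hidden units by weight norm serves as the alignment step, and a single random linear map stands in for the second-order network. None of these choices come from the paper, and neither do the names `canonicalize`, `flatten` and `LinearSecondOrderNet`. The sketch only shows why parameters need aligning before comparison: two networks can compute the same function with their hidden units in a different order.

```python
import numpy as np

def canonicalize(w1, w2):
    """Reorder hidden units of a two-layer MLP by descending row norm of w1.

    w1 has shape (hidden, in), w2 has shape (out, hidden). Permuting hidden
    units leaves the network function unchanged, so sorting gives copies a
    shared unit order (an assumed, deliberately simple alignment heuristic).
    """
    order = np.argsort(-np.linalg.norm(w1, axis=1), kind="stable")
    return w1[order], w2[:, order]

def flatten(w1, w2):
    """Concatenate all weights into one parameter vector theta."""
    return np.concatenate([w1.ravel(), w2.ravel()])

class LinearSecondOrderNet:
    """Hypothetical 'second-order' network: one linear map from a model's
    parameter vector to an embedding, compared by cosine similarity."""

    def __init__(self, in_dim, emb_dim, seed=0):
        rng = np.random.default_rng(seed)
        # In LPS these weights would be learned end-to-end; here they are random.
        self.W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(emb_dim, in_dim))

    def embed(self, theta):
        return self.W @ theta

    def similarity(self, theta_a, theta_b):
        za, zb = self.embed(theta_a), self.embed(theta_b)
        return float(za @ zb / (np.linalg.norm(za) * np.linalg.norm(zb)))

# Usage: a model and a hidden-unit-permuted copy compute the same function.
rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
perm = rng.permutation(8)
w1_perm, w2_perm = w1[perm], w2[:, perm]

theta_a = flatten(*canonicalize(w1, w2))
theta_b = flatten(*canonicalize(w1_perm, w2_perm))
net = LinearSecondOrderNet(in_dim=theta_a.size, emb_dim=16)
print(round(net.similarity(theta_a, theta_b), 6))  # 1.0: alignment undid the permutation
```

A learned alignment would replace the sorting heuristic. A learned metric would replace the random projection.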

If this is right

Task relations become estimable directly from model parameters without task-specific labels or post-hoc selection.
High-order semantic concepts shared across visual tasks can be revealed through learned parameter alignments.
Transfer learning can draw on automatically discovered task similarities for model selection or initialization.
A standardized benchmark of 500 models enables reproducible comparison of parameter-similarity techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The learned metric could cluster large collections of models into task-based taxonomies for efficient model retrieval.
If parameter encodings of task semantics prove consistent across domains, the same alignment approach might apply outside vision.
The method suggests a route to unsupervised construction of task ontologies from archives of pretrained weights.
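The clustering idea can be made concrete as follows, assuming a similarity matrix over models is already available. First link every pair of models whose similarity clears a threshold and take connected components as clusters. Then score agreement with the known task labels by normalized mutual information (NMI). The threshold rule and the function names are illustrative choices, not anything the paper specifies.

```python
import numpy as np

def threshold_clusters(sim, threshold):
    """Link models whose similarity is >= threshold and return connected
    components (single-linkage clustering cut at one threshold)."""
    n = sim.shape[0]
    assign = np.full(n, -1)
    next_id = 0
    for start in range(n):
        if assign[start] >= 0:
            continue
        assign[start] = next_id
        stack = [start]
        while stack:  # depth-first walk over the thresholded graph
            i = stack.pop()
            for j in np.flatnonzero(sim[i] >= threshold):
                if assign[j] < 0:
                    assign[j] = next_id
                    stack.append(j)
        next_id += 1
    return assign

def nmi(a, b):
    """Normalized mutual information, normalized by sqrt(H(a) * H(b))."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1.0)  # contingency table of cluster vs task
    p = table / table.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))
    ha, hb = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    if ha == 0.0 or hb == 0.0:
        return 1.0 if ha == hb else 0.0
    return float(mi / np.sqrt(ha * hb))

# Usage with a toy similarity matrix over six models from three tasks.
tasks = np.array([0, 0, 1, 1, 2, 2])
sim = np.where(tasks[:, None] == tasks[None, :], 0.9, 0.1)
clusters = threshold_clusters(sim, threshold=0.5)
print(clusters, round(nmi(clusters, tasks), 6))  # [0 0 1 1 2 2] 1.0
```

A hierarchical method would give a taxonomy rather than a flat partition. Cutting it at one threshold, as here, reduces it to the simplest case.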

Load-bearing premise

Second-order semantics about task relations are reliably encoded inside the parameters of independently trained models and can be extracted and aligned by a second-order network without task-specific supervision.

What would settle it

Train the LPS network on ModelSet500 and test whether its output similarity scores fail to correlate with known task relations or fail to improve downstream transfer-learning accuracy compared with random or first-order baselines.
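The first half of that test can be scored without any transfer experiments. This sketch makes two assumptions: the similarity scores arrive as an $M \times M$ matrix, and the "known task relations" reduce to same task versus different task. The statistic is the probability that a same-task pair outscores a cross-task pair, so 0.5 is chance. The synthetic parameters, their dimensionality and the helper names are hypothetical. An LPS score matrix would be passed in the same way as the first-order baseline.

```python
import numpy as np

def first_order_similarity(thetas):
    """First-order baseline: cosine similarity of raw flattened parameter vectors."""
    unit = thetas / np.linalg.norm(thetas, axis=1, keepdims=True)
    return unit @ unit.T

def pair_ranking_auc(sim, labels):
    """Probability that a random same-task pair outscores a random cross-task
    pair (ties count 1/2). 0.5 is chance level, 1.0 is perfect agreement."""
    iu = np.triu_indices(len(labels), k=1)  # each unordered pair once
    scores = sim[iu]
    same = (labels[:, None] == labels[None, :])[iu]
    diff = scores[same][:, None] - scores[~same][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Hypothetical stand-in: 3 tasks x 4 models, parameters scattered around task centres.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)
thetas = rng.normal(size=(3, 32))[labels] + 0.3 * rng.normal(size=(12, 32))

random_scores = rng.uniform(size=(12, 12))
random_scores = (random_scores + random_scores.T) / 2  # symmetric random baseline

print(pair_ranking_auc(first_order_similarity(thetas), labels))
print(pair_ranking_auc(random_scores, labels))
```

On real model zoos the first-order baseline is unlikely to be this strong. Permutation symmetry and differing initializations blur raw parameter distances, which is the gap a learned metric would have to close.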

Figures

Figures reproduced from arXiv: 1907.11943 by Guangcong Wang, Guangrun Wang, Jianhuang Lai, Wenqi Liang.

**Figure 2.** Second-order neural network. Let $\mathcal{T}$ denote a set of tasks $\{T_i\}_{i=1}^{N}$. For $T_i \in \mathcal{T}$, we train $m_i$ deep models with a task label $Y_i$. We obtain a model set $\mathcal{D}$ with $M$ models $\{(\theta^*_j, Y_j)\}_{j=1}^{M}$ from $N$ tasks, where $M = \sum_{i=1}^{N} m_i$. Each $\theta^*_j$ is a trained model, which can be regarded as a metadata point. We then use these $M$ metadata points to train the second-order similarity learning by

$$\phi_{t+1} = \phi_t - \beta_t \nabla h(\mathcal{D}; \phi_t) \tag{2}$$

where $h$ is…
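The update in Eq. (2) is ordinary gradient descent on the second-order parameters $\phi$. The caption cuts off before defining $h$, so this sketch assumes a hypothetical squared-error loss. That loss pushes a diagonal bilinear similarity $s_\phi(\theta_a, \theta_b) = \sum_k \phi_k \theta_{a,k}\theta_{b,k}$ toward 1 for same-task pairs and toward 0 otherwise. The step size $\beta_t$ follows an assumed decaying schedule. The model set is synthetic, made of unit-norm vectors around task centres with $(m_1, m_2, m_3) = (4, 5, 3)$, so $M = 12$.

```python
import numpy as np

def make_model_set(models_per_task=(4, 5, 3), dim=16, seed=0):
    """Synthetic stand-in for D = {(theta*_j, Y_j)}, j = 1..M, with M = sum(m_i).
    Each 'trained model' is a unit-norm vector near a task-specific centre."""
    rng = np.random.default_rng(seed)
    centres = rng.normal(size=(len(models_per_task), dim))
    thetas, labels = [], []
    for task, m_i in enumerate(models_per_task):
        for _ in range(m_i):
            thetas.append(centres[task] + 0.3 * rng.normal(size=dim))
            labels.append(task)
    thetas = np.stack(thetas)
    return thetas / np.linalg.norm(thetas, axis=1, keepdims=True), np.array(labels)

def h(thetas, labels, phi):
    """Assumed loss: mean squared error between s_phi(a, b) and [Y_a == Y_b]."""
    s = (thetas * phi) @ thetas.T  # s_ab = sum_k phi_k a_k b_k
    y = (labels[:, None] == labels[None, :]).astype(float)
    return float(np.mean((s - y) ** 2))

def grad_h(thetas, labels, phi):
    """Analytic gradient of the assumed h with respect to phi."""
    s = (thetas * phi) @ thetas.T
    y = (labels[:, None] == labels[None, :]).astype(float)
    r = 2.0 * (s - y) / s.size                           # dh/ds_ab
    return np.einsum("ab,ak,bk->k", r, thetas, thetas)   # ds_ab/dphi_k = a_k b_k

def train(thetas, labels, steps=300, beta0=0.5):
    phi = np.ones(thetas.shape[1])
    losses = []
    for t in range(steps):
        beta_t = beta0 / (1.0 + 0.01 * t)                 # assumed decaying step size
        losses.append(h(thetas, labels, phi))
        phi = phi - beta_t * grad_h(thetas, labels, phi)  # Eq. (2)
    return phi, losses

thetas, labels = make_model_set()
phi, losses = train(thetas, labels)
print(thetas.shape, losses[0] > losses[-1])  # (12, 16) True
```

Normalizing each parameter vector keeps the curvature of this quadratic loss at most 2. A step size of at most 0.5 therefore lowers the loss at every iteration. An unnormalized zoo of real weights would need a smaller or adaptive step.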

**Figure 3.** The effectiveness of learnable parameter similarity.

Original abstract

Most of the existing approaches focus on specific visual tasks while ignoring the relations between them. Estimating task relations sheds light on the learning of high-order semantic concepts, e.g., transfer learning. How to reveal the underlying relations between different visual tasks remains largely unexplored. In this paper, we propose a novel Learnable Parameter Similarity (LPS) method that learns an effective metric to measure the similarity of second-order semantics hidden in trained models. LPS is achieved by using a second-order neural network to align high-dimensional model parameters and learning second-order similarity in an end-to-end way. In addition, we create a model set called ModelSet500 as a parameter similarity learning benchmark that contains 500 trained models. Extensive experiments on ModelSet500 validate the effectiveness of the proposed method. Code will be released at https://github.com/Wanggcong/learnable-parameter-similarity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper proposes an end-to-end second-order network to learn similarity between model parameters and releases ModelSet500, but the results rest on an untested assumption that parameters encode task relations extractable without supervision.


The main move is treating parameter similarity as something a second-order network can learn directly by aligning high-dimensional weights from independently trained models. They also put together ModelSet500, a collection of 500 models, as a benchmark for this kind of work. That combination is the concrete new piece; prior task-relation methods usually rely on hand-designed metrics or task labels rather than learning an alignment from the parameters themselves.

Referee Report

2 major / 0 minor

Summary. The paper proposes Learnable Parameter Similarity (LPS), a method that uses a second-order neural network to align high-dimensional parameters of independently trained models and learn an end-to-end metric for second-order semantic similarity between visual tasks. It introduces ModelSet500, an author-constructed benchmark of 500 trained models, and claims that extensive experiments on this set validate the effectiveness of LPS for revealing task relations relevant to transfer learning.

Significance. If the central claim holds, LPS could offer a new unsupervised route to infer task relations directly from model parameters, with potential applications in transfer learning and multi-task settings. The creation of ModelSet500 as a public benchmark and the stated intention to release code are concrete positive contributions that would aid reproducibility and further research. However, the significance is currently difficult to evaluate because the manuscript supplies no derivation of the loss, no quantitative results, and no controls that would isolate task semantics from other parameter statistics.

major comments (2)

[Abstract] The claim that LPS 'learns second-order similarity in an end-to-end way' is unsupported because no loss function, supervision signal, or training objective is defined. Without this, it is impossible to determine whether the second-order network is forced to recover task semantics rather than architecture, initialization, or optimizer artifacts.
[Abstract] ModelSet500 is described as author-constructed, yet no controls (same-task different seeds, same-architecture different tasks, or random task labels) are mentioned. Any reported correlation could therefore be explained by the network reproducing the authors' own task taxonomy rather than recovering an intrinsic encoding of task relations in the parameters.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments. We address each of the major comments point-by-point below.


Referee: [Abstract] The claim that LPS 'learns second-order similarity in an end-to-end way' is unsupported because no loss function, supervision signal, or training objective is defined. Without this, it is impossible to determine whether the second-order network is forced to recover task semantics rather than architecture, initialization, or optimizer artifacts.

Authors: We agree with the referee that the current manuscript does not define or derive a loss function, supervision signal, or training objective in the abstract (or, based on the provided text, elsewhere). The claim of end-to-end learning is therefore unsupported as written. We will revise the abstract and add a dedicated methods subsection that specifies the loss (a supervised contrastive objective using task labels from ModelSet500), the supervision signal, and how the second-order network is optimized. revision: yes
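The simulated rebuttal names a supervised contrastive objective but does not write it out. The sketch below shows one common form, a SupCon-style loss, under two assumptions: `z` holds the second-order network's embeddings of each model's parameters, and the temperature is 0.1. The source does not say whether LPS would use this exact form.

```python
import numpy as np

def sup_con_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over model embeddings z (shape: M x d).

    For each anchor i, every other model with the same task label is a
    positive; all models other than i form the softmax denominator.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / tau
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    masked = np.where(self_mask, -np.inf, logits)  # exclude a == i
    row_max = masked.max(axis=1, keepdims=True)    # stable log-sum-exp
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0                            # skip anchors with no positive
    mean_log_prob_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(-mean_log_prob_pos.mean())

# Usage: embeddings that cluster by the true labels give a much lower loss.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(sup_con_loss(z, np.array([0, 0, 1, 1])) < sup_con_loss(z, np.array([0, 1, 0, 1])))  # True
```

An objective of this kind uses task labels during training. That bears directly on the referee's question of what the network is forced to recover.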
Referee: [Abstract] ModelSet500 is described as author-constructed, yet no controls (same-task different seeds, same-architecture different tasks, or random task labels) are mentioned. Any reported correlation could therefore be explained by the network reproducing the authors' own task taxonomy rather than recovering an intrinsic encoding of task relations in the parameters.

Authors: We agree that the manuscript provides no description of controls for ModelSet500. Without them it is impossible to rule out that any observed correlations simply reproduce the authors' task taxonomy or other confounds. In the revision we will add the suggested controls (same-task different seeds, same-architecture different tasks, and random-label baselines) and report the corresponding quantitative results. revision: yes
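Of the three controls listed, the random-label baseline is the easiest to sketch. Given a precomputed similarity matrix, the permutation test below asks how often shuffled task labels produce a within-task versus cross-task similarity gap at least as large as the real labels do. Shuffling preserves the number of models per task. A small p-value therefore says the gap depends on which models share a label, not merely on how many models each task has. The data and function names are hypothetical. The seed and architecture controls would need real retrained models.

```python
import numpy as np

def within_minus_between(sim, labels):
    """Mean same-task similarity minus mean cross-task similarity (off-diagonal)."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return sim[same & off_diag].mean() - sim[~same].mean()

def random_label_control(sim, labels, n_perm=999, seed=0):
    """Permutation test against shuffled task labels.
    Returns (observed_gap, p_value)."""
    rng = np.random.default_rng(seed)
    observed = within_minus_between(sim, labels)
    hits = sum(
        within_minus_between(sim, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return observed, (1 + hits) / (1 + n_perm)

# Hypothetical usage: cosine similarities of synthetic, task-clustered parameters.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)
thetas = rng.normal(size=(3, 32))[labels] + 0.3 * rng.normal(size=(12, 32))
unit = thetas / np.linalg.norm(thetas, axis=1, keepdims=True)
gap, p = random_label_control(unit @ unit.T, labels)
print(f"gap={gap:.3f}, p={p:.3f}")
```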

Circularity Check

0 steps flagged

No circularity; derivation is self-contained empirical learning


The paper defines LPS as a second-order neural network trained end-to-end to produce a similarity metric on ModelSet500. No equations, fitting steps, or self-citations are shown that would make the learned similarity equivalent to its training inputs by construction, nor does any claimed result reduce to a renamed input quantity. The method is a standard supervised metric learner whose output is not forced by the definition of its inputs; external validation on the benchmark is independent of any internal loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the high-level description of the second-order network itself.

pith-pipeline@v0.9.0 · 5692 in / 950 out tokens · 35560 ms · 2026-05-24T14:48:55.033351+00:00 · methodology

