arxiv: 2605.08281 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: no theorem link

Is Class Signal Clustered or Routed in Task-Induced Implicit Neural Representation Weight Spaces?

Xinyi Guo , Mingyi He , Haobin Ding , Weiming Chen , Xinrui Chen , Jiawen Li , Di Zhang , Minxi Ouyang

show 2 more authors

Yizhi Wang Xitong Ling

Authors on Pith no claims yet

Pith reviewed 2026-05-12 01:34 UTC · model grok-4.3

classification 💻 cs.CV

keywords implicit neural representationsweight spaceimage classificationmeta-learningSIRENclass signalroutingclustering

0 comments

The pith

Task-induced INR weights classify by class because the reader routes their signal through specific pathways rather than because the weights form geometric clusters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether fitting image-specific implicit neural representations causes their weight vectors to naturally cluster by class under classifier pressure. It shows that raw weight-space geometry and supervised clustering do not track or explain the accuracy of a trained reader network that classifies those weights. Instead, the reader actively constructs class alignment through its own token interactions, with the native SIREN bias column serving as a low-dimensional causal route that carries the class signal. Targeted changes that strengthen this routing or add explicit bias pathways can improve performance over the standard shared-anchor approach.

Core claim

In the SIREN-based Meta Weight Transformer regime, end-to-end meta-training does not produce weight-space geometry whose clustering reliably predicts trained-reader accuracy; local class consistency can increase while reader performance decreases. Token-flow diagnostics demonstrate that class-aligned neighborhoods become predictive only after late reader interactions, not in the input coordinate. The native SIREN bias column in the augmented weight token functions as a sample-dependent causal readout route, and controls rule out generic scalar or marginal artifacts. Route-directed interventions often outperform the baseline under the lane-specific training conventions tested.

What carries the argument

Token-flow diagnostics that track when class alignment emerges in the reader, together with the native SIREN bias column as a low-dimensional causal readout route for the trained reader.

Load-bearing premise

That the token-flow diagnostics and targeted controls on the bias column isolate the routing mechanism without being confounded by the SIREN architecture or the specific training conventions used.

What would settle it

Randomizing or masking only the bias column during reader training while leaving all other weight tokens intact, then measuring whether reader accuracy falls sharply compared with an unmasked control; if accuracy stays high, the routing claim is falsified.

Figures

Figures reproduced from arXiv: 2605.08281 by Di Zhang, Haobin Ding, Jiawen Li, Mingyi He, Minxi Ouyang, Weiming Chen, Xinrui Chen, Xinyi Guo, Xitong Ling, Yizhi Wang.

**Figure 1.** Figure 1: Overview: diagnosing whether class signal is clustered or routed in task-induced INR weights. The pipeline fits image-specific SIRENs from a shared anchor under classifier feedback, exposes reader-visible weight tokens, and then tests two competing explanations of classifiability. The paper contrasts two observables: exposed coordinate geometry, which can be tested directly, and reader-side states, which a… view at source ↗

**Figure 2.** Figure 2: In the broader family panel, cluster pressure can improve exposed local consistency while lowering trained-reader accuracy. Bars show family means with ±1 sample s.d. across emitter configurations (n = 12 non-cluster, n = 4 cluster-pressure); ∆ is cluster-pressure minus non-cluster. Full-offset 5-NN rises 23.75→25.70 and W-only 5-NN rises 22.32→24.73, while trained-reader Top-1 falls 61.65→59.24. This is a… view at source ↗

**Figure 3.** Figure 3: Input-side readouts do not rescue the clustered account. Top row: shallow inputside probes and the downstream trained weight reader; the MWT row is the trained reader, not an input-side probe. Bottom row: per-emitter probe trajectories across representation depth. The panel visualizes the input-readout pattern; the prose and Appendix D report the cross-seed pivot used for quantitative contrasts. Thus [PI… view at source ↗

**Figure 4.** Figure 4: Reader construction of class geometry and function-response control. Left: token-flow 5-NN rises from the reader input to late reader states; the plotted six-emitter panel sits in the mid-20s at input and the low-to-mid-50s late in the reader. Center: late-state 5-NN tracks trained-reader Top-1 on the train-resolution slice, with the band showing the six-resolution envelope. Right: the class-only function-… view at source ↗

**Figure 5.** Figure 5: The SIREN bias entry is compact and causal. Left: the explicit bias route has effective ranks rank0.9 = 2 and rank0.99 = 5. Center/right: targeted b-coordinate interventions are shown with matched-weight, distributional, shift-split, and layer-locality controls; [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

read the original abstract

Implicit neural representations (INRs) encode images as neural-network weights, making image classification a problem of weight-space classifiability. A natural geometric hypothesis is that classifier feedback should make image-specific weights cluster by class in the shared-anchor coordinate. We test this hypothesis in the SIREN-based Meta Weight Transformer (MWT) regime, where end-to-end training meta-learns a shared initialization and inner-loop update schedule for fitting image-specific SIRENs. We find that this prediction fails. Exposed weight-space geometry and supervised clustering pressure do not reliably track trained-reader accuracy; clustering can even make local neighborhoods more class-consistent while making the trained reader worse. Crucially, the reader constructs rather than inherits class-aligned geometry: token-flow diagnostics show that class-aligned neighborhoods become strongly predictive of trained-reader accuracy only after late reader interactions, not in the input coordinate. We further identify the native SIREN bias column in the augmented weight token as a low-dimensional, sample-dependent causal readout route for the trained reader; targeted controls rule out generic scalar-column and marginal-distribution artifacts. The diagnosis motivates interventions that strengthen reader routing, add an explicit bias route, or use denser inner-loop fitting; under the lane-specific training conventions used here, route-directed variants often outperform the shared-anchor baseline but interact non-additively. Task-induced INR weights are classifiable not because they form raw geometric clusters, but because their class signal is routed through the reader.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper finds that in SIREN-MWT setups, class signals in INR weights are routed by the reader rather than clustered in raw weight geometry, with the bias column as a key causal path.

read the letter

The central observation is that task-induced INR weights do not form class clusters in the shared coordinate that a downstream reader can simply read off. Instead the reader builds the alignment itself, and the work traces this to late-layer token interactions plus a specific low-dimensional route through the native SIREN bias column. Token-flow diagnostics show class-neighborhood consistency only becomes predictive of reader accuracy after those late interactions, not at the input weights. Targeted ablations on the bias column rule out generic scalar or marginal effects, and the paper notes that clustering pressure can even degrade reader performance in some regimes. Interventions that strengthen the route or add an explicit bias path sometimes beat the baseline, though the gains are non-additive under their lane-specific training schedule. This is a clean negative result on the geometric clustering hypothesis plus a mechanistic alternative grounded in the same experimental regime. The diagnostics are empirical and avoid circular fitting claims. The main limitation is that all controls stay inside SIREN + augmented-token MWT with its inner-loop schedule and shared-anchor initialization. No cross-checks appear on ReLU MLPs, Fourier-feature INRs, or other tokenization schemes, so it remains open whether the routing pattern is general to task-induced fitting or tied to SIREN bias dynamics and the particular meta-training conventions. The abstract-only review leaves quantitative details on error bars and data splits unverified, but the reported pattern is internally consistent. This is useful for researchers working on weight-space meta-learning and INR classification who want a more precise picture of where the class signal actually lives. It deserves peer review because the hypothesis test is direct, the negative finding is reproducible in principle, and the routing diagnosis points to concrete next experiments even if the architecture scope needs widening.

Referee Report

2 major / 2 minor

Summary. The paper tests the hypothesis that task-induced INR weights (in the SIREN-based Meta Weight Transformer regime) form class-aligned geometric clusters in the shared-anchor coordinate. Using token-flow diagnostics and targeted bias-column ablations, it reports a negative result: raw weight-space geometry and clustering pressure do not track reader accuracy, and class alignment emerges only after late reader interactions. The class signal is instead routed through the reader, with the native SIREN bias column identified as a low-dimensional causal route; interventions that strengthen routing or add explicit bias paths can outperform the baseline under the reported training conventions.

Significance. If the empirical diagnostics hold, the work supplies a concrete negative result against the clustering hypothesis for meta-learned INR weight spaces and shifts attention to reader routing mechanisms. The token-flow analysis and bias-column controls constitute reusable tools for probing weight-space classifiability. The finding that clustering can improve local consistency while harming reader performance is a useful cautionary observation for future meta-learning designs.

major comments (2)

[§4 and §5.1] §4 (Token-flow diagnostics) and §5.1 (bias ablation): both sets of results are obtained exclusively inside the SIREN + augmented-token MWT regime with its specific inner-loop schedule and shared-anchor initialization. No cross-architecture controls (ReLU MLPs, Fourier-feature INRs, or alternative bases) are reported, so the observed decoupling of raw clustering from reader accuracy could be produced by SIREN sinusoidal bias dynamics or the particular tokenization rather than by task-induced fitting in general.
[§5.2] §5.2 (intervention experiments): the reported gains from route-directed variants are described as non-additive and dependent on lane-specific conventions. Without an explicit statement of how many random seeds, data splits, or hyperparameter sweeps were used to establish these interactions, it is difficult to assess whether the routing advantage is robust or sensitive to the exact MWT training protocol.

minor comments (2)

[Abstract] Abstract: the phrase 'lane-specific training conventions' is used without a one-sentence gloss or pointer to the methods section; a brief clarification would improve accessibility.
[Figure 3, Table 1] Figure 3 and Table 1: axis labels and legend entries for the token-flow plots and clustering metrics should be enlarged or clarified to ensure they remain legible when the paper is viewed at standard column width.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the scope and robustness of our findings. We address each major comment below.

read point-by-point responses

Referee: [§4 and §5.1] §4 (Token-flow diagnostics) and §5.1 (bias ablation): both sets of results are obtained exclusively inside the SIREN + augmented-token MWT regime with its specific inner-loop schedule and shared-anchor initialization. No cross-architecture controls (ReLU MLPs, Fourier-feature INRs, or alternative bases) are reported, so the observed decoupling of raw clustering from reader accuracy could be produced by SIREN sinusoidal bias dynamics or the particular tokenization rather than by task-induced fitting in general.

Authors: We agree that our empirical results are specific to the SIREN-based MWT regime, which is the focus of the study as it represents a standard approach for meta-learning task-induced INRs. The token-flow diagnostics and bias ablations are tailored to this architecture to isolate the routing mechanism. While we acknowledge that cross-architecture validation would strengthen the generality claim, the negative result against the clustering hypothesis holds within this widely-used setup. We will add a dedicated limitations paragraph discussing the potential influence of SIREN-specific dynamics and outlining future work on other INR bases. revision: yes
Referee: [§5.2] §5.2 (intervention experiments): the reported gains from route-directed variants are described as non-additive and dependent on lane-specific conventions. Without an explicit statement of how many random seeds, data splits, or hyperparameter sweeps were used to establish these interactions, it is difficult to assess whether the routing advantage is robust or sensitive to the exact MWT training protocol.

Authors: We appreciate this point on reporting standards. The intervention experiments were conducted over 3 independent random seeds with fixed data splits corresponding to the standard train/validation/test partitions of the datasets used. Hyperparameters were selected based on a limited grid search around the baseline configuration, and we observed consistent trends across seeds. To address the concern, we will expand the experimental details section to explicitly state the number of seeds, the variance in performance metrics, and the hyperparameter ranges explored, thereby demonstrating the robustness under the reported conventions. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical hypothesis test with no self-referential derivations

full rationale

The paper tests a geometric clustering hypothesis for task-induced INR weights via token-flow diagnostics, bias-column ablations, and reader accuracy correlations inside the SIREN+MWT regime. No equations, first-principles derivations, or predictions are claimed; all load-bearing steps are experimental observations (e.g., class alignment becomes predictive only after late reader layers). No self-citations, fitted parameters renamed as predictions, or ansatzes appear in the provided text. The result is self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on empirical observations within the SIREN-based Meta Weight Transformer regime; it assumes the validity of the weight-space analysis tools and the representativeness of the chosen training conventions but introduces no new mathematical axioms or postulated entities.

axioms (1)

domain assumption The SIREN-based Meta Weight Transformer regime with its shared initialization and inner-loop schedule produces task-induced weights whose geometry can be meaningfully diagnosed by token-flow and clustering metrics.
All experiments and conclusions are conditioned on this specific meta-learning architecture and training protocol.

pith-pipeline@v0.9.0 · 5591 in / 1352 out tokens · 73656 ms · 2026-05-12T01:34:38.679124+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

[1]

Ainsworth, Jonathan Hayase, and Siddhartha S

Samuel K. Ainsworth, Jonathan Hayase, and Siddhartha S. Srinivasa. Git re-basin: Merging models modulo permutation symmetries. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[2]

NeRN: Learning neural representations for neural networks

Maor Ashkenazi, Zohar Rimon, Ron Vainshtein, Shir Levi, Elad Richardson, Pinchas Mintz, and Eran Treister. NeRN: Learning neural representations for neural networks. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[3]

Spatial functa: Scaling functa to ImageNet classification and generation, 2023

Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz, and Hyunjik Kim. Spatial functa: Scaling functa to ImageNet classification and generation, 2023

work page 2023
[4]

Encoding semantic priors into the weights of implicit neural representation

Zhicheng Cai and Qiu Shen. Encoding semantic priors into the weights of implicit neural representation. In2024 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2024. doi: 10.1109/ICME57554.2024.10688197

work page doi:10.1109/icme57554.2024.10688197 2024
[5]

From data to functa: Your data point is a function and you can treat it like one.Proceedings of the 39th International Conference on Machine Learning, (162):5694–5725, 2022

Emilien Dupont, Hyunjik Kim, S M Ali Eslami, Danilo Rezende, and Dan Rosenbaum. From data to functa: Your data point is a function and you can treat it like one.Proceedings of the 39th International Conference on Machine Learning, (162):5694–5725, 2022

work page 2022
[6]

Classifying the classifier: Dissecting the weight space of neural networks

Eilertsen Gabriel, Jönsson Daniel, Ropinski Timo, Unger Jonas, and Ynnerman Anders. Classifying the classifier: Dissecting the weight space of neural networks. InFrontiers in Artificial Intelligence and Applications. IOS Press, 2020. doi: 10.3233/FAIA200209

work page doi:10.3233/faia200209 2020
[7]

The role of permutation invariance in linear mode connectivity of neural networks

Rahim Entezari, Hanie Sedghi, Olga Saukh, and Behnam Neyshabur. The role of permutation invariance in linear mode connectivity of neural networks. InThe Tenth International Conference on Learning Representations, 2022

work page 2022
[8]

2023 , url =

Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, and Angela Dai. HyperDiffusion: Generating implicit neural fields with weight-space diffusion. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14254–14264. IEEE, 2023. doi: 10.1109/ICCV51070.2023.01315

work page doi:10.1109/iccv51070.2023.01315 2023
[9]

Model-agnostic meta-learning for fast adaptation of deep networks.Proceedings of the 34th International Conference on Machine Learning, (70):1126–1135, 2017

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks.Proceedings of the 34th International Conference on Machine Learning, (70):1126–1135, 2017

work page 2017
[10]

What’s in the image? a deep-dive into the vision of vision language models

Alexander Gielisse and Jan Van Gemert. End-to-end implicit neural representations for classification. In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18728–18737. IEEE, 2025. doi: 10.1109/CVPR52734.2025.01745

work page doi:10.1109/cvpr52734.2025.01745 2025
[11]

A survey of weight space learning: Understanding, representation, and generation, 2026

Xiaolong Han, Zehong Wang, Bo Zhao, Binchi Zhang, Jundong Li, Damian Borth, Rose Yu, Haggai Maron, Yanfang Ye, Lu Yin, and Ferrante Neri. A survey of weight space learning: Understanding, representation, and generation, 2026

work page 2026
[12]

What’s in the image? a deep-dive into the vision of vision language models

Eliahu Horwitz, Bar Cavia, Jonathan Kahana, and Yedid Hoshen. Learning on model weights using tree experts. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20468–20478. IEEE, 2025. doi: 10.1109/CVPR52734.2025.01906

work page doi:10.1109/cvpr52734.2025.01906 2025
[13]

Parameter-efficient transfer learning for NLP.Proceedings of the 36th International Conference on Machine Learning, (97):2790–2799, 2019

Neil Houlsby, Andrei Giurgiu, Stanisław Jastrzebski, and Bruna Morrone. Parameter-efficient transfer learning for NLP.Proceedings of the 36th International Conference on Machine Learning, (97):2790–2799, 2019

work page 2019
[14]

Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InThe Tenth International Conference on Learning Representations, 2022

work page 2022
[15]

Editing models with task arithmetic

Gabriel Ilharco, Marco Túlio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[16]

Deep linear probe generators for weight space learning

Jonathan Kahana, Eliahu Horwitz, Imri Shuval, and Yedid Hoshen. Deep linear probe generators for weight space learning. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[17]

Scale equivariant graph metanetworks

Ioannis Kalogeropoulos, Giorgos Bouritsas, and Yannis Panagakis. Scale equivariant graph metanetworks. InAdvances in Neural Information Processing Systems 37, pages 106800–106840. Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024. doi: 10.52202/079017-3391. 10

work page doi:10.52202/079017-3391 2024
[18]

Lifting the veil on visual information flow in mllms: Unlocking pathways to faster inference

Alper Kayabasi, Anil Kumar Vadathya, Guha Balakrishnan, and Vishwanath Saragadam. Bias for action: Video implicit neural representations with bias modulation. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27999–28008. IEEE, 2025. doi: 10.1109/CVPR52734. 2025.02607

work page doi:10.1109/cvpr52734 2025
[19]

Supervised contrastive learning

Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. InAdvances in Neural Information Processing Systems, volume 33, pages 18661–18673. Curran Associates, Inc., 2020

work page 2020
[20]

Burghouts, Efstratios Gavves, Cees G

Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, and David W. Zhang. Graph neural networks for learning equivariant representations of neural networks. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[21]

Meta-SGD: Learning to learn quickly for few-shot learning, 2017

Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-SGD: Learning to learn quickly for few-shot learning, 2017

work page 2017
[22]

Law, Jonathan Lorraine, and James Lucas

Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, and James Lucas. Graph metanetworks for pro- cessing diverse neural architectures. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[23]

In: CVPR

Zhen Liu, Hao Zhu, Qi Zhang, Jingde Fu, Weibing Deng, Zhan Ma, Yanwen Guo, and Xun Cao. FINER: Flexible spectral-bias tuning in implicit NEural representation by variableperiodic activation functions. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2713–2722. IEEE, 2024. doi: 10.1109/CVPR52733.2024.00262

work page doi:10.1109/cvpr52733.2024.00262 2024
[24]

ARC: Anchored representa- tion clouds for high-resolution INR classification, 2025

Joost Luijmes, Alexander Gielisse, Roman Knyazhitskiy, and Jan van Gemert. ARC: Anchored representa- tion clouds for high-resolution INR classification, 2025

work page 2025
[25]

Paudel, Ender Konukoglu, and Luc Van Gool

Qi Ma, Danda P. Paudel, Ender Konukoglu, and Luc Van Gool. Implicit zoo: A large-scale dataset of neural implicit functions for 2d images and 3d scenes.Advances in Neural Information Processing Systems, 37:88684–88704, 2024. doi: 10.52202/079017-2814

work page doi:10.52202/079017-2814 2024
[26]

Equivariant architectures for learning in deep weight spaces

Aviv Navon, Aviv Shamsian, Idan Achituve, Ethan Fetaya, Gal Chechik, and Haggai Maron. Equivariant architectures for learning in deep weight spaces. InInternational Conference on Machine Learning, volume 202, pages 25790–25816. PMLR, 2023

work page 2023
[27]

Equivariant deep weight space alignment

Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik, Nadav Dym, and Haggai Maron. Equivariant deep weight space alignment. InForty-first International Conference on Machine Learning, 2024

work page 2024
[28]

In: CVPR

Samuele Papa, Riccardo Valperga, David Knigge, Miltiadis Kofinas, Phillip Lippe, Jan-Jakob Sonke, and Efstratios Gavves. How to train neural field representations: A comprehensive study and benchmark. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22616–22625. IEEE, 2024. doi: 10.1109/CVPR52733.2024.02134

work page doi:10.1109/cvpr52733.2024.02134 2024
[29]

Vardan Papyan, X. Y . Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training.Proceedings of the National Academy of Sciences, 117(40):24652–24663, 2020. doi: 10.1073/pnas.2015509117

work page doi:10.1073/pnas.2015509117 2020
[30]

Efros, and Jitendra Malik

William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, and Jitendra Malik. Learning to learn with generative models of neural network checkpoints, 2022

work page 2022
[31]

Ensuring semantics in weights of implicit neural representa- tions through the implicit function theorem, 2026

Tianming Qiu, Christos Sonis, and Hao Shen. Ensuring semantics in weights of implicit neural representa- tions through the implicit function theorem, 2026

work page 2026
[32]

Self-supervised representation learning on neural network weights for model characteristic prediction

Konstantin Schürholt, Dimche Kostadinov, and Damian Borth. Self-supervised representation learning on neural network weights for model characteristic prediction. InAdvances in Neural Information Processing Systems, volume 34, pages 16481–16493, 2021

work page 2021
[33]

Hyper-representations as generative models: Sampling unseen neural network weights.Advances in Neural Information Processing Systems, 35:27906–27920, 2022

Konstantin Schürholt, Boris Knyazev, Xavier Giró-i Nieto, and Damian Borth. Hyper-representations as generative models: Sampling unseen neural network weights.Advances in Neural Information Processing Systems, 35:27906–27920, 2022

work page 2022
[34]

Model zoos: A dataset of diverse populations of neural network models.Advances in Neural Information Processing Systems, 35:38134–38148, 2022

Konstantin Schürholt, Diyar Taskiran, Boris Knyazev, Xavier Giró-i Nieto, and Damian Borth. Model zoos: A dataset of diverse populations of neural network models.Advances in Neural Information Processing Systems, 35:38134–38148, 2022

work page 2022
[35]

Zhang, Yan Zhang, Ethan Fetaya, Gal Chechik, and Haggai Maron

Aviv Shamsian, Aviv Navon, David W. Zhang, Yan Zhang, Ethan Fetaya, Gal Chechik, and Haggai Maron. Improved generalization of weight space networks via augmentations. InProceedings of the 41st International Conference on Machine Learning, volume 235, pages 44378–44393. PMLR, 2024. 11

work page 2024
[36]

Implicit neural representations with periodic activation functions

Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. InAdvances in Neural Information Processing Systems, volume 33, pages 7462–7473. Curran Associates, Inc., 2020

work page 2020
[37]

Tran, Thieu N

Hoang V . Tran, Thieu N. V o, Tho Huu, An Nguyen The, and Tan Minh Nguyen. Monomial matrix group equivariant neural functional networks. InAdvances in Neural Information Processing Systems 37, pages 48628–48665. Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024. doi: 10.52202/079017-1541

work page doi:10.52202/079017-1541 2024
[38]

Predicting neural network accuracy from weights, 2020

Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, and Ilya Tolstikhin. Predicting neural network accuracy from weights, 2020

work page 2020
[39]

A discriminative feature learning approach for deep face recognition

Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors,Computer Vision – ECCV 2016, volume 9911, pages 499–515. Springer International Publishing, 2016. doi: 10.1007/ 978-3-319-46478-7_31

work page 2016
[40]

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection , isbn =

Gizem Yuce, Guillermo Ortiz-Jimenez, Beril Besbinar, and Pascal Frossard. A structured dictionary perspective on implicit neural representations. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19206–19216. IEEE, 2022. doi: 10.1109/CVPR52688.2022.01863

work page doi:10.1109/cvpr52688.2022.01863 2022
[41]

Understanding bias terms in neural representations

Weixiang Zhang, Boxi Li, Shuzhao Xie, Chengwei Ren, Yuan Xue, and Zhi Wang. Understanding bias terms in neural representations. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

work page 2026
[42]

Zico Kolter, and Chelsea Finn

Allan Zhou, Kaien Yang, Kaylee Burns, Adriano Cardace, Yiding Jiang, Samuel Sokota, J. Zico Kolter, and Chelsea Finn. Permutation equivariant neural functionals.Advances in Neural Information Processing Systems, 36:24966–24992, 2023

work page 2023
[43]

Zico Kolter, and Chelsea Finn

Allan Zhou, Kaien Yang, Yiding Jiang, Kaylee Burns, Winnie Xu, Samuel Sokota, J. Zico Kolter, and Chelsea Finn. Neural functional transformers.Advances in Neural Information Processing Systems, 36: 77485–77502, 2023

work page 2023
[44]

no auxiliary loss

Allan Zhou, Chelsea Finn, and James Harrison. Universal neural functionals. InAdvances in Neural Information Processing Systems 37, pages 104754–104775. Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024. doi: 10.52202/079017-3326. 12 A Model definitions and training conventions This appendix records the model definitions and training ...

work page doi:10.52202/079017-3326 2024