pith. machine review for the scientific record.

arxiv: 2605.08589 · v1 · submitted 2026-05-09 · 💻 cs.CV

Recognition: 2 Lean theorem links

S2FT: Parameter-Efficient Fine-Tuning in Sparse Spectrum Domain


Pith reviewed 2026-05-12 00:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords parameter-efficient fine-tuning · sparse spectrum · Fourier transform · weight rearrangement · PEFT · nearest neighbor search · model adaptation

The pith

S2FT finds an invertible rearrangement of a coarse weight-change estimate that turns uniform-spectrum updates into sparse-spectrum ones, allowing fine-tuning with 0.08 percent of the parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that weight changes during fine-tuning do not have sparse spectra, as assumed by earlier Fourier methods, but instead follow a power-uniform distribution, which makes tuning only a few spectral coefficients ineffective for accurate modeling. S2FT addresses this by pre-estimating a coarse weight change and using nearest-neighbor search to find row and column rearrangements that impose local smoothness, creating a sparse spectrum in the transformed domain. Fine-tuning then occurs on the few spectral coefficients of this sparse representation, and the inverse transformation recovers the full update. A sympathetic reader should care because this promises to adapt massive pretrained models to new tasks while training orders of magnitude fewer parameters than standard approaches.

Core claim

S2FT proposes an invertible transformation obtained by rearranging rows and columns of a pre-estimated coarse weight change via nearest-neighbor search, which maps a latent sparse-spectrum matrix to the observed weight change with uniform spectrum. By performing parameter-efficient updates only on the sparse spectral coefficients in this domain and applying the inverse, the method achieves superior adaptation performance using just 0.08% of the training parameters.
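As a rough sketch of the claimed mechanism (the function names, the greedy top-k coefficient selection, and the use of a plain 2D FFT are our assumptions, not the paper's exact construction):

```python
import numpy as np

def to_sparse_domain(delta_est, row_perm, col_perm, k):
    # Hypothetical forward map: rearrange the coarse estimate with the
    # found row/column permutations, take a 2D FFT, and keep only the
    # k largest-magnitude coefficients as the trainable parameters.
    spec = np.fft.fft2(delta_est[row_perm][:, col_perm])
    idx = np.argsort(np.abs(spec).ravel())[::-1][:k]
    return spec.ravel()[idx], idx

def from_sparse_domain(coeffs, idx, shape, row_perm, col_perm):
    # Inverse map: scatter the (fine-tuned) coefficients back into an
    # otherwise-zero spectrum, inverse FFT, and undo the rearrangement
    # to recover the full weight update.
    spec = np.zeros(shape[0] * shape[1], dtype=complex)
    spec[idx] = coeffs
    rearranged = np.real(np.fft.ifft2(spec.reshape(shape)))
    full = np.empty(shape)
    full[np.ix_(row_perm, col_perm)] = rearranged
    return full
```

With k equal to the full matrix size the round trip is exact; the paper's bet is that, after rearrangement, a k of roughly 0.08% of the entries already captures most of the update.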

What carries the argument

The nearest-neighbor search for row-and-column rearrangement on the pre-estimated weight change, which enforces local spatial smoothness corresponding to sparse spectra while preserving neuron structure.
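The rearrangement search can be caricatured as a greedy nearest-neighbor chain over rows; the function below is our illustrative reading of that step, not the paper's exact procedure:

```python
import numpy as np

def nn_row_order(m):
    # Greedy nearest-neighbor ordering: start from row 0 and repeatedly
    # append the unvisited row closest (in L2 distance) to the last one,
    # so that adjacent rows end up similar, i.e. locally smooth.
    unvisited = set(range(1, m.shape[0]))
    order = [0]
    while unvisited:
        last = m[order[-1]]
        nearest = min(unvisited, key=lambda r: np.linalg.norm(m[r] - last))
        order.append(nearest)
        unvisited.remove(nearest)
    return np.array(order)

# The same ordering applied to the transpose gives the column
# rearrangement; np.argsort(order) is the exact inverse permutation,
# which is what makes the overall transformation invertible.
```

Because the result is a permutation, no information is lost: applying `np.argsort(order)` recovers the original row order exactly.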

If this is right

  • Only 0.08% of parameters need training to achieve better results than prior spectral PEFT methods.
  • The weight update can be accurately modeled by few coefficients once rearranged into a sparse-spectrum form.
  • The rearrangement is found by a simple nearest-neighbor search, without exhaustive enumeration or additional optimization.
  • Performance gains come from operating in the transformed sparse domain rather than the original uniform one.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combining this rearrangement idea with low-rank methods like LoRA might further reduce parameters.
  • The observation of uniform spectra could apply to other adaptation techniques beyond Fourier transforms.
  • If the coarse estimate comes from a quick pass or smaller model, it could make the method even more efficient in practice.

Load-bearing premise

That rearranging a coarse pre-estimate via nearest neighbors will consistently yield a sparse enough spectrum for the actual weight change to be captured by few coefficients.

What would settle it

The claim would be undermined if the spectrum of the rearranged weight change remains power-uniform, or if fine-tuning performance at the 0.08% parameter budget drops below standard PEFT baselines.
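One concrete way to run that test (a sketch; the default threshold and the plain-FFT choice are ours) is to measure how much spectral energy the paper's own budget can capture:

```python
import numpy as np

def energy_concentration(delta_w, budget=0.0008):
    # Fraction of spectral energy captured by the top `budget` fraction
    # of coefficients (0.08% by default, matching the paper's stated
    # parameter budget).  Values near 1 support the sparse-spectrum
    # premise; values near `budget` itself indicate a power-uniform
    # spectrum, which would undercut the method.
    mag2 = np.abs(np.fft.fft2(delta_w)) ** 2
    k = max(1, int(round(budget * mag2.size)))
    flat = np.sort(mag2.ravel())[::-1]
    return flat[:k].sum() / flat.sum()
```

Applied layer by layer to the rearranged weight change, this single number is the kind of energy-concentration measurement the referee report asks for.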

Figures

Figures reproduced from arXiv: 2605.08589 by Baoquan Zhang, Kenghong Lin, Lisai Zhang, Tianran Chen, Yao He, Yunming Ye, Yuxi Sun, Zhehao Yu.

Figure 1. Existing FourierFT (a) directly models weight change. view at source ↗
Figure 2. Distribution analysis of weight change ΔW in spatial and spectral domains. view at source ↗
Figure 4. Distribution of spectrum and sampling points (LF/MF/HF). view at source ↗
Figure 5. Distribution analysis of the spatial-domain matrix. view at source ↗
Figure 6. Performance comparison between FourierFT and our S2FT. view at source ↗
read the original abstract

Parameter Efficient Fine-Tuning (PEFT) is a key technique for adapting a large pretrained model to downstream tasks by fine-tuning only a small number of parameters. Recent methods based on Fourier transforms have further reduced the fine-tuned parameters scale by only fine-tuning a few spectral coefficients. Its basic assumption is that the weight change ΔW is a spatial-domain matrix with a sparse spectrum. However, in this paper, we observe that the spectrum of weight change is not sparse, but instead distributed like power-uniform. This fact implies that fine-tuning only a few spectral coefficients is insufficient to accurately model the weight change with uniform spectrum. To address this issue, we propose to seek an invertible transformation that can transform a latent spatial-domain matrix with sparse spectrum to the weight change, and then perform PEFT on such sparse spectrum domain with few spectral coefficients, called S2FT. To seek such transformation, we first pre-estimate a coarse weight change as a prior. Then, inspired by that sparse spectrum often correspond to locally smooth spatial structures, we regard this transformation as a row and column rearrangement operation on the pre-estimated weight change that smooth spatial structures while keep the structure information of neurons. Finally, we propose to solve the rearrangement search problem in a simple nearest neighbor search manner, thereby obtaining the invertible transformation. Extensive results show our S2FT achieves superior performance by only using 0.08% training parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper observes that weight-change matrices δW during fine-tuning exhibit power-uniform spectra rather than sparse ones, rendering direct spectral PEFT ineffective. It proposes S2FT, which first obtains a coarse pre-estimate of δW, then applies a nearest-neighbor search to find an invertible row-and-column rearrangement that induces local smoothness (hence sparsity) in the spectrum, and finally fine-tunes only a small number (0.08 %) of spectral coefficients in the transformed domain. Extensive experiments are said to demonstrate superior performance over existing PEFT methods.

Significance. If the rearrangement reliably produces a sufficiently sparse spectrum and the pre-estimate is accurate enough to locate the correct permutation, S2FT would constitute a meaningful advance in extreme parameter-efficient fine-tuning by moving the problem into a domain where a tiny fraction of coefficients suffices. The empirical observation of power-uniform spectra and the heuristic rearrangement search are potentially useful contributions, but the absence of direct measurements of achieved sparsity and pre-estimate fidelity limits the strength of the current evidence.

major comments (3)
  1. [Abstract / Method] Abstract and Method section: the central claim that the nearest-neighbor row/column rearrangement of the coarse pre-estimate produces a spectrum sparse enough for 0.08 % coefficients to suffice is not supported by any quantitative measurement (e.g., sorted coefficient-magnitude curves, energy concentration ratios, or sparsity metrics before versus after rearrangement). Without such evidence the parameter-efficiency argument remains unsubstantiated.
  2. [Method] Method section (rearrangement search): the paper provides no analysis or ablation showing that the coarse pre-estimate is sufficiently accurate to recover a permutation that sparsifies the true (unknown) δW; if the pre-estimate error is large, the nearest-neighbor search may select a transformation under which the final spectrum remains close to uniform, collapsing the efficiency gain.
  3. [Experiments] Experiments section: reported performance gains are presented without error bars, without an ablation isolating the contribution of the rearrangement versus the pre-estimate alone, and without direct verification that the post-rearrangement spectra are indeed sparse; these omissions make it impossible to assess whether the claimed superiority is robust or merely an artifact of the heuristic design choices.
minor comments (2)
  1. [Method] Notation for the rearrangement operator and the spectral coefficients should be introduced with explicit equations rather than prose descriptions.
  2. [Abstract] The abstract states an empirical observation about power-uniform spectra; a brief quantitative characterization (e.g., average decay exponent or Gini coefficient of the spectrum) would strengthen the motivation.
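The Gini coefficient the referee suggests is cheap to compute; a sketch (our formula choice, not the paper's):

```python
import numpy as np

def spectral_gini(m):
    # Gini coefficient of the |FFT| coefficient magnitudes of m:
    # ~0 for a perfectly uniform spectrum, approaching 1 for an
    # extremely sparse one, so larger values favor the sparse-spectrum
    # premise.
    x = np.sort(np.abs(np.fft.fft2(m)).ravel())  # ascending
    n = x.size
    i = np.arange(1, n + 1)
    return (2.0 * (i * x).sum() / (n * x.sum())) - (n + 1.0) / n
```

As a sanity check, an impulse matrix has a perfectly flat spectrum (Gini ≈ 0), while a constant matrix puts everything into the DC coefficient (Gini ≈ 1).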

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We agree that the manuscript would benefit from additional quantitative evidence on sparsity, analysis of the pre-estimate, and experimental ablations with error bars. We will incorporate these in the revised version.

read point-by-point responses
  1. Referee: [Abstract / Method] Abstract and Method section: the central claim that the nearest-neighbor row/column rearrangement of the coarse pre-estimate produces a spectrum sparse enough for 0.08 % coefficients to suffice is not supported by any quantitative measurement (e.g., sorted coefficient-magnitude curves, energy concentration ratios, or sparsity metrics before versus after rearrangement). Without such evidence the parameter-efficiency argument remains unsubstantiated.

    Authors: We agree that direct quantitative measurements are needed to substantiate the sparsity claim. In the revision we will add sorted coefficient-magnitude curves, energy concentration ratios, and sparsity metrics (e.g., percentage of energy in the top-k coefficients) for representative weight-change matrices, shown both before and after the nearest-neighbor rearrangement. These plots will demonstrate the improvement in spectral sparsity that enables the 0.08 % coefficient regime. revision: yes

  2. Referee: [Method] Method section (rearrangement search): the paper provides no analysis or ablation showing that the coarse pre-estimate is sufficiently accurate to recover a permutation that sparsifies the true (unknown) δW; if the pre-estimate error is large, the nearest-neighbor search may select a transformation under which the final spectrum remains close to uniform, collapsing the efficiency gain.

    Authors: We acknowledge the absence of explicit analysis on pre-estimate fidelity. We will add an ablation that systematically varies pre-estimate quality (by changing the number of calibration samples or injecting controlled noise) and reports the resulting post-rearrangement spectral sparsity together with downstream fine-tuning accuracy. This will quantify how robust the nearest-neighbor search remains under realistic pre-estimate error. revision: yes

  3. Referee: [Experiments] Experiments section: reported performance gains are presented without error bars, without an ablation isolating the contribution of the rearrangement versus the pre-estimate alone, and without direct verification that the post-rearrangement spectra are indeed sparse; these omissions make it impossible to assess whether the claimed superiority is robust or merely an artifact of the heuristic design choices.

    Authors: We will revise the experimental section to include (i) error bars from at least three independent runs with different random seeds, (ii) an ablation that applies the spectral PEFT directly to the pre-estimate without rearrangement, and (iii) additional figures that verify post-rearrangement spectral sparsity on the same layers used for the main results. These changes will allow readers to isolate the rearrangement contribution and assess robustness. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper observes that weight-change spectra are power-uniform rather than sparse, then constructs an invertible row/column rearrangement from a coarse pre-estimate via nearest-neighbor search to induce local smoothness before spectral fine-tuning. No equation or performance claim reduces by construction to a quantity defined by the method's own fitted parameters; the rearrangement is computed once from the pre-estimate and applied independently, while the 0.08% parameter claim is supported by external empirical results rather than a self-referential loop. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The method rests on one domain assumption about smoothness and one invented procedural entity; no numerical free parameters are declared in the abstract.

axioms (1)
  • domain assumption Sparse spectrum often corresponds to locally smooth spatial structures
    Invoked to justify treating the transformation as a row-and-column rearrangement that smooths while preserving neuron structure.
invented entities (1)
  • invertible row-column rearrangement transformation (no independent evidence)
    purpose: Maps pre-estimated weight change into a latent matrix whose spectrum is sparse enough for few-coefficient fine-tuning
    The transformation is constructed by nearest-neighbor search rather than derived from first principles.

pith-pipeline@v0.9.0 · 5574 in / 1327 out tokens · 33101 ms · 2026-05-12T00:56:40.597496+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
