hub

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng · 2023 · cs.CL · arXiv 2303.10512

27 Pith papers cite this work. Polarity classification is still indexing.

27 Pith papers citing it

open full Pith review browse 27 citing papers arXiv PDF

abstract

Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA .

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop.

Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

GLoRA replaces raw factor averaging with gauge-aware aggregation in a consensus subspace estimated from client projectors, enabling consistent low-rank federated LoRA under heterogeneity.

Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.

BoostLoRA: Growing Effective Rank by Boosting Adapters

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

BoostLoRA grows effective adapter rank linearly via iterative boosting on hard examples with orthogonal low-rank updates, outperforming both single-shot ultra-low-rank adapters and full fine-tuning on math and code tasks with zero added inference overhead.

Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning

cs.LG · 2026-04-29 · unverdicted · novelty 7.0

DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.

BioVLM: Routing Prompts, Not Parameters, for Cross-Modality Generalization in Biomedical VLMs

cs.CV · 2026-04-19 · unverdicted · novelty 7.0

BioVLM achieves state-of-the-art cross-modality generalization on biomedical VLMs by learning a prompt bank and routing inputs to the most discriminative prompts via low-entropy selection plus LLM distillation.

SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates

cs.LG · 2026-04-12 · unverdicted · novelty 7.0

LoRA weight updates are spectrally sparse, with 33% of DCT coefficients capturing 90% of energy on average, enabling 10x storage reduction and occasional gains by masking high frequencies.

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

MinT enables efficient management of million-scale LoRA-adapted LLM policies over shared 1T-parameter base models by moving only small adapters through training and serving pipelines.

Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.

Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

JACTUS unifies low-rank compression and task adaptation via a task-aware union of subspaces and global rank allocation by marginal gain, outperforming 100% PEFT methods like DoRA on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg) at 80% retained parameters.

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

cs.LG · 2026-04-26 · conditional · novelty 6.0 · 2 refs

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.

MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter and memory usage.

Sensitivity-Positional Co-Localization in GQA Transformers

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU, GPQA, HumanEval+, MATH, MGSM and ARC.

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

cs.CR · 2026-04-09 · unverdicted · novelty 6.0

ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.

Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems

cs.NE · 2026-04-07 · unverdicted · novelty 6.0

CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.

Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

cs.LG · 2026-04-04 · conditional · novelty 6.0

Gradient-guided layer selection for LoRA yields 15-28% training speedup with matched downstream results on MMLU, GSM8K, and HumanEval across 14 models from 0.5B to 72B parameters.

Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

PoLAR-VBLL combines orthogonalized low-rank adapters with variational Bayesian last-layer inference to enable scalable, well-calibrated uncertainty quantification in fine-tuned LLMs.

CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

CLIPer uses classifier guidance during inference to personalize LLM generations across single and multi-dimensional user preferences without extensive fine-tuning.

Text-Guided Multi-Scale Frequency Representation Adaptation

cs.CV · 2026-05-05 · unverdicted · novelty 5.0

FreqAdapter adapts multimodal models by text-guided multi-scale fine-tuning in the frequency domain, claiming better performance and efficiency than signal-space PEFT methods.

Dynamic Scaled Gradient Descent for Stable Fine-Tuning for Classifications

cs.LG · 2026-04-30 · unverdicted · novelty 5.0

Dynamic scaled gradient descent prevents fine-tuning collapse by dynamically down-weighting gradients of correct examples, yielding lower performance variance and higher accuracy than standard methods on classification benchmarks.

ChipLingo: A Systematic Training Framework for Large Language Models in EDA

cs.LG · 2026-04-30 · unverdicted · novelty 5.0

ChipLingo trains LLMs on EDA data via corpus construction, domain-adaptive pretraining, and RAG scenario alignment, reaching 59.7% accuracy with an 8B model and 70.02% with a 32B model on a new internal EDA benchmark.

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

eess.AS · 2026-04-29 · unverdicted · novelty 5.0

Dual-LoRA with a language-anchored adversary achieves 0.91% EER on the TidyVoice benchmark for cross-lingual speaker verification by targeting true linguistic cues while preserving speaker discriminability.

HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation

cs.LG · 2026-04-20 · unverdicted · novelty 5.0

HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.

citing papers explorer

Showing 27 of 27 citing papers.

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning cs.CL · 2026-05-08 · unverdicted · none · ref 25 · internal anchor
MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop.
Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA cs.LG · 2026-05-07 · unverdicted · none · ref 24 · internal anchor
GLoRA replaces raw factor averaging with gauge-aware aggregation in a consensus subspace estimated from client projectors, enabling consistent low-rank federated LoRA under heterogeneity.
Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration cs.CV · 2026-05-07 · unverdicted · none · ref 60 · internal anchor
CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.
BoostLoRA: Growing Effective Rank by Boosting Adapters cs.LG · 2026-04-30 · unverdicted · none · ref 38 · internal anchor
BoostLoRA grows effective adapter rank linearly via iterative boosting on hard examples with orthogonal low-rank updates, outperforming both single-shot ultra-low-rank adapters and full fine-tuning on math and code tasks with zero added inference overhead.
Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning cs.LG · 2026-04-29 · unverdicted · none · ref 11 · internal anchor
DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.
BioVLM: Routing Prompts, Not Parameters, for Cross-Modality Generalization in Biomedical VLMs cs.CV · 2026-04-19 · unverdicted · none · ref 10 · internal anchor
BioVLM achieves state-of-the-art cross-modality generalization on biomedical VLMs by learning a prompt bank and routing inputs to the most discriminative prompts via low-entropy selection plus LLM distillation.
SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates cs.LG · 2026-04-12 · unverdicted · none · ref 7 · internal anchor
LoRA weight updates are spectrally sparse, with 33% of DCT coefficients capturing 90% of energy on average, enabling 10x storage reduction and occasional gains by masking high frequencies.
MinT: Managed Infrastructure for Training and Serving Millions of LLMs cs.LG · 2026-05-13 · unverdicted · none · ref 34 · internal anchor
MinT enables efficient management of million-scale LoRA-adapted LLM policies over shared 1T-parameter base models by moving only small adapters through training and serving pipelines.
Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation cs.LG · 2026-05-08 · unverdicted · none · ref 14 · internal anchor
Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.
Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces cs.AI · 2026-05-04 · unverdicted · none · ref 54 · internal anchor
JACTUS unifies low-rank compression and task adaptation via a task-aware union of subspaces and global rank allocation by marginal gain, outperforming 100% PEFT methods like DoRA on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg) at 80% retained parameters.
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation cs.LG · 2026-04-26 · conditional · none · ref 38 · 2 links · internal anchor
Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.
TLoRA: Task-aware Low Rank Adaptation of Large Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 28 · internal anchor
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning cs.LG · 2026-04-10 · unverdicted · none · ref 140 · internal anchor
MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter and memory usage.
Sensitivity-Positional Co-Localization in GQA Transformers cs.CL · 2026-04-09 · unverdicted · none · ref 5 · internal anchor
In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU, GPQA, HumanEval+, MATH, MGSM and ARC.
The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training cs.CR · 2026-04-09 · unverdicted · none · ref 65 · internal anchor
ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.
Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems cs.NE · 2026-04-07 · unverdicted · none · ref 16 · internal anchor
CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.
Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures cs.LG · 2026-04-04 · conditional · none · ref 9 · internal anchor
Gradient-guided layer selection for LoRA yields 15-28% training speedup with matched downstream results on MMLU, GSM8K, and HumanEval across 14 models from 0.5B to 72B parameters.
Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters cs.LG · 2026-04-03 · unverdicted · none · ref 17 · internal anchor
PoLAR-VBLL combines orthogonalized low-rank adapters with variational Bayesian last-layer inference to enable scalable, well-calibrated uncertainty quantification in fine-tuned LLMs.
CLIPer: Tailoring Diverse User Preference via Classifier-Guided Inference-Time Personalization cs.CL · 2026-05-08 · unverdicted · none · ref 1 · internal anchor
CLIPer uses classifier guidance during inference to personalize LLM generations across single and multi-dimensional user preferences without extensive fine-tuning.
Text-Guided Multi-Scale Frequency Representation Adaptation cs.CV · 2026-05-05 · unverdicted · none · ref 37 · internal anchor
FreqAdapter adapts multimodal models by text-guided multi-scale fine-tuning in the frequency domain, claiming better performance and efficiency than signal-space PEFT methods.
Dynamic Scaled Gradient Descent for Stable Fine-Tuning for Classifications cs.LG · 2026-04-30 · unverdicted · none · ref 4 · internal anchor
Dynamic scaled gradient descent prevents fine-tuning collapse by dynamically down-weighting gradients of correct examples, yielding lower performance variance and higher accuracy than standard methods on classification benchmarks.
ChipLingo: A Systematic Training Framework for Large Language Models in EDA cs.LG · 2026-04-30 · unverdicted · none · ref 16 · internal anchor
ChipLingo trains LLMs on EDA data via corpus construction, domain-adaptive pretraining, and RAG scenario alignment, reaching 59.7% accuracy with an 8B model and 70.02% with a 32B model on a new internal EDA benchmark.
Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification eess.AS · 2026-04-29 · unverdicted · none · ref 28 · internal anchor
Dual-LoRA with a language-anchored adversary achieves 0.91% EER on the TidyVoice benchmark for cross-lingual speaker verification by targeting true linguistic cues while preserving speaker discriminability.
HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation cs.LG · 2026-04-20 · unverdicted · none · ref 34 · internal anchor
HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.
A Benchmark Study of Segmentation Models and Adaptation Strategies for Landslide Detection from Satellite Imagery cs.CV · 2026-04-17 · unverdicted · none · ref 12 · internal anchor
Transformer-based models deliver strong landslide segmentation on satellite images, and parameter-efficient fine-tuning matches full fine-tuning accuracy while cutting trainable parameters by up to 95%.
Efficient Handwriting-Based Alzheimer,s Disease Diagnosis Using a Low-Rank Mixture of Experts Deep Learning Framework cs.LG · 2026-04-14 · unverdicted · none · ref 26 · internal anchor
A low-rank mixture of experts model trained on handwriting data delivers strong Alzheimer's diagnosis performance with substantially reduced parameter activation during inference.
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey cs.LG · 2024-03-21 · accept · none · ref 83 · internal anchor
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer