LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
hub
Adapterfusion: Non-destructive task composition for transfer learning
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
polarities
background 2representative citing papers
Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.
User memory in LLMs factors into three orthogonal axes where parametric adapters and retrieval show opposite strengths, with causal evidence from attention interventions and an alignment tax on RLHF models.
PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.
SECOS enables direct semantic label prediction in open-world semi-supervised learning by aligning representations with external knowledge for novel classes, outperforming prior methods by up to 5.4% even without post-hoc matching.
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
AdaLoRA uses SVD-based pruning to allocate the parameter budget for low-rank fine-tuning updates according to per-matrix importance scores, yielding better performance than uniform allocation especially under tight budgets.
A temporal extension of TabDDPM generates coherent synthetic time-series sequences on the WISDM dataset that match real distributions and support downstream classification with macro F1 of 0.64.
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
DualOpt decouples optimization by using real-time layer-wise weight decay for scratch training and weight rollback for fine-tuning to improve convergence, generalization, and reduce knowledge forgetting.
A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.
citing papers explorer
-
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
-
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.
-
Substrate Asymmetry in User-Side Memory: A Diagnostic Framework
User memory in LLMs factors into three orthogonal axes where parametric adapters and retrieval show opposite strengths, with causal evidence from attention interventions and an alignment tax on RLHF models.
-
PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts
PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.
-
SECOS: Semantic Capture for Rigorous Classification in Open-World Semi-Supervised Learning
SECOS enables direct semantic label prediction in open-world semi-supervised learning by aligning representations with external knowledge for novel classes, outperforming prior methods by up to 5.4% even without post-hoc matching.
-
TLoRA: Task-aware Low Rank Adaptation of Large Language Models
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
-
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
AdaLoRA uses SVD-based pruning to allocate the parameter budget for low-rank fine-tuning updates according to per-matrix importance scores, yielding better performance than uniform allocation especially under tight budgets.
-
Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation
A temporal extension of TabDDPM generates coherent synthetic time-series sequences on the WISDM dataset that match real distributions and support downstream classification with macro F1 of 0.64.
-
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
-
Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning
DualOpt decouples optimization by using real-time layer-wise weight decay for scratch training and weight rollback for fine-tuning to improve convergence, generalization, and reduce knowledge forgetting.
-
Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)
A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.