VIPER exposes Functional Fusion in dynamic prompt architectures, enabling a backdoor that resists pruning by tightly integrating attack and utility parameters in the same high-magnitude core.
hub Canonical reference
The Power of Scale for Parameter-Efficient Prompt Tuning
Canonical reference. 75% of citing Pith papers cite this work as background.
abstract
In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
hub tools
citation-role summary
citation-polarity summary
claims ledger
- abstract In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of paramet
co-cited works
representative citing papers
Gradient and greedy search over token suffixes produces universal, transferable adversarial prompts that elicit objectionable outputs from aligned models including black-box commercial systems.
Instruction tuning a 137B language model on over 60 NLP tasks described by instructions substantially boosts zero-shot performance on unseen tasks, outperforming larger GPT-3 models.
Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.
ART optimizes visual pixel inputs to frozen MLLMs to achieve LoRA-competitive accuracy on math and structured tool-use benchmarks without modifying computational graphs.
LR-LoRA learns per-layer adapter ranks during training and reports outperforming fixed-rank LoRA and other PEFT baselines on language understanding and commonsense reasoning tasks.
Domain-incremental video learning that permits forgetting through per-domain LoRA adapters and recovers the matching adapter at inference via test-time training on a self-supervised MAE reconstruction head.
SymTrack is the first systematic detection-free framework for scene text tracking that constructs benchmarks from video text spotting datasets and reports up to 11.97% AUC gains over prior trackers.
CellxPert uses inference-time MCMC steering on a multi-omics single-cell foundation model to predict genome-wide transcriptomic responses to gene perturbations and outperforms baselines on cell-type annotation, perturbation prediction, and multi-omic integration benchmarks.
ConfSMoE adds expert-opinion imputation and detaches softmax routing scores to ground-truth task confidence to relieve expert collapse in SMoE without extra load-balance losses, evaluated on four real-world datasets.
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
PagedAttention achieves near-zero waste in LLM key-value cache memory and enables 2-4x higher serving throughput than prior systems.
Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-designed baselines.
QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.
In-context learning emerges as implicit Bayesian inference of latent concepts when pretraining data has long-range coherence, proven for mixture-of-HMM distributions and replicated on the synthetic GINC dataset.
Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.
Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
EPnG reallocates LoRA capacity in MoE models by pruning experts with low router gate probabilities and expanding high-importance ones via rank growth, outperforming standard LoRA and nearing full fine-tuning performance with 0.55-0.72% parameters updated.
LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
User memory in LLMs factors into three orthogonal axes where parametric adapters and retrieval show opposite strengths, with causal evidence from attention interventions and an alignment tax on RLHF models.
Flatness Preference Optimization (FlatPO) improves multimodal PEFT generalization by flattening a small set of sharp dimensions that dominate performance.
Introduces MM-Privacy dataset and evaluations showing MLLMs leak sensitive data from images in various tasks, highlighting task inconsistency effects.
citing papers explorer
-
Towards an AI co-scientist
A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
-
HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigating catastrophic forgetting.
-
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
A comprehensive reference book organizing existing techniques for agentic AI systems across LLM substrate, reasoning, agent design patterns, inter-agent coordination, and production deployment.