RoBERTa: A Robustly Optimized BERT Pretraining Approach
Representative citing papers
- LoRA: Low-Rank Adaptation of Large Language Models
  Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency. (A minimal code sketch of the low-rank update appears after this list.)
- Computer Science Conferences Should Require Nonrepudiable Experimental Results
  CS conferences should require nonrepudiable experimental results via signed attestations that prevent authors from later altering or denying reported numbers. (A hedged signing sketch appears after this list.)
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory. (A sketch of the linear attention bias appears after this list.)
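
The LoRA entry above describes training only a low-rank update BA on top of a frozen weight matrix. Below is a minimal PyTorch sketch of that idea; the `LoRALinear` class name, the rank of 8, and the `alpha` scaling are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal LoRA-style layer: keep the pretrained weight W frozen and train only the
# low-rank factors A and B, so the adapted layer computes W x + (alpha / rank) * B A x.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained weight (would be loaded from the base model in practice).
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # Trainable low-rank factors: A projects down to `rank`, B projects back up.
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.normal_(self.lora_A, std=0.02)  # B stays zero, so BA = 0 at initialization
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank correction; only A and B receive gradients.
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 2 * 8 * 768 instead of 768 * 768
```

Because BA can be merged into W once training is done, the adapted layer computes a single matrix multiply at inference time, which is where the no-added-latency claim comes from.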
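The nonrepudiation entry above proposes signed attestations over reported numbers. The sketch below assumes an Ed25519 signature over a hash of a results record would serve as such an attestation; the result fields, key handling, and use of the third-party `cryptography` package are illustrative, not the paper's actual protocol.

```python
# Sign a digest of the reported results so any later change to the numbers breaks
# verification, and the author cannot plausibly deny having attested to them.
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Example experimental results as reported in a submission (illustrative values).
results = {"model": "roberta-base", "task": "MNLI", "accuracy": 87.6, "seed": 42}
payload = json.dumps(results, sort_keys=True).encode()
digest = hashlib.sha256(payload).hexdigest()

# In practice the key would be bound to the author's identity (e.g. through the
# conference or an institutional registry); here we simply generate one.
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(payload)
public_key = private_key.public_key()

# Anyone holding the public key can later check that the published numbers match
# what was attested at submission time; altering the payload raises InvalidSignature.
public_key.verify(signature, payload)
print(f"sha256={digest}")
print(f"signature={signature.hex()[:32]}...")
```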
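The ALiBi entry above attributes length extrapolation to a static, distance-proportional penalty added to attention scores instead of positional embeddings. The sketch below assumes a power-of-two head count and the geometric slope schedule; the `alibi_bias` and `attention_with_alibi` helpers are illustrative names, not the authors' released code.

```python
# Linear attention biases: each head subtracts slope * (query_pos - key_pos) from its
# attention logits, penalizing distant keys without any learned position embedding.
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence, e.g. 1/2, 1/4, ..., 1/256 for 8 heads.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # distance[i, j] = i - j, i.e. how far key j lies in the past of query i.
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    # Shape (num_heads, seq_len, seq_len); added to attention logits before softmax.
    return -slopes[:, None, None] * distance

def attention_with_alibi(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, heads, seq, head_dim), causal self-attention.
    num_heads, seq_len = q.shape[1], q.shape[2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores + alibi_bias(num_heads, seq_len).to(q.dtype)
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 16, 64)
print(attention_with_alibi(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```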