{"work":{"id":"0426219a-789e-4964-adc8-a04538510818","openalex_id":null,"doi":null,"arxiv_id":"2106.09685","raw_key":null,"title":"LoRA: Low-Rank Adaptation of Large Language Models","authors":null,"authors_text":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang","year":2021,"venue":"cs.CL","abstract":"An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.","external_url":"https://arxiv.org/abs/2106.09685","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-14T21:15:16.289399+00:00","pith_arxiv_id":"2106.09685","created_at":"2026-05-08T18:49:25.859185+00:00","updated_at":"2026-05-14T21:15:16.289399+00:00","title_quality_ok":true,"display_title":"LoRA: Low-Rank Adaptation of Large Language Models","render_title":"LoRA: Low-Rank Adaptation of Large Language Models"},"hub":{"state":{"work_id":"0426219a-789e-4964-adc8-a04538510818","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":241,"external_cited_by_count":null,"distinct_field_count":25,"first_pith_cited_at":"2023-04-20T18:25:35+00:00","last_pith_cited_at":"2026-05-13T14:30:39+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-14T21:56:14.359169+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"method","n":4},{"context_role":"background","n":1}],"polarity_counts":[{"context_polarity":"background","n":3},{"context_polarity":"use_method","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"LoRA: Low-Rank Adaptation of Large Language Models","claims":[{"claim_text":"An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, grea","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks LoRA: Low-Rank Adaptation of Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-13T19:43:42.849934+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"5b39c18b-048d-4e0e-a79f-edfc6a7d55f0","orcid":null,"display_name":"Edward J. Hu"},{"id":"5a103e4d-cf90-4665-8169-10074570b2a4","orcid":null,"display_name":"Yelong Shen"},{"id":"8a91d865-49be-4cad-bde9-31226ed6d0aa","orcid":null,"display_name":"Phillip Wallis"},{"id":"d94426ed-79d3-4820-b9c9-32e3272e6821","orcid":null,"display_name":"Zeyuan Allen-Zhu"},{"id":"b43c92f8-9816-4d02-8106-05dc393093a7","orcid":null,"display_name":"Yuanzhi Li"},{"id":"5e630c48-8030-4ff9-bbf6-3c46dff3ac87","orcid":null,"display_name":"Shean Wang"}]},"error":null,"updated_at":"2026-05-13T19:43:42.848191+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-13T19:43:42.269073+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":36},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":25},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":23},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":23},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":23},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":21},{"title":"QLoRA: Efficient Finetuning of Quantized LLMs","work_id":"d3fdf68e-3a5e-48b5-8a18-7a9137479c55","shared_citers":21},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":20},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":18},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":16},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":16},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":15},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":14},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":14},{"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","shared_citers":14},{"title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","work_id":"d1cf6693-a082-403c-ada9-dac7b96341f9","shared_citers":13},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":13},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":13},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":12},{"title":"Dora: Weight-decomposed low-rank adaptation","work_id":"6726c65c-0da8-4d37-9dae-84dbdd936ae4","shared_citers":12},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":12},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":12},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":11},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":11}],"time_series":[{"n":10,"year":2023},{"n":11,"year":2024},{"n":2,"year":2025},{"n":198,"year":2026}]},"error":null,"updated_at":"2026-05-13T19:43:42.344207+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"fixed":1,"items":[{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-13T19:33:32.462619+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"LoRA: Low-Rank Adaptation of Large Language Models","claims":[{"claim_text":"An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, grea","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks LoRA: Low-Rank Adaptation of Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-13T19:43:42.273159+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"LoRA: Low-Rank Adaptation of Large Language Models","claims":[{"claim_text":"An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, grea","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks LoRA: Low-Rank Adaptation of Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-13T19:43:42.271540+00:00"}},"summary":{"title":"LoRA: Low-Rank Adaptation of Large Language Models","claims":[{"claim_text":"An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, grea","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks LoRA: Low-Rank Adaptation of Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":36},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":25},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":23},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":23},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":23},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":21},{"title":"QLoRA: Efficient Finetuning of Quantized LLMs","work_id":"d3fdf68e-3a5e-48b5-8a18-7a9137479c55","shared_citers":21},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":20},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":18},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":16},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":16},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":15},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":14},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":14},{"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","shared_citers":14},{"title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","work_id":"d1cf6693-a082-403c-ada9-dac7b96341f9","shared_citers":13},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":13},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":13},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":12},{"title":"Dora: Weight-decomposed low-rank adaptation","work_id":"6726c65c-0da8-4d37-9dae-84dbdd936ae4","shared_citers":12},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":12},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":12},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":11},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":11}],"time_series":[{"n":10,"year":2023},{"n":11,"year":2024},{"n":2,"year":2025},{"n":198,"year":2026}]},"authors":[{"id":"5b39c18b-048d-4e0e-a79f-edfc6a7d55f0","orcid":null,"display_name":"Edward J. Hu","source":"manual","import_confidence":0.72},{"id":"8a91d865-49be-4cad-bde9-31226ed6d0aa","orcid":null,"display_name":"Phillip Wallis","source":"manual","import_confidence":0.72},{"id":"5e630c48-8030-4ff9-bbf6-3c46dff3ac87","orcid":null,"display_name":"Shean Wang","source":"manual","import_confidence":0.72},{"id":"5a103e4d-cf90-4665-8169-10074570b2a4","orcid":null,"display_name":"Yelong Shen","source":"manual","import_confidence":0.72},{"id":"b43c92f8-9816-4d02-8106-05dc393093a7","orcid":null,"display_name":"Yuanzhi Li","source":"manual","import_confidence":0.72},{"id":"d94426ed-79d3-4820-b9c9-32e3272e6821","orcid":null,"display_name":"Zeyuan Allen-Zhu","source":"manual","import_confidence":0.72}]}}