LoRA: Low-Rank Adaptation of Large Language Models
Citation behavior toward this paper is mixed; the most common citation role is background (60%).
abstract
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
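To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a LoRA-style linear layer: the pretrained weight is frozen and a trainable low-rank update BA, scaled by alpha/r, is added to its output. This is an illustrative sketch, not the reference implementation from the linked microsoft/LoRA repository; the class name, rank, and initialization choices are assumptions for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA-style layer (not the official microsoft/LoRA code):
    a frozen pretrained linear weight plus a trainable low-rank update B @ A,
    scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # r x d_in
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # d_out x r
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank update; B starts at zero, so training
        # begins from the unmodified pretrained model.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

# Usage: wrap an existing projection, e.g. an attention query projection.
layer = LoRALinear(nn.Linear(768, 768), r=4)
out = layer(torch.randn(2, 10, 768))
```

Because only A and B receive gradients, the optimizer state shrinks accordingly, and at deployment the product BA can be merged into the frozen weight, which is why LoRA, unlike adapter layers, adds no extra inference latency.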
representative citing papers
PhysInOne is a new dataset of 2 million videos across 153,810 dynamic 3D scenes covering 71 physical phenomena, shown to improve AI performance on physics-aware video generation, prediction, property estimation, and motion transfer.
Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.
Pretrained LLMs adapted via convolutional projections and LoRA act as efficient frozen backbones for sensor-based human activity recognition, delivering strong data efficiency and cross-dataset transfer.
AWARE augments generative next-POI recommendation with LLM agents that produce user-anchored narratives capturing events, culture, and trends, delivering up to 12.4% relative gains on three real datasets.
The Omni-Persona benchmark (18 tasks) shows that open-source models have audio-visual grounding gaps, that RLVR narrows them but produces conservative outputs, and that scale or recall alone fails as a diagnostic.
ALiBi bias is the expectation of positional LSH-induced block masks, yielding spectral and max-norm approximation bounds that reduce long-context biased attention to randomized short-context unbiased attention.
Reddit2Deezer supplies 190k authentic Reddit dialogues grounded in Deezer music entities for scalable conversational music recommendation research.
vOPD stabilizes on-policy distillation gradients by subtracting a closed-form per-token negative reverse KL baseline as a detached control variate, preserving unbiasedness while lowering variance and matching expensive full-vocabulary methods; a generic control-variate sketch appears after this list.
MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop; an illustrative nested-adapter sketch appears after this list.
Instruction tuning makes late-layer computation depend more on the model's own post-trained upstream state than on base-model upstream state, producing a consistent +1.68 logit interaction effect across five model families.
A new watermarking method for closed LLMs boosts random word-pair co-occurrences via rephrasing and detects the signal statistically in outputs, working reliably even when the watermarked data is only 1% of fine-tuning tokens while preserving utility.
Vacuity-based OOD detection in evidential deep learning is highly sensitive to class cardinality differences between ID and OOD, which can artificially inflate AUROC and AUPR without any change in model predictions.
FP-FM adapts flow matching models to unseen distributions via least-squares projection onto basis functions spanning training velocity fields, yielding improved precision and recall without inference-time training.
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
VulKey reaches 31.5% repair accuracy on real C/C++ vulnerabilities by matching hierarchical expert patterns to guide LLM patch generation, beating prior baselines by 7.6%.
Act2See trains VLMs via supervised fine-tuning on verified reasoning traces to interleave active frame calls within text CoTs, yielding SOTA results on video reasoning benchmarks.
DSR uses transformer models to detect sentiment targets in text and score them along three theory-motivated axes, with validation showing correlations to existing social science datasets.
Subliminal steering transfers complex behavioral biases and the underlying steering vector through fine-tuning on innocuous data, achieving higher precision than prior prompt-based methods.
A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.
RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.
High-variance activation directions are uncorrelated with predictions, transformer blocks grow more linear with depth, and single-block linear replacement yields 34x compression on Mistral's final block at a 1.71 perplexity cost.
Small 7B reasoning models were fine-tuned on synthetic and curated QFT problems using RL and SFT, yielding performance gains, error analysis, and public release of data and traces.
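The vOPD entry above describes subtracting a detached per-token baseline as a control variate for on-policy distillation gradients. The sketch below shows only the generic technique: a score-function (REINFORCE-style) surrogate in which the per-token reward is the negative reverse KL and a detached baseline is subtracted before weighting the student's log-probabilities. vOPD's closed-form baseline is not reproduced here; `baseline` is a placeholder input and all names are hypothetical.

```python
import torch

def opd_surrogate_loss(student_logprobs: torch.Tensor,
                       per_token_reverse_kl: torch.Tensor,
                       baseline: torch.Tensor) -> torch.Tensor:
    """Generic control-variate sketch (not vOPD's exact estimator).

    student_logprobs:      log-probs of the sampled tokens, shape (batch, seq);
                           gradients flow only through this tensor.
    per_token_reverse_kl:  reverse KL between student and teacher at each
                           position, shape (batch, seq).
    baseline:              per-token baseline, shape (batch, seq); it must not
                           depend on the sampled token for unbiasedness.
    """
    # Reward is the negative reverse KL; subtracting a detached baseline shifts
    # the weights without adding gradient terms, so the estimator's expectation
    # is unchanged while its variance can be reduced.
    weights = (-per_token_reverse_kl - baseline).detach()
    return -(weights * student_logprobs).mean()
```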
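The MatryoshkaLoRA entry describes a diagonal matrix P placed between the LoRA factors so that adapters nest by rank. Below is a minimal sketch, assuming the update has the form B diag(p) A and that keeping only the first r components of A, p, and B yields the rank-r adapter; how P is actually crafted and trained in MatryoshkaLoRA is not specified here, and the names and defaults are illustrative.

```python
import torch
import torch.nn as nn

class NestedLoRALinear(nn.Module):
    """Sketch of a nested (Matryoshka-style) LoRA update:
    delta_W = B @ diag(p) @ A. Truncating A, p, and B to their first r
    components gives a lower-rank adapter, enabling rank selection at
    inference. An assumption-laden illustration, not the paper's code."""

    def __init__(self, base: nn.Linear, max_rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():      # freeze pretrained weights
            param.requires_grad = False
        self.A = nn.Parameter(torch.randn(max_rank, base.in_features) * 0.01)
        self.p = nn.Parameter(torch.ones(max_rank))        # diagonal of P
        self.B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.alpha = alpha

    def forward(self, x: torch.Tensor, r=None) -> torch.Tensor:
        r = self.A.shape[0] if r is None else r            # dynamic rank choice
        A, p, B = self.A[:r], self.p[:r], self.B[:, :r]
        update = ((x @ A.T) * p) @ B.T                     # x A^T diag(p) B^T
        return self.base(x) + update * (self.alpha / r)

# Usage: the same adapter can be evaluated at full or reduced rank.
layer = NestedLoRALinear(nn.Linear(768, 768), max_rank=16)
x = torch.randn(2, 10, 768)
full = layer(x)        # rank-16 adapter
small = layer(x, r=4)  # nested rank-4 adapter
```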
citing papers explorer
Large Language Models: A Survey
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.