LoRA: Low-Rank Adaptation of Large Language Models
240 Pith papers cite this work.
abstract
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
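The sketch below illustrates the mechanism the abstract describes: the pre-trained weight is frozen, a trainable rank decomposition B·A is learned alongside it, and the update can be folded back into the weight after training so inference pays no extra latency. It is a minimal PyTorch illustration, not the released loralib package; the rank, scaling, and initialization choices are common defaults rather than values prescribed by the paper.

```python
# Minimal sketch of the LoRA idea from the abstract: freeze the pre-trained
# weight W and learn a low-rank update delta_W = B @ A. Illustrative only,
# not the released loralib implementation.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pre-trained weight (in practice, loaded from the base model).
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # Trainable rank decomposition: delta_W = lora_B @ lora_A has rank <= r.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=5 ** 0.5)  # lora_B stays zero, so delta_W starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update

    @torch.no_grad()
    def merge(self) -> None:
        # Fold the learned update into W so inference adds no extra latency.
        self.weight += self.scaling * (self.lora_B @ self.lora_A)


layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")  # 12288 trainable out of 602112 total
```

The released package at https://github.com/microsoft/LoRA wraps this pattern with drop-in layer replacements and helpers for training and saving only the adapter parameters.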
citing papers explorer
-
On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models
Image-to-3D models generate harmful geometries in most cases, with under 0.3% caught by commercial filters; existing safeguards are weak, but a stacked defense cuts harmful outputs to under 1% at an 11% false-positive cost.
-
PhysInOne: Visual Physics Learning and Reasoning in One Suite
PhysInOne is a new dataset of 2 million videos across 153,810 dynamic 3D scenes covering 71 physical phenomena, shown to improve AI performance on physics-aware video generation, prediction, property estimation, and motion transfer.
-
Inducing Artificial Uncertainty in Language Models
Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.
-
Efficient and Adaptive Human Activity Recognition via LLM Backbones
Pretrained LLMs adapted via convolutional projections and LoRA act as efficient frozen backbones for sensor-based human activity recognition, delivering strong data efficiency and cross-dataset transfer.
-
Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation
AWARE augments generative next-POI recommendation with LLM agents that produce user-anchored narratives capturing events, culture, and trends, delivering up to 12.4% relative gains on three real datasets.
-
Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization
The 18-task Omni-Persona benchmark shows that open-source models have audio-visual grounding gaps, that RLVR narrows them but leads to conservative outputs, and that neither scale nor recall alone suffices as a diagnostic.
-
Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases
ALiBi bias is the expectation of positional LSH-induced block masks, yielding spectral and max-norm approximation bounds that reduce long-context biased attention to randomized short-context unbiased attention.
-
Reddit2Deezer: A Scalable Dataset for Real-World Grounded Conversational Music Recommendation
Reddit2Deezer supplies 190k authentic Reddit dialogues grounded in Deezer music entities for scalable conversational music recommendation research.
-
KL for a KL: On-Policy Distillation with Control Variate Baseline
vOPD stabilizes on-policy distillation gradients by subtracting a closed-form per-token negative reverse KL baseline as a detached control variate, preserving unbiasedness while lowering variance and matching expensive full-vocabulary methods.
-
MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop.
-
Instruction Tuning Changes How Upstream State Conditions Late Readout: A Cross-Patching Diagnostic
Instruction tuning makes late-layer computation depend more on the model's own post-trained upstream state than on base-model upstream state, producing a consistent +1.68 logit interaction effect across five model families.
-
Dataset Watermarking for Closed LLMs with Provable Detection
A new watermarking method for closed LLMs boosts random word-pair co-occurrences via rephrasing and detects the signal statistically in outputs, working reliably even when the watermarked data is only 1% of fine-tuning tokens while preserving utility.
-
Rethinking Vacuity for OOD Detection in Evidential Deep Learning
Vacuity-based OOD detection in evidential deep learning is highly sensitive to class cardinality differences between ID and OOD, which can artificially inflate AUROC and AUPR without any change in model predictions.
-
A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions
FP-FM adapts flow matching models to unseen distributions via least-squares projection onto basis functions spanning training velocity fields, yielding improved precision and recall without inference-time training.
-
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
-
A foundation model of vision, audition, and language for in-silico neuroscience
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
-
VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns
VulKey reaches 31.5% repair accuracy on real C/C++ vulnerabilities by matching hierarchical expert patterns to guide LLM patch generation, beating prior baselines by 7.6%.
-
Act2See: Emergent Active Visual Perception for Video Reasoning
Act2See trains VLMs via supervised fine-tuning on verified reasoning traces to interleave active frame calls within text CoTs, yielding SOTA results on video reasoning benchmarks.
-
Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media
DSR uses transformer models to detect sentiment targets in text and score them along three theory-motivated axes, with validation showing correlations to existing social science datasets.
-
Subliminal Steering: Stronger Encoding of Hidden Signals
Subliminal steering transfers complex behavioral biases and the underlying steering vector through fine-tuning on innocuous data, achieving higher precision than prior prompt-based methods.
-
Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning
A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.
-
RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow
RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.
-
Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales
High-variance activation directions are uncorrelated with predictions, transformer blocks grow more linear with depth, and single-block linear replacement yields 34x compression on Mistral's final block at a 1.71 perplexity cost.
-
Fine-Tuning Small Reasoning Models for Quantum Field Theory
Small 7B reasoning models were fine-tuned on synthetic and curated QFT problems using RL and SFT, yielding performance gains, error analysis, and public release of data and traces.
-
LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification
Fine-tuned BERTimbau-LoRA achieves 87.6% accuracy and 0.87 macro-F1 on LegalBench-BR, outperforming commercial LLMs by 22-28 points and eliminating their systematic bias toward civil law on Brazilian legal classification.
-
AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation
AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.
-
DocQAC: Adaptive Trie-Guided Decoding for Effective In-Document Query Auto-Completion
Adaptive trie-guided decoding with document context and tunable penalties improves in-document query auto-completion, outperforming baselines and larger models like LLaMA-3 on seen queries.
-
How Tokenization Limits Phonological Knowledge Representation in Language Models and How to Improve Them
Subword tokenization impairs phonological knowledge encoding in LMs, but an IPA-based fine-tuning method restores it with minimal impact on other capabilities.
-
BasketHAR: A Multimodal Dataset for Human Activity Recognition and Sport Analysis in Basketball Training Scenarios
BasketHAR is a publicly released multimodal dataset of professional basketball training activities captured with inertial sensors, physiological signals, and video, accompanied by a baseline alignment method.
-
RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration
RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.
-
S-GRPO: Unified Post-Training for Large Vision-Language Models
S-GRPO unifies SFT and RL for LVLMs via conditional ground-truth injection that supplies a maximal-reward anchor when group exploration fails completely.
-
DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
DharmaOCR models reach 0.925 and 0.911 extraction scores with 0.40% and 0.20% degeneration rates on a new benchmark covering printed, handwritten, and legal documents, outperforming open-source and commercial baselines via SFT plus DPO.
-
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models
GUI-Perturbed shows that GUI grounding models suffer systematic accuracy collapse under relational instructions and visual changes such as 70% zoom, with even augmented fine-tuning worsening results.
-
Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
Adapting speech-aware LLMs with speaker cluster identification tags and concatenated multi-speaker data yields superior speaker-attributed ASR performance versus sequential diarization-plus-ASR pipelines.
-
SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates
LoRA weight updates are spectrally sparse, with 33% of DCT coefficients capturing 90% of energy on average, enabling 10x storage reduction and occasional gains by masking high frequencies.
-
FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks
FinTrace supplies trajectory-level metrics for LLM financial tool calling, exposing gaps in information use and output quality, while its preference dataset enables DPO training that boosts intermediate metrics.
-
UIPress: Bringing Optical Token Compression to UI-to-Code Generation
UIPress is the first encoder-side learned optical compression method for UI-to-Code that compresses visual tokens to 256, outperforming the uncompressed baseline by 7.5% CLIP score and the best inference-time baseline by 4.6% while delivering 9.1x TTFT speedup.
-
Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
Medical MLLMs degrade on image classification due to four failure modes in visual representation quality, connector projection fidelity, LLM comprehension, and semantic mapping alignment, quantified by feature probing on 14 models across 3 datasets.
-
DiV-INR: Extreme Low-Bitrate Diffusion Video Compression with INR Conditioning
DiV-INR integrates implicit neural representations as conditioning signals for diffusion models to achieve better perceptual quality than HEVC, VVC, and prior neural codecs at extremely low bitrates under 0.05 bpp.
-
HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation
HumANDiff improves motion consistency in human video generation by sampling diffusion noise on an articulated human body template and adding joint appearance-motion prediction plus a geometric consistency loss.
-
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents
A LoRA-fine-tuned Qwen 3.5 2B model for task-conditioned tool-output pruning reaches 0.86 recall and 0.80 F1 on a new 618-example test set while removing 92% of input tokens and outperforming larger zero-shot models.
-
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models
S0 tuning optimizes the initial recurrent states of hybrid models, outperforming LoRA on HumanEval with zero added inference cost and showing partial cross-domain transfer.
-
Refusal in Language Models Is Mediated by a Single Direction
Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal (sketched in code after this list).
-
3D-VLA: A 3D Vision-Language-Action Generative World Model
3D-VLA is a new embodied foundation model that uses a 3D LLM plus aligned diffusion models to generate future images and point clouds for improved reasoning and action planning in 3D environments.
-
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
ELLA introduces a timestep-aware semantic connector to link LLMs with diffusion models for improved dense prompt following, validated on a new 1K-prompt benchmark.
-
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
-
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
A single motion module trained on videos adds temporally coherent animation to any personalized text-to-image model derived from the same base without additional tuning.
-
Extending Context Window of Large Language Models via Positional Interpolation
Position Interpolation linearly down-scales position indices to extend RoPE context windows to 32768 tokens with 1000-step fine-tuning, delivering strong long-context results on LLaMA 7B-65B while preserving short-context quality (the index down-scaling is sketched in code after this list).
-
QLoRA: Efficient Finetuning of Quantized LLMs
QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.
-
Combining pre-trained models via localized model averaging
Localized model averaging with covariate-dependent weights achieves asymptotic optimality and weight consistency for combining pre-trained models under a general loss framework.
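code sketches

Two entries above describe mechanisms concretely enough to illustrate. First, the refusal-direction entry (Refusal in Language Models Is Mediated by a Single Direction): a hedged numpy sketch of estimating a single direction from activation differences and erasing or adding it. The placeholder activations, shapes, and steering coefficient are illustrative assumptions, not that paper's exact pipeline.

```python
# Sketch of the single-direction refusal result summarized above: estimate a
# refusal direction as the difference in mean residual-stream activations
# between refusal-inducing and harmless prompts, then project it out (erase)
# or add it back (elicit refusal). Shapes and data below are placeholders.
import numpy as np

d_model = 4096
rng = np.random.default_rng(0)
acts_refused = rng.normal(size=(128, d_model))   # activations on refusal-inducing prompts (placeholder)
acts_complied = rng.normal(size=(128, d_model))  # activations on harmless prompts (placeholder)

# Difference-in-means direction, normalized to unit length.
r = acts_refused.mean(axis=0) - acts_complied.mean(axis=0)
r /= np.linalg.norm(r)

def ablate(h: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of activations h (erase refusal)."""
    return h - np.outer(h @ r, r)

def steer(h: np.ndarray, alpha: float = 8.0) -> np.ndarray:
    """Add the refusal direction to activations h (elicit refusal)."""
    return h + alpha * r

h = rng.normal(size=(4, d_model))
print(np.allclose(ablate(h) @ r, 0.0))  # True: no remaining component along r
```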
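Second, the Position Interpolation entry: a minimal sketch of linearly down-scaling RoPE position indices so a long sequence reuses the angle range of the shorter pre-training window. The RoPE helpers and the 2048/8192 lengths are illustrative assumptions, not that paper's implementation.

```python
# Sketch of Position Interpolation applied to standard RoPE: scale position
# indices by train_len / target_len so extended contexts stay inside the
# angle range seen during pre-training. Names and sizes are illustrative.
import torch


def rope_angles(positions: torch.Tensor, head_dim: int, base: float = 10000.0) -> torch.Tensor:
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    return torch.outer(positions.float(), inv_freq)  # (seq_len, head_dim // 2)


def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # x: (seq_len, head_dim); rotate consecutive (even, odd) feature pairs.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


train_len, target_len, head_dim = 2048, 8192, 64
q = torch.randn(target_len, head_dim)

# Extrapolation: position indices run past anything seen in pre-training.
extrapolated = apply_rope(q, rope_angles(torch.arange(target_len), head_dim))

# Position Interpolation: squeeze indices back into [0, train_len).
scaled_positions = torch.arange(target_len) * (train_len / target_len)
interpolated = apply_rope(q, rope_angles(scaled_positions, head_dim))
```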