super hub Canonical reference

QLoRA: Efficient Finetuning of Quantized LLMs

Ari Holtzman, Artidoro Pagnoni, Luke Zettlemoyer, Tim Dettmers · 2023 · cs.LG · arXiv 2305.14314

Canonical reference. 70% of citing Pith papers cite this work as background.

113 Pith papers citing it

Background 70% of classified citations

open full Pith review browse 113 citing papers more from Ari Holtzman arXiv PDF

abstract

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimziers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 20 method 6 other 1

citation-polarity summary

background 19 use method 6 unclear 2

claims ledger

abstract We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save m

authors

Ari Holtzman Artidoro Pagnoni Luke Zettlemoyer Tim Dettmers

co-cited works

representative citing papers

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

cs.LG · 2026-06-01 · unverdicted · novelty 8.0

KV cache quantization silently erodes LLM safety alignment via vulnerable low-dimensional subspaces, diagnosed by Per-Channel Reduction into three failure modes and mitigated training-free with up to 97% recovery.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Universal and Transferable Adversarial Attacks on Aligned Language Models

cs.CL · 2023-07-27 · accept · novelty 8.0

Gradient and greedy search over token suffixes produces universal, transferable adversarial prompts that elicit objectionable outputs from aligned models including black-box commercial systems.

High-Probability Last-Iterate Guarantees for Two-Point Gaussian Zeroth-Order Stochastic Gradient Descent

math.OC · 2026-06-18 · unverdicted · novelty 7.0

Direct high-probability last-iterate guarantee of Õ(d/T) for same-sample two-point Gaussian ZO-SGD under conditional exponential-moment noise when d ≥ 16 log(6T/δ).

Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy

astro-ph.IM · 2026-06-16 · unverdicted · novelty 7.0

A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.

The Regularizing Power of Language-Training Deepfake Detectors

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

A dual-encoder deepfake detector pairs a frozen specialist with a LoRA-tuned MLLM, trained first via binary alignment then via RL to reward explain-then-classify behavior, yielding improved cross-dataset performance and interpretability.

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Autoregressive transformers exhibit measurable cognitive fatigue during extended generation, quantified by the Fatigue Index that predicts degradation (AUROC 0.95) and repetition (rho 0.94).

Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU

cs.DC · 2026-05-20 · conditional · novelty 7.0

LlamaWeb is a WebGPU backend for llama.cpp that uses static memory planning, tunable kernels, and templated multi-precision support to cut memory use by 29-33% and raise decode throughput by 45-69% versus prior browser frameworks on tested hardware.

Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

PCM uses success-failure action variance to probabilistically select and mask chunks for gradient updates in GRPO, matching standard success rates with 2.38x wall-clock speedup and 60% lower memory on LIBERO benchmarks.

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

cs.LG · 2026-05-14 · conditional · novelty 7.0

The paper introduces an outlier-injection attack that induces targeted weight collapse in LLMs under advanced quantization schemes including AWQ, GPTQ, and GGUF I-quants.

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

DiagnosticIQ benchmark shows frontier LLMs perform similarly on standard rule-to-action tasks but lose substantial accuracy under distractor expansion and condition inversion, pointing to calibration as the key deployment issue.

A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.

Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.

AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

AtlasOCR delivers the first open-source Darija OCR by fine-tuning Qwen2.5-VL 3B, achieving state-of-the-art results on custom and existing benchmarks for both Darija and Arabic.

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

cs.RO · 2026-04-08 · unverdicted · novelty 7.0

KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation

cs.IR · 2026-02-16 · unverdicted · novelty 7.0

ScrapeGraphAI-100k releases 93,695 real telemetry examples pairing web page content with prompts, schemas, and LLM responses to support training and benchmarking of schema-constrained generation.

CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

CORP performs one-shot structured pruning of Transformers by modeling removed components as affine functions of retained ones and solving closed-form ridge regressions on calibration data to fold compensation into weights, retaining 83.27% Top-1 accuracy on DeiT-Huge after 50% pruning.

Democratising Pathology Co-Pilots: An Open Pipeline and Dataset for Whole-Slide Vision-Language Modelling

cs.CV · 2025-12-19 · conditional · novelty 7.0

A new open pipeline and dataset enable training of a vision-language model for whole-slide pathology VQA that outperforms MedGemma on tissue identification, neoplasm detection, and differential diagnosis.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

Self-Rewarding Language Models

cs.CL · 2024-01-18 · conditional · novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.

REPAIR-Bench: A Benchmark for Robot Error Perception And Interaction Recovery

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

REPAIR-Bench provides a new multimodal dataset and three tasks for evaluating longitudinal failure detection, failure-type classification, and user-centered recovery prediction in human-robot interaction.

Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

A local cascade framework for educational dialogue de-identification reaches 0.958 macro F1 on math tutoring transcripts, outperforming same-family LLM-only and commercial baselines while remaining fully on-device.

citing papers explorer

Showing 50 of 88 citing papers after filters.

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation cs.LG · 2026-06-01 · unverdicted · none · ref 26 · internal anchor
KV cache quantization silently erodes LLM safety alignment via vulnerable low-dimensional subspaces, diagnosed by Per-Channel Reduction into three failure modes and mitigated training-free with up to 97% recovery.
High-Probability Last-Iterate Guarantees for Two-Point Gaussian Zeroth-Order Stochastic Gradient Descent math.OC · 2026-06-18 · unverdicted · none · ref 4 · internal anchor
Direct high-probability last-iterate guarantee of Õ(d/T) for same-sample two-point Gaussian ZO-SGD under conditional exponential-moment noise when d ≥ 16 log(6T/δ).
Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy astro-ph.IM · 2026-06-16 · unverdicted · none · ref 58 · internal anchor
A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.
The Regularizing Power of Language-Training Deepfake Detectors cs.CV · 2026-05-29 · unverdicted · none · ref 12 · internal anchor
A dual-encoder deepfake detector pairs a frozen specialist with a LoRA-tuned MLLM, trained first via binary alignment then via RL to reward explain-then-classify behavior, yielding improved cross-dataset performance and interpretability.
Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement cs.CL · 2026-05-29 · unverdicted · none · ref 4 · internal anchor
Autoregressive transformers exhibit measurable cognitive fatigue during extended generation, quantified by the Fatigue Index that predicts degradation (AUROC 0.95) and repetition (rho 0.94).
Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU cs.DC · 2026-05-20 · conditional · none · ref 12 · internal anchor
LlamaWeb is a WebGPU backend for llama.cpp that uses static memory planning, tunable kernels, and templated multi-precision support to cut memory use by 29-33% and raise decode throughput by 45-69% versus prior browser frameworks on tested hardware.
Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking cs.LG · 2026-05-15 · unverdicted · none · ref 6 · internal anchor
PCM uses success-failure action variance to probabilistically select and mask chunks for gradient updates in GRPO, matching standard success rates with 2.38x wall-clock speedup and 60% lower memory on LIBERO benchmarks.
Widening the Gap: Exploiting LLM Quantization via Outlier Injection cs.LG · 2026-05-14 · conditional · none · ref 6 · internal anchor
The paper introduces an outlier-injection attack that induces targeted weight collapse in LLMs under advanced quantization schemes including AWQ, GPTQ, and GGUF I-quants.
DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules cs.AI · 2026-05-09 · unverdicted · none · ref 11 · internal anchor
DiagnosticIQ benchmark shows frontier LLMs perform similarly on standard rule-to-action tasks but lose substantial accuracy under distractor expansion and condition inversion, pointing to calibration as the key deployment issue.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis cs.CL · 2026-05-02 · unverdicted · none · ref 74 · internal anchor
Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment cs.RO · 2026-04-27 · unverdicted · none · ref 4 · internal anchor
VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.
Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding cs.CV · 2026-04-14 · unverdicted · none · ref 12 · internal anchor
Visual token pruning in MLLMs fails on complex reasoning due to Relevant Visual Information Shift during decoding, but the DSTP framework fixes it training-free across models.
CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning cs.CV · 2026-04-09 · unverdicted · none · ref 3 · internal anchor
CrashSight is a new infrastructure-focused benchmark showing that state-of-the-art vision-language models can describe crash scenes but fail at temporal and causal reasoning.
AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models cs.CV · 2026-04-09 · unverdicted · none · ref 7 · internal anchor
AtlasOCR delivers the first open-source Darija OCR by fine-tuning Qwen2.5-VL 3B, achieving state-of-the-art results on custom and existing benchmarks for both Darija and Arabic.
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis cs.RO · 2026-04-08 · unverdicted · none · ref 49 · internal anchor
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation cs.IR · 2026-02-16 · unverdicted · none · ref 4 · internal anchor
ScrapeGraphAI-100k releases 93,695 real telemetry examples pairing web page content with prompts, schemas, and LLM responses to support training and benchmarking of schema-constrained generation.
CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers cs.LG · 2026-02-05 · unverdicted · none · ref 1 · internal anchor
CORP performs one-shot structured pruning of Transformers by modeling removed components as affine functions of retained ones and solving closed-form ridge regressions on calibration data to fold compensation into weights, retaining 83.27% Top-1 accuracy on DeiT-Huge after 50% pruning.
REPAIR-Bench: A Benchmark for Robot Error Perception And Interaction Recovery cs.RO · 2026-06-29 · unverdicted · none · ref 16 · internal anchor
REPAIR-Bench provides a new multimodal dataset and three tasks for evaluating longitudinal failure detection, failure-type classification, and user-centered recovery prediction in human-robot interaction.
Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification cs.CL · 2026-06-16 · unverdicted · none · ref 21 · internal anchor
A local cascade framework for educational dialogue de-identification reaches 0.958 macro F1 on math tutoring transcripts, outperforming same-family LLM-only and commercial baselines while remaining fully on-device.
Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning cs.CL · 2026-06-09 · unverdicted · none · ref 4 · internal anchor
SDBN introduces adversarial training to PEFT via two variants using character-level edits and LLM-generated perturbations, claiming improved robustness and generalization on NLP benchmarks in low-resource noisy settings.
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models cs.LG · 2026-06-08 · unverdicted · none · ref 211 · internal anchor
Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protection for OOD cases.
SafetyRepro: Configuration-Conditional Rank Instability on Alignment Benchmarks cs.LG · 2026-05-25 · unverdicted · none · ref 13 · internal anchor
Configuration choices alone flip pairwise safety verdicts on every tested alignment benchmark, isolated via a finite-envelope proposition linking disagreement rate to strict ordering reversal.
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws cs.LG · 2026-05-22 · unverdicted · none · ref 7 · internal anchor
The Shannon Scaling Law treats LLM training as noisy-channel transmission and predicts U-shaped performance degradation when signal-to-noise ratio falls below a threshold, outperforming monotonic scaling laws on Pythia and OLMo2 data.
What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA cs.CL · 2026-05-21 · unverdicted · none · ref 28 · internal anchor
Controlled study shows mixed training curricula improve aggregate F1 on memory QA benchmarks while out-of-domain data transfers targeted skills like temporal reasoning, with per-question-type effects exceeding aggregate differences.
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents cs.CL · 2026-05-14 · unverdicted · none · ref 46 · internal anchor
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation cs.CL · 2026-05-12 · unverdicted · none · ref 23 · internal anchor
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting cs.LG · 2026-05-04 · unverdicted · none · ref 68 · internal anchor
Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning cs.LG · 2026-04-30 · unverdicted · none · ref 26 · internal anchor
SST V2 introduces parallel-trainable nonlinear recurrence in latent space to let transformers reason continuously across positions, delivering +15 points on GPQA-Diamond and halving remaining GSM8K errors over matched baselines.
Diversity in Large Language Models under Supervised Fine-Tuning cs.LG · 2026-04-30 · unverdicted · none · ref 53 · 2 links · internal anchor
TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.
Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study cs.SE · 2026-04-27 · unverdicted · none · ref 11 · internal anchor
Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.
Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies cs.AI · 2026-04-23 · unverdicted · none · ref 19 · internal anchor
A separable expert architecture uses base models, LoRA adapters, and deletable per-user proxies to enable privacy-preserving personalization and deterministic unlearning in LLMs.
Pioneer Agent: Continual Improvement of Small Language Models in Production cs.AI · 2026-04-10 · unverdicted · none · ref 19 · internal anchor
Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.
Sensitivity-Positional Co-Localization in GQA Transformers cs.CL · 2026-04-09 · unverdicted · none · ref 12 · internal anchor
In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU, GPQA, HumanEval+, MATH, MGSM and ARC.
The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training cs.CR · 2026-04-09 · unverdicted · none · ref 13 · internal anchor
ORPO is most effective at misaligning LLMs while DPO excels at realigning them, though it reduces utility, revealing an asymmetry between attack and defense methods.
ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache cs.DC · 2026-04-07 · unverdicted · none · ref 15 · internal anchor
ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.
Constraint-Driven Warm-Freeze for Efficient Transfer Learning in Photovoltaic Systems cs.NE · 2026-04-07 · unverdicted · none · ref 18 · internal anchor
CDWF achieves 90-99% of full fine-tuning performance with up to 120x fewer trainable parameters by dynamically allocating full trainability to gradient-important blocks and LoRA to others for PV cyberattack transfer learning.
Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures cs.LG · 2026-04-04 · conditional · none · ref 3 · internal anchor
Gradient-guided layer selection for LoRA yields 15-28% training speedup with matched downstream results on MMLU, GSM8K, and HumanEval across 14 models from 0.5B to 72B parameters.
An Explainable Vision-Language Model Framework with Adaptive PID-Tversky Loss for Lumbar Spinal Stenosis Diagnosis cs.CV · 2026-04-02 · unverdicted · none · ref 9 · internal anchor
A VLM framework with spatial patch cross-attention and adaptive PID-Tversky loss reports 90.69% classification accuracy, 0.9512 Dice score, and 92.80 CIDEr for LSS diagnosis plus automated report generation.
LiFT: Does Instruction Fine-Tuning Improve In-Context Learning for Longitudinal Modelling by Large Language Models? cs.CL · 2026-03-25 · unverdicted · none · ref 1 · internal anchor
LiFT instruction fine-tunes LLMs with a temporal curriculum to improve in-context learning on longitudinal NLP tasks, yielding gains on out-of-distribution data and rare change events across multiple model sizes.
ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods cs.CE · 2026-01-08 · unverdicted · none · ref 68 · internal anchor
ALL-FEM fine-tunes LLMs on a corpus of verified FEniCS scripts and uses multi-agent workflows to automate finite element code generation, achieving 71.79% success on 39 benchmarks across elasticity, flow, and coupled problems.
AI Native Games: A Survey and Roadmap cs.AI · 2026-07-01 · unverdicted · none · ref 74 · internal anchor
The paper proposes a counterfactual definition of AI-native games, screens 53 examples, introduces a G/N taxonomy, and outlines a research roadmap for the field.
RetailSMV: Exocentric vs. Egocentric Adaptation of Foundation Video World Models in Retail cs.CV · 2026-07-01 · unverdicted · none · ref 4 · internal anchor
Exocentric-only LoRA adaptation of Cosmos3-Nano on a new synchronized retail video dataset matches or exceeds combined ego+exo training on most held-out metrics.
Clinically Structured Rank-Gated LoRA for Cross-Benchmark Medical Question Answering cs.CL · 2026-06-30 · unverdicted · none · ref 20 · 2 links · internal anchor
BiRG-LoRA achieves 69.31% macro-average accuracy across CMB, CMExam, MedQA, and MedMCQA, outperforming MoELoRA by 0.89 points with 28.1% fewer trainable parameters under a matched Qwen3-8B protocol.
Pessimism's Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models cs.LG · 2026-06-29 · unverdicted · none · ref 5 · internal anchor
Higher conservatism in offline DPO training of Qwen3-14B monotonically increases reward-hacking damage (Goodhart gap AUGC) during online adaptation on GSM8K.
TriageRA-CCF: Source-Side Clinical Confidence and Coverage Signals for Adaptive Rank Budgeting in Medical LLMs cs.CL · 2026-06-28 · unverdicted · none · ref 1 · internal anchor
TriageRA-CCF combines source-side confidence, coverage, and counterfactual signals to supervise an adaptive LoRA rank router, reporting modest average accuracy gains over LoRA/DoRA/MoELoRA baselines on two 8B models under matched training.
Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs cs.LG · 2026-06-18 · unverdicted · none · ref 155 · internal anchor
AIR augments activation-aware SVD compression of LLMs with an influence metric and a closed-form ALS update, claiming >18% perplexity improvement at 60% parameter retention and 90% less calibration data than SVD-LLM(W).
Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents cs.AI · 2026-06-06 · unverdicted · none · ref 37 · internal anchor
CICL scores and compresses context evidence for LLM agents via action-shift and outcome-uplift metrics, lifting hit@1 from 0.58 to 0.78 on 50 SWE-bench retrieval tasks.
SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation cs.CR · 2026-06-04 · unverdicted · none · ref 6 · internal anchor
SecRL-Prune learns layer-wise pruning policies via RL on CodeLLMs, preserving higher pass@k and var@k than baselines at 10-30% compression on HumanEval and enabling semantics-preserving mutations that reduce malware detections in a case study.
Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark cs.AI · 2026-05-28 · unverdicted · none · ref 8 · internal anchor
Fine-tuned Qwen3-VL-8B reaches sem_sim 0.783 on PiSAR held-out set vs 0.46-0.48 for frontier zero-shot, while Gemma-4-26B scores 0.441.
TIMEGATE: Sustainable Time-Boxed Promotion Gates for Continual ML Adaptation Under Resource Constraints cs.LG · 2026-05-27 · unverdicted · none · ref 1 · internal anchor
TIMEGATE introduces time-boxed promotion gates and an M signal that delivers 66% evaluation-compute savings in simulation and 89% wall-clock/energy reduction on LLaMA-3.1-8B experiments with no silent mis-promotions.

QLoRA: Efficient Finetuning of Quantized LLMs

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer