Title resolution pending

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B · 2023

24 Pith papers cite this work. Polarity classification is still indexing.

24 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2 baseline 1

citation-polarity summary

background 2 baseline 1

representative citing papers

Backdooring Masked Diffusion Language Models

cs.LG · 2026-05-19 · conditional · novelty 8.0

SHADOWMASK backdoors MDLMs by modifying the forward corruption process with a trigger-mask mixture, achieving near-100% attack success while preserving clean utility on DiT-based and LLaDA models.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

WildChat: 1M ChatGPT Interaction Logs in the Wild

cs.CL · 2024-05-02 · accept · novelty 8.0

WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

cs.CL · 2024-10-14 · conditional · novelty 7.0

DuoAttention identifies retrieval heads requiring full KV cache and streaming heads using constant-length cache to reduce memory and latency in long-context LLM inference.

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

cs.CL · 2024-06-12 · unverdicted · novelty 7.0

Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.

Self-Rewarding Language Models

cs.CL · 2024-01-18 · conditional · novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.

EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers

cs.CL · 2023-09-15 · unverdicted · novelty 7.0

EvoPrompt uses LLMs to run evolutionary operators on populations of prompts, outperforming human-engineered prompts by up to 25% on BIG-Bench Hard tasks across 31 datasets.

LIMA: Less Is More for Alignment

cs.CL · 2023-05-18 · conditional · novelty 7.0

Fine-tuning a 65B model on 1,000 high-quality examples produces output that humans rate as good as or better than GPT-4 in 43% of cases, indicating most capabilities come from pretraining.

WizardLM: Empowering large pre-trained language models to follow complex instructions

cs.CL · 2023-04-24 · conditional · novelty 7.0

WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

cs.CV · 2023-03-28 · conditional · novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

Latent-space Attacks for Refusal Evasion in Language Models

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

Introduces Controlled Latent-space Evasion attack that projects model activations past a linear probe's decision boundary to suppress refusal, outperforming ablation baselines on 15 models.

Alignment Dynamics in LLM Fine-Tuning

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

The paper introduces a dynamical model that decomposes alignment updates in LLM fine-tuning into rebound and driving forces and predicts a rehearsal priming effect.

Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.

NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

NCO enables efficient online pattern matching for negative hard and regex constraints in LLM decoding to prevent forbidden content without state explosion.

Efficient Streaming Language Models with Attention Sinks

cs.CL · 2023-09-29 · accept · novelty 6.0

StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

cs.CL · 2023-09-29 · conditional · novelty 6.0

ToRA trains language models on interactive tool-use trajectories with imitation learning and output shaping to integrate reasoning and external tools, yielding 13-19% gains on math datasets and new highs like 44.6% on MATH for a 7B model.

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

cs.CL · 2023-09-11 · conditional · novelty 6.0

MAmmoTH models trained via hybrid CoT-PoT instruction tuning on MathInstruct outperform prior open-source LLMs by 16-32% average accuracy on nine math datasets, reaching 33% and 44% on MATH for 7B and 34B scales.

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

cs.LG · 2023-09-01 · conditional · novelty 6.0

Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

cs.CL · 2023-06-09 · accept · novelty 6.0

GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

cs.CL · 2023-06-05 · conditional · novelty 6.0

A 13B model called Orca learns detailed reasoning from GPT-4 explanation traces and reaches parity with ChatGPT on Big-Bench Hard while outperforming other 13B models.

Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.

QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

cs.CL · 2025-09-22 · unverdicted · novelty 5.0

QWHA proposes Walsh-Hadamard Transform adapters with adaptive initialization for quantization-aware PEFT, claiming better low-bit accuracy and faster training than low-rank or other FT-based baselines.

Instruction-Following Evaluation for Large Language Models

cs.CL · 2023-11-14 · unverdicted · novelty 5.0

IFEval is a new benchmark of 25 verifiable instruction types and ~500 prompts for objective, reproducible evaluation of LLMs' instruction-following abilities.

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

cs.CL · 2023-08-07 · unverdicted · novelty 5.0

LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.

citing papers explorer

Showing 24 of 24 citing papers.

Backdooring Masked Diffusion Language Models cs.LG · 2026-05-19 · conditional · none · ref 19
SHADOWMASK backdoors MDLMs by modifying the forward corruption process with a trigger-mask mixture, achieving near-100% attack success while preserving clean utility on DiT-based and LLaDA models.
Crafting Reversible SFT Behaviors in Large Language Models cs.LG · 2026-05-07 · unverdicted · none · ref 28
LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.
WildChat: 1M ChatGPT Interaction Logs in the Wild cs.CL · 2024-05-02 · accept · none · ref 16
WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographics, and headers, claimed to be the most diverse and multilingual such resource available.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads cs.CL · 2024-10-14 · conditional · none · ref 45
DuoAttention identifies retrieval heads requiring full KV cache and streaming heads using constant-length cache to reduce memory and latency in long-context LLM inference.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing cs.CL · 2024-06-12 · unverdicted · none · ref 139
Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.
Self-Rewarding Language Models cs.CL · 2024-01-18 · conditional · none · ref 116
Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers cs.CL · 2023-09-15 · unverdicted · none · ref 55
EvoPrompt uses LLMs to run evolutionary operators on populations of prompts, outperforming human-engineered prompts by up to 25% on BIG-Bench Hard tasks across 31 datasets.
LIMA: Less Is More for Alignment cs.CL · 2023-05-18 · conditional · none · ref 49
Fine-tuning a 65B model on 1,000 high-quality examples produces output that humans rate as good as or better than GPT-4 in 43% of cases, indicating most capabilities come from pretraining.
WizardLM: Empowering large pre-trained language models to follow complex instructions cs.CL · 2023-04-24 · conditional · none · ref 38
WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023-03-28 · conditional · none · ref 74
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Latent-space Attacks for Refusal Evasion in Language Models cs.AI · 2026-05-20 · unverdicted · none · ref 17
Introduces Controlled Latent-space Evasion attack that projects model activations past a linear probe's decision boundary to suppress refusal, outperforming ablation baselines on 15 models.
Alignment Dynamics in LLM Fine-Tuning cs.LG · 2026-05-18 · unverdicted · none · ref 28
The paper introduces a dynamical model that decomposes alignment updates in LLM fine-tuning into rebound and driving forces and predicts a rehearsal priming effect.
Not How Many, But Which: Parameter Placement in Low-Rank Adaptation cs.LG · 2026-05-12 · unverdicted · none · ref 81
Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding cs.CL · 2026-05-11 · unverdicted · none · ref 25
NCO enables efficient online pattern matching for negative hard and regex constraints in LLM decoding to prevent forbidden content without state explosion.
Efficient Streaming Language Models with Attention Sinks cs.CL · 2023-09-29 · accept · none · ref 47
StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving cs.CL · 2023-09-29 · conditional · none · ref 42
ToRA trains language models on interactive tool-use trajectories with imitation learning and output shaping to integrate reasoning and external tools, yielding 13-19% gains on math datasets and new highs like 44.6% on MATH for a 7B model.
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning cs.CL · 2023-09-11 · conditional · none · ref 42
MAmmoTH models trained via hybrid CoT-PoT instruction tuning on MathInstruct outperform prior open-source LLMs by 16-32% average accuracy on nine math datasets, reaching 33% and 44% on MATH for 7B and 34B scales.
Baseline Defenses for Adversarial Attacks Against Aligned Language Models cs.LG · 2023-09-01 · conditional · none · ref 53
Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena cs.CL · 2023-06-09 · accept · none · ref 38
GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 cs.CL · 2023-06-05 · conditional · none · ref 7
A 13B model called Orca learns detailed reasoning from GPT-4 explanation traces and reaches parity with ChatGPT on Big-Bench Hard while outperforming other 13B models.
Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates cs.LG · 2026-05-19 · unverdicted · none · ref 63
FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models cs.CL · 2025-09-22 · unverdicted · none · ref 48
QWHA proposes Walsh-Hadamard Transform adapters with adaptive initialization for quantization-aware PEFT, claiming better low-bit accuracy and faster training than low-rank or other FT-based baselines.
Instruction-Following Evaluation for Large Language Models cs.CL · 2023-11-14 · unverdicted · none · ref 22
IFEval is a new benchmark of 25 verifiable instruction types and ~500 prompts for objective, reproducible evaluation of LLMs' instruction-following abilities.
LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning cs.CL · 2023-08-07 · unverdicted · none · ref 51
LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer