
Stanford Alpaca: An Instruction-following LLaMA model

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto · 2023

40 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

WildChat: 1M ChatGPT Interaction Logs in the Wild

cs.CL · 2024-05-02 · accept · novelty 8.0

WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographic data, and request headers, claimed to be the most diverse and multilingual resource of its kind.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

cs.CR · 2026-05-22 · accept · novelty 7.0

PoisonForge benchmark shows that 1% poisoned examples achieve over 70% attack success rate on targeted tasks across 11 of 12 tested LLMs with under 0.5% leakage to non-target tasks.

Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Pretraining and alignment induce asymmetric geometric traces in transformer weights because alignment updates concentrate in read pathways due to activation covariance while write pathways inherit less structure from alignment losses.

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm

cs.LG · 2026-05-14 · conditional · novelty 7.0

A framework that identifies foldable layer normalizations and converts them to RMSNorm, preserving exact equivalence while enabling faster inference in deep neural networks.

InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees

cs.LG · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

InvEvolve uses LLMs to evolve white-box inventory policies with statistical safety guarantees, outperforming classical and deep-learning methods on synthetic and real retail data.

AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation

cs.CL · 2025-02-28 · unverdicted · novelty 7.0

CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

cs.SE · 2025-02-25 · unverdicted · novelty 7.0

SWE-RL applies reinforcement learning to software-evolution data to train LLMs that reach 41% on SWE-bench Verified and generalize to other reasoning tasks.

Refusal in Language Models Is Mediated by a Single Direction

cs.LG · 2024-06-17 · accept · novelty 7.0

Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

cs.CL · 2024-06-12 · unverdicted · novelty 7.0

Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.

Self-Rewarding Language Models

cs.CL · 2024-01-18 · conditional · novelty 7.0

Iterative self-rewarding, in which Llama 2 70B acts as its own LLM-as-a-Judge to generate DPO training data, improves both instruction following and self-evaluation; the resulting model outperforms GPT-4 on AlpacaEval 2.0.

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

cs.CV · 2023-12-28 · conditional · novelty 7.0

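Q-Align trains LMMs on discrete text-defined rating levels for visual scoring, achieving state-of-the-art results on image quality assessment (IQA), image aesthetic assessment (IAA), and video quality assessment (VQA), and unifies the three tasks in a single model, OneAlign.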

EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers

cs.CL · 2023-09-15 · unverdicted · novelty 7.0

EvoPrompt uses LLMs to apply evolutionary operators to populations of prompts; across 31 datasets it outperforms human-engineered prompts, by up to 25% on BIG-Bench Hard.

LIMA: Less Is More for Alignment

cs.CL · 2023-05-18 · conditional · novelty 7.0

Fine-tuning a 65B LLaMA model on 1,000 curated high-quality examples yields responses that humans rate as equivalent to or better than GPT-4's in 43% of cases, suggesting that most capabilities are acquired during pretraining.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

cs.CV · 2023-03-28 · conditional · novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation

cs.CL · 2026-05-21 · conditional · novelty 6.0

LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.

GradShield: Alignment Preserving Finetuning

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

GradShield removes data points likely to cause safety misalignment during LLM finetuning by computing a Finetuning Implicit Harmfulness Score and applying adaptive thresholding, keeping attack success rates below 6% while preserving utility.

A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Suppressing one refusal neuron or amplifying one concept neuron bypasses safety alignment in LLMs from 1.7B to 70B parameters without training or prompt engineering.

Minimizing Collateral Damage in Activation Steering

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Activation steering is cast as constrained optimization that minimizes collateral damage by weighting perturbations according to the empirical second-moment matrix of activations instead of assuming isotropy.

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

cs.CR · 2026-04-20 · unverdicted · novelty 6.0

Different LLM jailbreak techniques achieve similar harmful compliance but lead to distinct behavioral side effects and mechanistic changes.

Flex Attention: A Programming Model for Generating Optimized Attention Kernels

cs.LG · 2024-12-07 · unverdicted · novelty 6.0

FlexAttention supplies a compiler-driven interface that expresses common attention variants in a few lines of PyTorch and emits optimized kernels whose speed matches hand-written implementations.

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

cs.AI · 2024-08-01 · conditional · novelty 6.0

Empirical analysis shows that scaling inference compute with strategies such as tree search can be more efficient than scaling model parameters: a 7B model combined with a novel search algorithm outperforms a 34B model.

Lessons from the Trenches on Reproducible Evaluation of Language Models

cs.CL · 2024-05-23 · accept · novelty 6.0

The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.

citing papers explorer

Showing 40 of 40 citing papers.

WildChat: 1M ChatGPT Interaction Logs in the Wild cs.CL · 2024-05-02 · accept · none · ref 32
WildChat releases a dataset of 1 million ChatGPT conversations with timestamps, demographic data, and request headers, claimed to be the most diverse and multilingual resource of its kind.
ORPO: Monolithic Preference Optimization without Reference Model cs.CL · 2024-03-12 · conditional · none · ref 102
ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.
PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs cs.CR · 2026-05-22 · accept · none · ref 3
PoisonForge benchmark shows that 1% poisoned examples achieve over 70% attack success rate on targeted tasks across 11 of 12 tested LLMs with under 0.5% leakage to non-target tasks.
Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space cs.LG · 2026-05-15 · unverdicted · none · ref 39
Pretraining and alignment induce asymmetric geometric traces in transformer weights because alignment updates concentrate in read pathways due to activation covariance while write pathways inherit less structure from alignment losses.
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm cs.LG · 2026-05-14 · conditional · none · ref 101
A framework that identifies foldable layer normalizations and converts them to RMSNorm, preserving exact equivalence while enabling faster inference in deep neural networks.
InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees cs.LG · 2026-05-01 · unverdicted · none · ref 98 · 2 links
InvEvolve uses LLMs to evolve white-box inventory policies with statistical safety guarantees, outperforming classical and deep-learning methods on synthetic and real retail data.
AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation cs.CV · 2026-04-20 · unverdicted · none · ref 56
AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation cs.CL · 2025-02-28 · unverdicted · none · ref 37
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution cs.SE · 2025-02-25 · unverdicted · none · ref 80
SWE-RL applies reinforcement learning to software-evolution data to train LLMs that reach 41% on SWE-bench Verified and generalize to other reasoning tasks.
Refusal in Language Models Is Mediated by a Single Direction cs.LG · 2024-06-17 · accept · none · ref 38
Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing cs.CL · 2024-06-12 · unverdicted · none · ref 35
Magpie synthesizes 300K high-quality alignment instructions from Llama-3-Instruct via auto-regressive prompting on partial templates, enabling fine-tuned models to match official instruct performance on AlpacaEval, ArenaHard, and WildBench.
Self-Rewarding Language Models cs.CL · 2024-01-18 · conditional · none · ref 23
Iterative self-rewarding, in which Llama 2 70B acts as its own LLM-as-a-Judge to generate DPO training data, improves both instruction following and self-evaluation; the resulting model outperforms GPT-4 on AlpacaEval 2.0.
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels cs.CV · 2023-12-28 · conditional · none · ref 69
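Q-Align trains LMMs on discrete text-defined rating levels for visual scoring, achieving state-of-the-art results on image quality assessment (IQA), image aesthetic assessment (IAA), and video quality assessment (VQA), and unifies the three tasks in a single model, OneAlign.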
EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers cs.CL · 2023-09-15 · unverdicted · none · ref 127
EvoPrompt uses LLMs to apply evolutionary operators to populations of prompts; across 31 datasets it outperforms human-engineered prompts, by up to 25% on BIG-Bench Hard.
LIMA: Less Is More for Alignment cs.CL · 2023-05-18 · conditional · none · ref 12
Fine-tuning a 65B LLaMA model on 1,000 curated high-quality examples yields responses that humans rate as equivalent to or better than GPT-4's in 43% of cases, suggesting that most capabilities are acquired during pretraining.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023-03-28 · conditional · none · ref 174
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation cs.CL · 2026-05-21 · conditional · none · ref 157
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
GradShield: Alignment Preserving Finetuning cs.CL · 2026-05-13 · unverdicted · none · ref 44
GradShield removes data points likely to cause safety misalignment during LLM finetuning by computing a Finetuning Implicit Harmfulness Score and applying adaptive thresholding, keeping attack success rates below 6% while preserving utility.
A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models cs.CL · 2026-05-08 · unverdicted · none · ref 28
Suppressing one refusal neuron or amplifying one concept neuron bypasses safety alignment in LLMs from 1.7B to 70B parameters without training or prompt engineering.
Minimizing Collateral Damage in Activation Steering cs.LG · 2026-05-01 · unverdicted · none · ref 23
Activation steering is cast as constrained optimization that minimizes collateral damage by weighting perturbations according to the empirical second-moment matrix of activations instead of assuming isotropy.
Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks cs.CR · 2026-04-20 · unverdicted · none · ref 18
Different LLM jailbreak techniques achieve similar harmful compliance but lead to distinct behavioral side effects and mechanistic changes.
Flex Attention: A Programming Model for Generating Optimized Attention Kernels cs.LG · 2024-12-07 · unverdicted · none · ref 17
FlexAttention supplies a compiler-driven interface that expresses common attention variants in a few lines of PyTorch and emits optimized kernels whose speed matches hand-written implementations.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models cs.AI · 2024-08-01 · conditional · none · ref 6
Empirical analysis shows that scaling inference compute with strategies such as tree search can be more efficient than scaling model parameters: a 7B model combined with a novel search algorithm outperforms a 34B model.
Lessons from the Trenches on Reproducible Evaluation of Language Models cs.CL · 2024-05-23 · accept · none · ref 121
The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty cs.LG · 2024-01-26 · unverdicted · none · ref 34
EAGLE performs speculative sampling at the feature level and resolves feature uncertainty by also conditioning on the token sequence advanced by one time step, delivering a 2.7x-3.5x speedup on LLaMA2-Chat 70B and doubling throughput across multiple model families and tasks.
The Falcon Series of Open Language Models cs.CL · 2023-11-28 · conditional · none · ref 211
Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
Chain-of-Verification Reduces Hallucination in Large Language Models cs.CL · 2023-09-20 · unverdicted · none · ref 73
Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models cs.CL · 2023-09-07 · conditional · none · ref 73
DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations cs.CL · 2023-05-23 · conditional · none · ref 200
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
GiVA: Gradient-Informed Bases for Vector-Based Adaptation cs.CL · 2026-04-23 · unverdicted · none · ref 20
GiVA uses gradients to initialize vector adapters so they match LoRA performance at eight times lower rank while keeping extreme parameter efficiency.
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion cs.LG · 2026-04-21 · unverdicted · none · ref 60
FedProxy replaces weak adapters with a proxy SLM for federated LLM fine-tuning, outperforming prior methods and approaching centralized performance via compression, heterogeneity-aware aggregation, and training-free fusion.
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models cs.CV · 2024-08-09 · unverdicted · none · ref 64
mPLUG-Owl3 introduces hyper attention blocks to integrate vision and language for long image-sequence understanding and reports SOTA results on single-image, multi-image, and video benchmarks.
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning cs.LG · 2024-02-18 · unverdicted · none · ref 101
POVID generates AI-created preference data to fine-tune vision-language models with DPO, reducing hallucinations and improving benchmark scores.
AppAgent: Multimodal Agents as Smartphone Users cs.CV · 2023-12-21 · unverdicted · none · ref 57
AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 4
Qwen-Scope provides open-source sparse autoencoders for Qwen models that serve as practical interfaces for steering, evaluation, data workflows, and optimization of large language models.
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model cs.CV · 2025-02-14 · unverdicted · none · ref 176
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
Agent AI: Surveying the Horizons of Multimodal Interaction cs.AI · 2024-01-07 · unverdicted · none · ref 20
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
A Survey on Knowledge Distillation of Large Language Models cs.CL · 2024-02-20 · accept · none · ref 52
A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.
A Survey of Hallucination in Large Foundation Models cs.AI · 2023-09-12 · accept · none · ref 86
A survey classifying hallucination phenomena specific to large foundation models, establishing evaluation criteria, examining mitigation strategies, and discussing future directions.
Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning cs.LG · 2026-05-01 · unreviewed · ref 44
