GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs. CoRR, abs/2507.18043, 2025.

Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal · 2025 · arXiv 2507.18043

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: dataset (1)

citation-polarity summary: use dataset (1)

representative citing papers

Continuous Interpretive Steering for Scalar Diversity

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

Continuous Interpretive Steering and the GraSD dataset reveal that LLMs encode graded sensitivity to scalar diversity in their internal representations, recoverable via controlled activation interventions.

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as a 12.1% accuracy improvement on MATH when transferring CoT from a 14B to a 7B model.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment cs.LG · 2026-04-07 · unverdicted · none · ref 44