
A unified understanding and evaluation of steering methods. arXiv preprint arXiv:2502.02716

7 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG (5), cs.CL (2)

years: 2026 (6), 2025 (1)

verdicts: UNVERDICTED (7)

representative citing papers

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

cs.LG · 2025-05-30 · unverdicted · novelty 6.0

K-Steering uses a non-linear multi-label classifier on activations to compute gradient-based intervention directions for unified multi-attribute control in LLMs, outperforming linear baselines on ToneBank and DebateMix benchmarks across three model families.
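
The idea summarized above, stepping an activation along the gradient of a non-linear multi-label classifier, can be illustrated with a toy sketch. This is not the K-Steering implementation: the classifier here is a tiny randomly initialized MLP rather than one trained on LLM activations, and every name (`forward`, `bce_loss`, `loss_gradient`, `steer`, `make_params`), the layer sizes, and the step size `alpha` are assumptions made for illustration only.

```python
import math
import random


def forward(x, W1, b1, W2, b2):
    """Toy classifier: tanh hidden layer, then one sigmoid per attribute label."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    logits = [sum(w * hj for w, hj in zip(row, h)) + b
              for row, b in zip(W2, b2)]
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return h, probs


def bce_loss(x, targets, params):
    """Multi-label binary cross-entropy between predictions and target labels."""
    _, probs = forward(x, *params)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(probs, targets))


def loss_gradient(x, targets, params):
    """Gradient of bce_loss with respect to the activation x (manual backprop)."""
    W1, b1, W2, b2 = params
    h, probs = forward(x, *params)
    d_logits = [p - t for p, t in zip(probs, targets)]        # sigmoid + BCE
    d_h = [sum(d_logits[k] * W2[k][j] for k in range(len(W2)))
           for j in range(len(h))]
    d_pre = [dh * (1.0 - hj * hj) for dh, hj in zip(d_h, h)]  # tanh derivative
    return [sum(d_pre[j] * W1[j][i] for j in range(len(W1)))
            for i in range(len(x))]


def steer(x, targets, params, alpha=1e-3, steps=1):
    """Move x against the loss gradient, toward the target attribute labels."""
    for _ in range(steps):
        g = loss_gradient(x, targets, params)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


def make_params(dim=4, hidden=8, labels=3, seed=0):
    """Random weights standing in for a trained classifier (assumption)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(labels)]
    b2 = [0.0] * labels
    return W1, b1, W2, b2


params = make_params()
activation = [0.2, -0.1, 0.4, 0.05]   # stand-in for a hidden-state vector
targets = [1.0, 0.0, 1.0]             # desired attributes: on, off, on
steered = steer(activation, targets, params)
```

Because the classifier is non-linear, the gradient depends on the current activation, so the intervention direction differs from one input to the next. A linear probe, by contrast, would give the same direction everywhere.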
