Local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.
arXiv preprint arXiv:2402.01694
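The TL;DR above can be illustrated with a minimal sketch, under stated assumptions: if a layer's effect on hidden activations is locally linear (x_next = A x + B u, with u an additive steering vector), a closed-loop steering policy can be computed as a finite-horizon discrete-time LQR. The matrices A, B, the cost weights, and the dimensions below are toy illustrations, not the paper's actual setup.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Backward Riccati recursion; gains[t] is the feedback gain at step t."""
    P = Qf
    gains = []
    for _ in range(T):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

rng = np.random.default_rng(0)
d = 8                                  # toy activation dimension
A = 0.95 * np.eye(d) + 0.05 * rng.standard_normal((d, d))  # local linearization of a layer
B = np.eye(d)                          # steering vector enters additively
Q, R, Qf = np.eye(d), np.eye(d), 100 * np.eye(d)
T = 20

Ks = lqr_gains(A, B, Q, R, Qf, T)

x = rng.standard_normal(d)             # deviation of the activation from its target
drift = x.copy()                       # same state, left unsteered, for comparison
for t in range(T):
    x = A @ x + B @ (-Ks[t] @ x)       # closed-loop steering input u_t = -K_t x_t
    drift = A @ drift

# The steered deviation is driven toward zero; the unsteered one is not.
```

The closed-loop structure is the point: the steering input at each step reacts to the current (deviation of the) activation rather than applying a fixed offset, which is what gives tracking-style guarantees in the linear model.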
3 Pith papers cite this work.
3 representative citing papers (2026)
- Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control
  Local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.
- Pref-CTRL: Preference Driven LLM Alignment using Representation Editing
  Pref-CTRL trains a multi-objective value function on preferences to guide representation editing for LLM alignment, outperforming RE-Control on benchmarks with better out-of-domain generalization.
- Meet Dynamic Individual Preferences: Resolving Conflicting Human Value with Paired Fine-Tuning
  Preference-Paired Fine-Tuning (PFT) equips LLMs to handle conflicting and dynamic individual preferences better than single-preference methods, reaching 96.6% accuracy on the new VCD dataset and a 44.76% gain in user alignment even with limited preference history.
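The Pref-CTRL card above describes guiding representation editing with a learned value function over preferences. A hedged sketch of that general idea, assuming gradient ascent on a value function as the editing rule: the quadratic value function, the preference direction `w`, and every name below are illustrative stand-ins, not the paper's API.

```python
import numpy as np

def value(h, w, h0):
    """Toy value: linear preference score minus a drift penalty toward h0."""
    return w @ h - 0.5 * np.sum((h - h0) ** 2)

def edit_activation(h0, w, step=0.2, n_steps=100):
    """Edit an activation by ascending the value gradient."""
    h = h0.copy()
    for _ in range(n_steps):
        grad = w - (h - h0)      # d value / d h for the toy value above
        h = h + step * grad
    return h

rng = np.random.default_rng(1)
h0 = rng.standard_normal(16)     # original hidden activation (toy)
w = rng.standard_normal(16)      # preference direction from a learned value head
h_edit = edit_activation(h0, w)
# h_edit scores strictly higher than h0 under the value function, while the
# drift penalty keeps it close to the original activation.
```

With this toy quadratic value, the edit has the closed form h0 + w, so the sketch mainly shows the control flow: score, take the gradient, nudge the representation.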