Steer-to-Detect learns a steering vector injected into LLM hidden states to boost class separability and applies hypothesis testing with finite-sample Type I/II error guarantees for generated-text detection.
A General Method for Detecting Information Generated by Large Language Models.arXiv preprint arXiv:2506.21589, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
stat.AP 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Steer-to-Detect: Probing Hidden Representations for Detection of LLM-Generated Texts
Steer-to-Detect learns a steering vector injected into LLM hidden states to boost class separability and applies hypothesis testing with finite-sample Type I/II error guarantees for generated-text detection.