arXiv preprint arXiv:2510.18081

Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth · arXiv 2510.18081

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

LLM Safety From Within: Detecting Harmful Content with Internal Representations

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

SIREN identifies safety neurons via linear probing on internal LLM layers and combines them with adaptive weighting to detect harm, outperforming prior guard models with 250x fewer parameters.

citing papers explorer

Showing 1 of 1 citing paper.

LLM Safety From Within: Detecting Harmful Content with Internal Representations · cs.AI · 2026-04-20 · unverdicted · none · ref 78
