Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Fine-tuning security LLMs specializes inherited classification circuits into token-level indicators that preserve canonical accuracy but fail under behavior-preserving transformations like aliasing and case mutation.
citing papers explorer
- Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior
  Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
- Inherited Circuits, Learned Semantics: How Fine-Tuning Creates Evasion Vulnerabilities Invisible to Standard Evaluation
  Fine-tuning security LLMs specializes inherited classification circuits into token-level indicators that preserve canonical accuracy but fail under behavior-preserving transformations like aliasing and case mutation.