Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CR (3)
verdicts: UNVERDICTED (3)
citation roles: background (1)
citation polarities: background (1)
citing papers explorer
- CSO-LLM: Class Subspace Orthogonalization for Post-Training Backdoor Detection and Trigger Inversion in LLMs
  CSO-LLM proposes class subspace orthogonalization to enhance post-training backdoor detection sensitivity/specificity and enable accurate trigger inversion in LLMs via continuous embedding optimization and discrete greedy accretion.
- SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models
  SCOUT uses token saliency analysis to detect both standard and contextually-plausible backdoor attacks in language models while maintaining clean accuracy.
- Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
  Survey of harmful fine-tuning attacks on LLMs, their variants, defense strategies, mechanical analysis, and evaluation methodologies.