Targeted vaccine: Safety alignment for large language models against harmful fine-tuning via layer-wise perturbation

Guozhi Liu, Weiwei Lin, Qi Mu, et al · 2025 · arXiv 2025.361541

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses, Evaluation, and Future Directions

cs.CR · 2026-05-24 · unverdicted · novelty 5.0

A lifecycle-based survey of LLM fine-tuning security that reviews attacks and defenses by intervention phase and reports unified empirical findings on model-dependent attack effectiveness and limited defense generalization.

citing papers explorer

Showing 1 of 1 citing paper.

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses, Evaluation, and Future Directions · cs.CR · 2026-05-24 · unverdicted · none · ref 50
