SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models

Chen Zhi; Chunqiang Hu; Maike Li; Meng Han; Wenpeng Xing; Zhebo Wang; Zhenhua Xu

arXiv: 2505.06304 · v4 · pith:4OSMGZ2L · submitted 2025-05-08 · cs.CR

classification cs.CR

keywords model, adversarial, SRAF, language, verification, black-box, family, fingerprint

Abstract

The protection of intellectual property (IP) for large language models (LLMs) has become a critical concern as model theft and unauthorized commercialization escalate. Adversarial fingerprinting offers a promising black-box approach to ownership verification, but existing methods have three significant limitations: they are fragile against downstream model modifications, sensitive to variations in the system prompt, and easily detected because their inputs have high perplexity. In this paper, we propose SRAF, a stealthy and robust adversarial fingerprinting framework. SRAF jointly optimizes the fingerprint across homologous model variants and diverse chat templates, forcing it to anchor onto the invariant intrinsic comprehension features shared by the model family. We further introduce a Perplexity Hiding technique that embeds the adversarial perturbations within Markdown tables, aligning the prompt's statistics with those of natural language so that it evades perplexity-based detection. Extensive experiments on the Llama-2 model family demonstrate that SRAF significantly improves robustness against fine-tuning, alignment, pruning, model merging, and input perturbations while maintaining high stealthiness and low false-positive rates, offering a practical and resilient black-box solution for LLM ownership verification.
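The joint optimization idea can be illustrated with a toy objective. This sketch is not SRAF's implementation: the abstract names neither the optimizer nor the loss, so it assumes a hypothetical per-(variant, template) loss `make_toy_loss` over a 10-token toy vocabulary and swaps in a plain gradient-free greedy coordinate search. What it does show is the core design choice: each suffix token is chosen to minimize the loss averaged over every (model variant, chat template) pair, rather than the loss on a single pair.

```python
from itertools import product
from statistics import mean

VOCAB = range(10)  # hypothetical toy vocabulary: token ids 0..9


def make_toy_loss(model_bias, template_shift):
    """Hypothetical stand-in for -log p(target | template(prompt + suffix)) on one variant.

    Each (variant, template) pair "prefers" a slightly different suffix,
    encoded here as a per-position target token offset by bias + shift.
    """
    def loss(suffix):
        return sum((tok - (i + model_bias + template_shift) % 10) ** 2
                   for i, tok in enumerate(suffix))
    return loss


def joint_loss(suffix, losses):
    # Average over all (variant, template) pairs so no single pair dominates.
    return mean(f(suffix) for f in losses)


def greedy_coordinate_search(length, losses, sweeps=3):
    """Gradient-free greedy search: set each position to its best token in turn."""
    suffix = [0] * length
    for _ in range(sweeps):
        for pos in range(length):
            suffix[pos] = min(
                VOCAB,
                key=lambda t: joint_loss(suffix[:pos] + [t] + suffix[pos + 1:], losses),
            )
    return suffix


# Two hypothetical model variants x two hypothetical chat templates.
losses = [make_toy_loss(b, s) for b, s in product([0, 1], [0, 1])]
suffix = greedy_coordinate_search(4, losses)
print(suffix)  # [1, 2, 3, 4]
```

With two toy variants and two toy templates, the jointly optimized suffix `[1, 2, 3, 4]` reaches an average loss of 2, whereas `[0, 1, 2, 3]`, the optimum for the single pair with zero offset, averages 6 across all four pairs. In miniature, this is why a fingerprint fitted to one configuration need not transfer to its siblings.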
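The Perplexity Hiding idea can likewise be sketched, although the abstract does not give the table layout. The helper below uses a hypothetical format in which each adversarial fragment occupies a cell in a `Code` column beside ordinary row labels; the column names `Item`, `Code`, `Note`, the `row N` notes, and the wrapper sentence in `build_prompt` are illustrative assumptions, not the paper's template. The sketch only builds the prompt. It does not measure perplexity, which would require scoring the text with a reference language model.

```python
# Hypothetical column names; the paper's actual table layout is not specified here.
HEADER = ("Item", "Code", "Note")


def embed_in_markdown_table(fragments):
    """Place one adversarial fragment per row inside a Markdown table cell."""
    lines = [
        "| " + " | ".join(HEADER) + " |",
        "| " + " | ".join("---" for _ in HEADER) + " |",
    ]
    for i, frag in enumerate(fragments, start=1):
        # Escape pipes so a fragment cannot split its cell and break the table.
        cell = frag.replace("|", r"\|")
        lines.append(f"| {i} | {cell} | row {i} |")
    return "\n".join(lines)


def build_prompt(fragments, question):
    """Wrap the table in ordinary prose (hypothetical wording) and append a query."""
    return ("Refer to the table below.\n\n"
            + embed_in_markdown_table(fragments)
            + "\n\n" + question)


print(build_prompt(["xq!!", "zz|v"], "What does the Code column in row 1 say?"))
```

Escaping `|` keeps every row at exactly three cells whatever tokens the optimizer produces, so the surrounding structure stays well-formed Markdown.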
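Black-box ownership verification can be framed as a hypothesis test, which makes "low false-positive rate" concrete. A minimal sketch, under assumptions that are ours and not the paper's stated protocol: the verifier sends n fingerprint queries, counts exact matches with the owner's target responses, and treats an unrelated model as matching each query independently with a small probability `p0`. The binomial tail then gives the chance that an unrelated model matches at least that often by luck. The values `p0=0.01` and `alpha=1e-6` and the target string `OWNER-TOKEN` are hypothetical.

```python
from math import comb


def match_count(responses, targets):
    """Number of fingerprint queries whose response equals the owner's target."""
    if len(responses) != len(targets):
        raise ValueError("need exactly one response per fingerprint query")
    return sum(r.strip() == t.strip() for r, t in zip(responses, targets))


def binomial_tail(k, n, p0):
    """P[X >= k] for X ~ Binomial(n, p0): chance of k or more matches by luck."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def verify_ownership(responses, targets, p0=0.01, alpha=1e-6):
    """Claim ownership only if chance matching is implausible at level alpha."""
    n = len(targets)
    k = match_count(responses, targets)
    p_value = binomial_tail(k, n, p0)
    return p_value < alpha, p_value


targets = ["OWNER-TOKEN"] * 10  # hypothetical fingerprint target
suspect = ["OWNER-TOKEN"] * 8 + ["Sorry, I can't help.", "Sure!"]
unrelated = ["OWNER-TOKEN"] + ["Hello."] * 9
print(verify_ownership(suspect, targets)[0])    # True
print(verify_ownership(unrelated, targets)[0])  # False
```

Eight matches out of ten give a tail probability on the order of 1e-15 under the assumed `p0`, while a single match gives roughly 0.096, so only the first is accepted. In practice, both the independence assumption and the value of `p0` would need to be checked empirically against unrelated models.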
