SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models

Zhenhua Xu, Zhebo Wang, Maike Li, Wenpeng Xing, Chunqiang Hu, Chen Zhi, Meng Han · 2025 · cs.CR · arXiv 2505.06304

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

The protection of Intellectual Property (IP) for Large Language Models (LLMs) has become a critical concern as model theft and unauthorized commercialization escalate. While adversarial fingerprinting offers a promising black-box solution for ownership verification, existing methods suffer from significant limitations: they are fragile against downstream model modifications, sensitive to system prompt variations, and easily detectable due to high-perplexity input patterns. In this paper, we propose \textbf{SRAF}, a stealthy and robust adversarial fingerprinting framework. SRAF employs a synergistic joint optimization strategy across homologous model variants and diverse chat templates, forcing the fingerprint to anchor onto the invariant intrinsic comprehension features of the model family. Furthermore, we introduce a Perplexity Hiding technique that embeds adversarial perturbations within Markdown tables, effectively aligning the prompt's statistics with natural language to evade perplexity-based detection. Extensive experiments on the Llama-2 model family demonstrate that SRAF significantly enhances robustness against fine-tuning, alignment, pruning, merging, and input perturbations while maintaining exceptional stealthiness and low false-positive rates, offering a practical and resilient black-box solution for LLM ownership verification.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FLIPS: Instance-Fingerprinting for LLMs via Pseudo-random Sequences

cs.LG · 2026-06-02 · unverdicted · novelty 8.0

FLIPS identifies LLM instances with 96% closed-set and 90% open-set accuracy by exploiting biases in generated binary random sequences across 237 instances.

cs.CR · 2025-08-15 · accept · novelty 7.0

A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

citing papers explorer

Showing 2 of 2 citing papers.

FLIPS: Instance-Fingerprinting for LLMs via Pseudo-random Sequences cs.LG · 2026-06-02 · unverdicted · none · ref 17 · internal anchor
FLIPS identifies LLM instances with 96% closed-set and 90% open-set accuracy by exploiting biases in generated binary random sequences across 237 instances.
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends cs.CR · 2025-08-15 · accept · none · ref 162 · internal anchor
A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer