Adaptive text watermark for large language models. arXiv preprint arXiv:2401.13927
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Citing papers

- SLAM: Structural Linguistic Activation Marking for Language Models
  SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.
- Optimal Multi-bit Generative Watermarking Schemes Under Worst-Case False-Alarm Constraints
  Two new constructions for multi-bit generative watermarking attain the established lower bound on miss-detection probability under worst-case false-alarm constraints, fully characterizing optimal performance via linear programming.
- TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
  TextSeal provides a localized, distortion-free LLM watermark that enables provenance tracking and distillation detection while preserving performance and text quality.
- Towards Robust Content Watermarking Against Removal and Forgery Attacks
  ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.