A Watermark for Large Language Models. arXiv preprint arXiv:2301.10226, 2023.
7 Pith papers cite this work.
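For context on what these papers are attacking or extending: the cited work (Kirchenbauer et al., arXiv:2301.10226) biases generation toward a pseudo-random "green list" of tokens seeded by the preceding token, and detects the watermark with a one-proportion z-test on the green-token count. A minimal detection sketch, where the SHA-256 seeding and the γ = 0.5 green-list fraction are illustrative assumptions, not the paper's exact implementation:

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step


def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to a green list seeded by `prev_token`.

    A hash of the (previous token, candidate token) pair stands in for the
    paper's PRNG-based vocabulary partition.
    """
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def z_score(tokens: list[str]) -> float:
    """One-proportion z-test: does the text contain too many green tokens?

    Under the null (unwatermarked text), each token lands in its green list
    with probability GAMMA, so the green count is ~Binomial(n, GAMMA).
    """
    hits = sum(is_green(prev, cur) for prev, cur in zip(tokens, tokens[1:]))
    n = len(tokens) - 1  # number of scored (prev, cur) bigrams
    return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)
```

Watermarked text over-samples green tokens and yields a large positive z; unwatermarked text hovers near zero. The attacks listed below (Vaporizer, Chainwash) work precisely by rewriting text so the green-token surplus washes out.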
7 representative citing papers
- SWAN: Semantic Watermarking with Abstract Meaning Representation
  SWAN uses AMR to embed semantic watermarks that persist through paraphrases, matching SOTA detection on original text and improving AUC by 13.9 points on paraphrased RealNews data.
- A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
  A new 7x4 taxonomy, built from a survey of 116 papers, organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses.
- TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
  TextSeal provides a localized, distortion-free LLM watermark that enables provenance tracking and distillation detection while preserving model performance and text quality.
- Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs
  Existing LLM watermarking schemes can be defeated by semantics-preserving attacks, including lexical changes, machine translation, and neural paraphrasing.
- Response Time Enhances Alignment with Heterogeneous Preferences
  Modeling response times as drift-diffusion processes enables consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
- Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks
  Chained rewrites by open-weight LLMs reduce watermark detection on diffusion LM outputs from 87.9% to 4.86% after five steps, across multiple styles and models.
- LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning
  LLMSniffer improves detection of LLM-generated code on the GPTSniffer and Whodunit benchmarks by fine-tuning GraphCodeBERT with two-stage supervised contrastive learning, plus preprocessing and an MLP classification head.