pith. machine review for the scientific record.

arxiv: 2601.22246 · v3 · submitted 2026-01-29 · 💻 cs.CR · cs.AI

Recognition: unknown

MirrorMark: Generalizable Mirrored Sampling for Multi-bit LLM Watermarking

classification 💻 cs.CR cs.AI
keywords mirrormark · watermarking · multi-bit · sampler · approach · base · become · content

As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but most existing methods either provide only binary signals or achieve multi-bit embedding by distorting the generation distribution. We propose MirrorMark, a generalizable mapping-centric approach for multi-bit LLM watermarking. MirrorMark separates the symbol mapping rule from the base watermarking sampler and maps each symbol to a mod-1 mirroring transformation of a detector-reproducible pseudorandom object, such as sampling values or permutation ranks. A binary-tokenizer analysis shows that complementary mappings yield larger matched-mismatched score gaps than independent-key or shift-based mappings. When composed with a distortion-free base sampler, MirrorMark preserves the token probability distribution by design and maintains text quality in practice. To support practical payload embedding, we introduce a Context-Anchored Balanced Scheduler (CABS), which balances token assignments across message positions while localizing edit effects. We further provide theoretical equal error rate (EER) analyses for two representative sampler instantiations. Experiments show that MirrorMark achieves strong detectability and bit accuracy while maintaining text quality comparable to non-watermarked generation.
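The mirroring idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single binary symbol, a Gumbel-max (exponential-minimum) base sampler, a fixed toy next-token distribution, and a pseudorandom function keyed on position and previous token. All function names and parameters here are hypothetical. Bit 0 leaves each pseudorandom value u unchanged and bit 1 uses (1 - u) mod 1; because 1 - u is uniform whenever u is, the base sampler's output distribution is unchanged.

```python
import hashlib
import math

def prf_uniforms(key, ctx, vocab_size):
    """Detector-reproducible values strictly inside (0, 1), one per token id."""
    us = []
    for tok in range(vocab_size):
        digest = hashlib.sha256(key + repr((ctx, tok)).encode()).digest()
        x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
        us.append((x + 0.5) / 2**52)
    return us

def mirror(u, bit):
    """Assumed binary mapping: bit 0 keeps u, bit 1 uses the mod-1 mirror."""
    return u if bit == 0 else (1.0 - u) % 1.0

def generate(key, bit, probs, length):
    """Gumbel-max sampling from a fixed toy distribution `probs`."""
    tokens, prev = [], -1
    for t in range(length):
        u = prf_uniforms(key, (t, prev), len(probs))
        v = [mirror(x, bit) for x in u]
        # argmax of v_i ** (1 / p_i), computed in log space
        tok = max(range(len(probs)), key=lambda i: math.log(v[i]) / probs[i])
        tokens.append(tok)
        prev = tok
    return tokens

def decode_bit(key, tokens, vocab_size):
    """Score both hypotheses and return the bit with the larger score."""
    scores = [0.0, 0.0]
    prev = -1
    for t, tok in enumerate(tokens):
        u = prf_uniforms(key, (t, prev), vocab_size)[tok]
        for b in (0, 1):
            scores[b] += -math.log(1.0 - mirror(u, b))
        prev = tok
    return max((0, 1), key=lambda b: scores[b])

if __name__ == "__main__":
    key = b"demo-key"
    probs = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.05]
    for bit in (0, 1):
        text = generate(key, bit, probs, 200)
        print(bit, decode_bit(key, text, len(probs)))
```

Under the matched hypothesis the selected token's value tends to lie near 1, so its score term is large; under the mismatched hypothesis the mirrored value lies near 0 and the term is small. This is one way to picture why complementary mappings separate the two scores. The scheduler (CABS) and multi-symbol payloads described in the abstract are outside the scope of this sketch.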

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark

    cs.CR · 2026-05 · unverdicted · novelty 7.0

    A binomial multibit watermarking scheme encodes every payload bit at each LLM token with dynamic redirection, outperforming baselines in accuracy and robustness for large payloads.

  2. QuantileMark: A Message-Symmetric Multi-bit Watermark for LLMs

    cs.CL · 2026-04 · unverdicted · novelty 6.0

    QuantileMark is a white-box multi-bit LLM watermark that partitions the [0,1) probability interval into equal-mass bins to achieve message symmetry and proves that averaging over messages recovers the base distribution.