CORE-BREW: LLR-Based Soft Decoding for Robust Multi-Bit LLM Watermarking

HoEun Kim; Joeun Kim; Young-Sik Kim

arxiv: 2606.24163 · v1 · pith:RN6PLMLBnew · submitted 2026-06-23 · 💻 cs.CR · cs.CL

CORE-BREW: LLR-Based Soft Decoding for Robust Multi-Bit LLM Watermarking

Joeun Kim , HoEun Kim , Young-Sik Kim This is my paper

Pith reviewed 2026-06-25 23:41 UTC · model grok-4.3

classification 💻 cs.CR cs.CL

keywords LLM watermarkingsoft decodinglog-likelihood ratiomulti-bit watermarkrobustnessfalse positive rateLLR

0 comments

The pith

CORE-BREW derives closed-form per-token LLRs for soft-decision decoding in multi-bit LLM watermarks by targeting a fixed hit rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CORE-BREW as an extension to block-wise BREW that embeds watermarks while holding the hit rate at a chosen constant p-star. This calibration step produces exact log-likelihood ratios for each token, which soft-decision decoding can then use instead of discarding reliability information. The method offers two operating modes that keep false-positive rates under control even when text is edited or paraphrased. A reader would care because multi-bit watermarks need to survive real-world modifications if they are to serve as reliable provenance signals for LLM outputs.

Core claim

CORE-BREW calibrates the watermark channel by targeting a fixed hit rate p-star, yielding closed-form per-token log-likelihood ratios for principled soft-decision decoding. It supports Strict-Safe mode, which preserves the bounded-distance designated-codeword acceptance region, and FPR-Calibrated mode, which uses likelihood-based scoring together with lightweight list decoding. Experiments on open-source LLMs under token-level edits and paraphrasing show improved low-FPR discrimination and robustness over prior multi-bit watermarking baselines while maintaining comparable semantic quality.

What carries the argument

Constant-hit-rate embedding that produces per-token LLRs for soft decoding of the watermark.

If this is right

Token reliability information can be retained and used rather than thrown away by hard decisions.
Strict-Safe mode keeps the original acceptance region while FPR-Calibrated mode trades off detection thresholds explicitly.
Low false-positive performance improves under both local edits and global paraphrasing.
Semantic quality of the watermarked text stays comparable to unwatermarked baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fixed-hit-rate calibration step could be applied to other embedding families to obtain analogous LLR expressions.
Standard soft-decision algorithms from communication theory become directly usable once the LLRs are available.
List decoding in the FPR-Calibrated mode opens a path to message-length scaling without exponential growth in search cost.

Load-bearing premise

A fixed target hit rate p-star produces accurate closed-form LLRs that remain reliable under token-level edits and paraphrasing without further adjustments.

What would settle it

A test in which the measured hit rate after paraphrasing deviates enough from p-star that the LLR decoder no longer outperforms hard-decision baselines at the target false-positive rate.

Figures

Figures reproduced from arXiv: 2606.24163 by HoEun Kim, Joeun Kim, Young-Sik Kim.

**Figure 2.** Figure 2: ROC curves under the clean setting. (a) Full ROC comparison. (b) Zoomed view of the low-FPR region. CORE-BREW variants and BREW [9] achieve strong discrimination, while MPAC [27] and Qu et al. [19] remain close to random guessing. 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate (FPR) 0.0 0.2 0.4 0.6 0.8 1.0 True Positive Rate (TPR) Ideal (FPR=0, TPR=1) C4 MPAC (AUC=0.580) Qu et al. (AUC=0.512) BREW (AUC=0.828)… view at source ↗

**Figure 5.** Figure 5: ROC curves under 10% token-level insertion on C4 (left) and OpenGen (right). All methods degrade under insertion, but COREBREW and BREW [9] maintain stronger performance than MPAC [27] and Qu et al. [19]. 6.4 Paraphrasing and Semantic Editing (Q3) 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate (FPR) 0.0 0.2 0.4 0.6 0.8 1.0 True Positive Rate (TPR) Ideal (FPR=0, TPR=1) C4 MPAC (AUC=0.710) Qu et al. (AUC=0.4… view at source ↗

**Figure 6.** Figure 6: ROC curves under T5-based paraphrasing on C4 and OpenGen. CORE-BREW maintains strong low-FPR discrimination, while MPAC and Qu et al. [19] suffer from high FPR. We evaluate robustness against realistic paraphrasing attacks that aim to remove watermark signals while preserving semantic content. Paraphrases are generated using models distinct from the watermark generator, following prior work [23, 15, 11,… view at source ↗

**Figure 7.** Figure 7: Ablation on entropy-aware erasures under the clean setting at [PITH_FULL_IMAGE:figures/full_fig_p027_7.png] view at source ↗

**Figure 8.** Figure 8: Sensitivity of detection performance to the window-shift budget [PITH_FULL_IMAGE:figures/full_fig_p028_8.png] view at source ↗

**Figure 9.** Figure 9: ROC curves under 10% token-level substitution attacks on C4 (left) and OpenGen (right). [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗

**Figure 10.** Figure 10: ROC curves under 10% deletion attacks on C4 (left) and OpenGen (right). CORE-BREW [PITH_FULL_IMAGE:figures/full_fig_p029_10.png] view at source ↗

**Figure 11.** Figure 11: ROC curves under 10% insertion attacks on C4 (left) and OpenGen (right). All methods [PITH_FULL_IMAGE:figures/full_fig_p030_11.png] view at source ↗

read the original abstract

Reliable provenance for LLM outputs requires multi-bit watermarks that remain robust under editing while maintaining strict false-positive control. Existing ECC-based LLM watermarks rely largely on hard-decision decoding, discarding token-level reliability information. We propose CORE-BREW, a Constant-hit-Rate Embedding extension of block-wise BREW for robust multi-bit watermarking. CORE-BREW calibrates the watermark channel by targeting a fixed hit rate p-star, yielding closed-form per-token log-likelihood ratios (LLRs) for principled soft-decision decoding. It supports two detection modes: Strict-Safe, which preserves the bounded-distance designated-codeword acceptance region, and FPR-Calibrated, which uses likelihood-based scoring and lightweight list decoding to characterize the FPR-TPR trade-off. Experiments on open-source LLMs under token-level edits and paraphrasing demonstrate improved low-FPR discrimination and robustness over prior multi-bit watermarking baselines while maintaining comparable semantic quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CORE-BREW gives closed-form LLRs for soft decoding in multi-bit LLM watermarks by fixing hit rate p-star, but the full derivation and edit robustness need direct verification.

read the letter

The paper's main contribution is extending block-wise BREW with constant-hit-rate embedding so that per-token LLRs come out in closed form. This lets them do soft-decision decoding instead of the usual hard-decision ECC approach, and they add two modes: one that keeps the bounded-distance acceptance region and one that uses likelihood scoring plus list decoding for FPR control.

The experiments on open-source models with token edits and paraphrasing are the part that actually shows something: they report better low-FPR discrimination and robustness while keeping semantic quality comparable to baselines. That is concrete and worth looking at if you work in this area.

The soft spot is exactly the one the stress test flags. The LLRs rest on the channel being stationary at the target p-star. Edits and paraphrasing change token distributions, so the actual hit probabilities move. The abstract gives no bound showing the mismatch stays small enough to preserve the claimed FPR calibration or the soft-decision gain. Without the derivation steps and the quantitative checks in the full text, it is hard to tell how much of the reported improvement comes from the LLRs versus from other tuning.

This is a narrow but practical piece for the LLM watermarking subgroup. Readers who already follow BREW and ECC-based schemes will get the most out of it. The work is coherent on its own terms and the authors engage the right prior results, so it clears the bar for a serious referee even if the central robustness claim needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CORE-BREW, a constant-hit-rate extension of block-wise BREW watermarking for multi-bit LLM provenance. By targeting a fixed hit rate p-star during embedding, it derives closed-form per-token LLRs that enable soft-decision decoding. Two detection modes are supported: Strict-Safe (preserving bounded-distance acceptance) and FPR-Calibrated (likelihood scoring with list decoding). Experiments on open-source LLMs claim improved low-FPR discrimination and robustness to token edits and paraphrasing relative to prior multi-bit baselines, while preserving semantic quality.

Significance. If the LLR derivation remains valid under editing, the work supplies a principled soft-information channel model for multi-bit watermark detection that could tighten FPR-TPR trade-offs without sacrificing the designated-codeword guarantees of earlier hard-decision schemes.

major comments (2)

[§3] §3 (CORE-BREW construction): the closed-form LLR expressions are obtained under the assumption that the per-token hit probability equals the design parameter p-star exactly and is stationary across tokens and contexts. Token-level edits and paraphrasing change both the token distribution and the effective embedding behavior, so the observed hit/miss statistics deviate from p-star; no explicit bound is supplied showing that the resulting LLR mismatch remains small enough to preserve the claimed FPR calibration.
[§5] §5 (robustness experiments): the reported gains in low-FPR discrimination are shown only via aggregate ROC curves; the manuscript does not report the empirical per-token hit rates after each edit type, nor does it compare the LLRs computed from the nominal p-star against LLRs recomputed from the observed post-edit statistics, leaving open whether the soft-decision advantage survives the distribution shift.

minor comments (2)

[Abstract] The abstract states that the scheme 'yields closed-form per-token log-likelihood ratios' but does not cite the equation number or subsection containing the derivation.
[§4] Notation for the two detection modes (Strict-Safe vs. FPR-Calibrated) is introduced without a compact table summarizing the acceptance region, scoring function, and list-decoding parameters for each mode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the importance of validating the LLR model under distribution shift. We respond to each major comment below.

read point-by-point responses

Referee: [§3] §3 (CORE-BREW construction): the closed-form LLR expressions are obtained under the assumption that the per-token hit probability equals the design parameter p-star exactly and is stationary across tokens and contexts. Token-level edits and paraphrasing change both the token distribution and the effective embedding behavior, so the observed hit/miss statistics deviate from p-star; no explicit bound is supplied showing that the resulting LLR mismatch remains small enough to preserve the claimed FPR calibration.

Authors: The closed-form LLRs are indeed derived under the modeling assumption that the constant-hit-rate embedding achieves exactly p-star on average. CORE-BREW enforces this target by construction via per-context adjustment of the embedding probability; the Strict-Safe detector does not use the LLR values at all and therefore inherits the same bounded-distance guarantees as the original BREW scheme. For the FPR-Calibrated detector we treat the nominal LLRs as an approximation. The current manuscript does not supply a formal bound on the mismatch induced by edits. In revision we will add an explicit statement of the stationarity assumption in §3 together with a short discussion of its implications. revision: partial
Referee: [§5] §5 (robustness experiments): the reported gains in low-FPR discrimination are shown only via aggregate ROC curves; the manuscript does not report the empirical per-token hit rates after each edit type, nor does it compare the LLRs computed from the nominal p-star against LLRs recomputed from the observed post-edit statistics, leaving open whether the soft-decision advantage survives the distribution shift.

Authors: We agree that reporting the post-edit hit-rate statistics and a direct comparison of nominal versus observed LLRs would strengthen the robustness claims. In the revised manuscript we will add a table (or supplementary figure) that lists the empirical per-token hit rates measured after each edit type and paraphrasing attack, together with the corresponding ROC curves obtained when LLRs are recomputed from the observed statistics. This will allow readers to assess how much of the reported soft-decision gain persists under the observed distribution shift. revision: yes

Circularity Check

0 steps flagged

No circularity: LLRs derived from explicit modeling assumption, not from fitted evaluation data

full rationale

The abstract presents targeting a fixed hit rate p-star as a deliberate design choice that directly yields closed-form LLRs under the assumed stationary channel model. This is a standard modeling step (LLR = log(p*/(1-p*)) for hit/miss) rather than a statistical fit to the same data later used for robustness evaluation. No equations or claims in the provided text reduce a 'prediction' or central result to its own inputs by construction. Experimental claims rest on external tests under edits/paraphrasing, not on self-referential definitions or self-citation chains. The derivation chain is self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so the ledger is populated from the high-level description; p-star appears as a design parameter whose value is chosen rather than derived from external benchmarks.

free parameters (1)

p-star
Target hit rate used to calibrate the watermark channel and obtain closed-form LLRs; its specific value is not stated in the abstract.

pith-pipeline@v0.9.1-grok · 5695 in / 1253 out tokens · 20309 ms · 2026-06-25T23:41:06.420239+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 4 canonical work pages · 1 internal anchor

[1]

Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281, 2024

Patrick Chao, Yan Sun, Edgar Dobriban, and Hamed Hassani. Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281, 2024

arXiv 2024
[2]

D. Chase. Class of algorithms for decoding block codes with channel measurement information. IEEE Transactions on Information Theory, 18(1):170–182, 1972. doi: 10.1109/TIT.1972. 1054746

work page doi:10.1109/tit.1972 1972
[3]

Pseudorandom error-correcting codes

Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. In Leonid Reyzin and Douglas Stebila, editors,Advances in Cryptology – CRYPTO 2024, pages 325–347, Cham,

2024
[4]

Springer Nature Switzerland
[5]

Cambridge University Press, 2001

Oded Goldreich.Foundations of Cryptography, Volume 1: Basic Tools. Cambridge University Press, 2001. ISBN 0-521-79172-3

2001
[6]

Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963. URL https://doi.org/10. 1080/01621459.1963.10500830

arXiv 1963
[7]

The curious case of neural text degeneration

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. InInternational Conference on Learning Representations (ICLR 2020), 2020. URLhttps://openreview.net/forum?id=rygGQyrFvH

2020
[8]

An upper bound for the probability of a union.Journal of Applied Probability, 13(3):597–603, 1976

David Hunter. An upper bound for the probability of a union.Journal of Applied Probability, 13(3):597–603, 1976

1976
[9]

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7b, 2023. URL https://arxi...

Pith/arXiv arXiv 2023
[10]

Block-wise codeword embedding for reliable multi-bit text watermarking, 2026

Joeun Kim, HoEun Kim, Dongsup Jin, and Young-Sik Kim. Block-wise codeword embedding for reliable multi-bit text watermarking, 2026. URL https://arxiv.org/abs/2605.00348. 10

Pith/arXiv arXiv 2026
[11]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InProceedings of the 40th International Conference on Machine Learning (ICML 2023), pages 17061–17084. PMLR, 2023

2023
[12]

Paraphras- ing evades detectors of ai-generated text, but retrieval is an effective defense.Advances in Neural Information Processing Systems (NeurIPS 2023), 36:27469–27500, 2023

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphras- ing evades detectors of ai-generated text, but retrieval is an effective defense.Advances in Neural Information Processing Systems (NeurIPS 2023), 36:27469–27500, 2023

2023
[13]

Proceedings of the 2018

Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Eduardo Blanco and Wei Lu, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium, November 2018. Association for Co...

work page internal anchor Pith review doi:10.18653/v1/d18-2012 2018
[14]

Costello.Error Control Coding: Fundamentals and Applications

Shu Lin and Daniel J. Costello.Error Control Coding: Fundamentals and Applications. Prentice Hall, Upper Saddle River, NJ, 2 edition, 2004. ISBN 0-13-042672-6. URL https://www.pearson.com/en-us/subject-catalog/p/error-control-coding/ P200000003536/9780130426727

2004
[15]

Manning, and Chelsea Finn

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. Detectgpt: Zero-shot machine-generated text detection using probability curvature. In Proceedings of the 40th International Conference on Machine Learning (ICML 2023), pages 24950–24962. PMLR, 2023

2023
[16]

Morris, Eli Lifland, Jin Yong Yoo, and Yanjun Qi

John X. Morris, Eli Lifland, Jin Yong Yoo, and Yanjun Qi. Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp. InProceedings of the 2020 conference on empirical methods in natural language processing: System demonstrations, pages 119–126, 2020

2020
[17]

Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King, and Philip S. Yu. MarkLLM: An open-source toolkit for LLM watermarking. In Delia Irazu Hernandez Farias, Tom Hope, and Manling Li, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processin...

work page doi:10.18653/v1/2024.emnlp-demo.7 2024
[18]

Bleu: a method for automatic evaluation of machine translation

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Pierre Isabelle, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association...

work page doi:10.3115/1073083.1073135 2002
[19]

PyTorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, An- dreas Kopf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chil- amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-perf...

2019
[20]

Provably robust multi-bit watermarking for {AI-generated} text

Wenjie Qu, Wengrui Zheng, Tianyang Tao, Dong Yin, Yanze Jiang, Zhihua Tian, Wei Zou, Jinyuan Jia, and Jiaheng Zhang. Provably robust multi-bit watermarking for {AI-generated} text. In34th USENIX Security Symposium (USENIX Security 25), pages 201–220, 2025

2025
[21]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer.Journal of machine learning research, 21(140):1–67, 2020. 11

2020
[22]

Cambridge university press, 2008

Tom Richardson and Rudiger Urbanke.Modern coding theory. Cambridge university press, 2008

2008
[23]

Necessary and sufficient watermark for large language models.arXiv preprint arXiv:2310.00833, 2023

Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, and Makoto Yamada. Necessary and sufficient watermark for large language models.arXiv preprint arXiv:2310.00833, 2023

arXiv 2023
[24]

ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations

John Wieting and Kevin Gimpel. ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), pages 451–462, 2018. URL https: //arxiv.org/abs/1711.05732

Pith/arXiv arXiv 2018
[25]

Transformers: State- of-the-art natural language processing

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, et al. Transformers: State- of-the-art natural language processing. InProceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, pages 38–45, 2020

2020
[26]

Attacking neural text detectors.arXiv preprint arXiv:2002.11768, 2020

Max Wolff and Stuart Wolff. Attacking neural text detectors.arXiv preprint arXiv:2002.11768, 2020

arXiv 2002
[27]

A resilient and accessible distribution-preserving watermark for large language models

Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang. A resilient and accessible distribution-preserving watermark for large language models. InProceedings of the 41st International Conference on Machine Learning (ICML 2024), volume 235 ofProceedings of Machine Learning Research, pages 53443–53470. PMLR, 2024

2024
[28]

Advancing beyond identification: Multi- bit watermark for large language models

KiYoon Yoo, Wonhyuk Ahn, and Nojun Kwak. Advancing beyond identification: Multi- bit watermark for large language models. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4031–4055, 2024

2024
[29]

OPT: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. OPT: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

Pith/arXiv arXiv 2022
[30]

Weinberger, and Yoav Artzi

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT, 2020. URLhttps://arxiv.org/abs/1904.09675

Pith/arXiv arXiv 2020
[31]

red/green

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. InProceedings of the 12th International Conference on Learning Representations (ICLR 2024), 2024. A Baseline Full Details A.1 Block-wise Designated-Codeword Watermarking This appendix summarizes the generic block-wise designated-codeword waterm...

arXiv 2024
[32]

Justification: The paper does not involve crowdsourcing, user studies, surveys, or research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

[1] [1]

Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281, 2024

Patrick Chao, Yan Sun, Edgar Dobriban, and Hamed Hassani. Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281, 2024

arXiv 2024

[2] [2]

D. Chase. Class of algorithms for decoding block codes with channel measurement information. IEEE Transactions on Information Theory, 18(1):170–182, 1972. doi: 10.1109/TIT.1972. 1054746

work page doi:10.1109/tit.1972 1972

[3] [3]

Pseudorandom error-correcting codes

Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. In Leonid Reyzin and Douglas Stebila, editors,Advances in Cryptology – CRYPTO 2024, pages 325–347, Cham,

2024

[4] [4]

Springer Nature Switzerland

[5] [5]

Cambridge University Press, 2001

Oded Goldreich.Foundations of Cryptography, Volume 1: Basic Tools. Cambridge University Press, 2001. ISBN 0-521-79172-3

2001

[6] [6]

Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963. URL https://doi.org/10. 1080/01621459.1963.10500830

arXiv 1963

[7] [7]

The curious case of neural text degeneration

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. InInternational Conference on Learning Representations (ICLR 2020), 2020. URLhttps://openreview.net/forum?id=rygGQyrFvH

2020

[8] [8]

An upper bound for the probability of a union.Journal of Applied Probability, 13(3):597–603, 1976

David Hunter. An upper bound for the probability of a union.Journal of Applied Probability, 13(3):597–603, 1976

1976

[9] [9]

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7b, 2023. URL https://arxi...

Pith/arXiv arXiv 2023

[10] [10]

Block-wise codeword embedding for reliable multi-bit text watermarking, 2026

Joeun Kim, HoEun Kim, Dongsup Jin, and Young-Sik Kim. Block-wise codeword embedding for reliable multi-bit text watermarking, 2026. URL https://arxiv.org/abs/2605.00348. 10

Pith/arXiv arXiv 2026

[11] [11]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InProceedings of the 40th International Conference on Machine Learning (ICML 2023), pages 17061–17084. PMLR, 2023

2023

[12] [12]

Paraphras- ing evades detectors of ai-generated text, but retrieval is an effective defense.Advances in Neural Information Processing Systems (NeurIPS 2023), 36:27469–27500, 2023

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphras- ing evades detectors of ai-generated text, but retrieval is an effective defense.Advances in Neural Information Processing Systems (NeurIPS 2023), 36:27469–27500, 2023

2023

[13] [13]

Proceedings of the 2018

Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Eduardo Blanco and Wei Lu, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium, November 2018. Association for Co...

work page internal anchor Pith review doi:10.18653/v1/d18-2012 2018

[14] [14]

Costello.Error Control Coding: Fundamentals and Applications

Shu Lin and Daniel J. Costello.Error Control Coding: Fundamentals and Applications. Prentice Hall, Upper Saddle River, NJ, 2 edition, 2004. ISBN 0-13-042672-6. URL https://www.pearson.com/en-us/subject-catalog/p/error-control-coding/ P200000003536/9780130426727

2004

[15] [15]

Manning, and Chelsea Finn

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. Detectgpt: Zero-shot machine-generated text detection using probability curvature. In Proceedings of the 40th International Conference on Machine Learning (ICML 2023), pages 24950–24962. PMLR, 2023

2023

[16] [16]

Morris, Eli Lifland, Jin Yong Yoo, and Yanjun Qi

John X. Morris, Eli Lifland, Jin Yong Yoo, and Yanjun Qi. Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp. InProceedings of the 2020 conference on empirical methods in natural language processing: System demonstrations, pages 119–126, 2020

2020

[17] [17]

Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King, and Philip S. Yu. MarkLLM: An open-source toolkit for LLM watermarking. In Delia Irazu Hernandez Farias, Tom Hope, and Manling Li, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processin...

work page doi:10.18653/v1/2024.emnlp-demo.7 2024

[18] [18]

Bleu: a method for automatic evaluation of machine translation

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Pierre Isabelle, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association...

work page doi:10.3115/1073083.1073135 2002

[19] [19]

PyTorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, An- dreas Kopf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chil- amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-perf...

2019

[20] [20]

Provably robust multi-bit watermarking for {AI-generated} text

Wenjie Qu, Wengrui Zheng, Tianyang Tao, Dong Yin, Yanze Jiang, Zhihua Tian, Wei Zou, Jinyuan Jia, and Jiaheng Zhang. Provably robust multi-bit watermarking for {AI-generated} text. In34th USENIX Security Symposium (USENIX Security 25), pages 201–220, 2025

2025

[21] [21]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer.Journal of machine learning research, 21(140):1–67, 2020. 11

2020

[22] [22]

Cambridge university press, 2008

Tom Richardson and Rudiger Urbanke.Modern coding theory. Cambridge university press, 2008

2008

[23] [23]

Necessary and sufficient watermark for large language models.arXiv preprint arXiv:2310.00833, 2023

Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, and Makoto Yamada. Necessary and sufficient watermark for large language models.arXiv preprint arXiv:2310.00833, 2023

arXiv 2023

[24] [24]

ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations

John Wieting and Kevin Gimpel. ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), pages 451–462, 2018. URL https: //arxiv.org/abs/1711.05732

Pith/arXiv arXiv 2018

[25] [25]

Transformers: State- of-the-art natural language processing

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, et al. Transformers: State- of-the-art natural language processing. InProceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, pages 38–45, 2020

2020

[26] [26]

Attacking neural text detectors.arXiv preprint arXiv:2002.11768, 2020

Max Wolff and Stuart Wolff. Attacking neural text detectors.arXiv preprint arXiv:2002.11768, 2020

arXiv 2002

[27] [27]

A resilient and accessible distribution-preserving watermark for large language models

Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang. A resilient and accessible distribution-preserving watermark for large language models. InProceedings of the 41st International Conference on Machine Learning (ICML 2024), volume 235 ofProceedings of Machine Learning Research, pages 53443–53470. PMLR, 2024

2024

[28] [28]

Advancing beyond identification: Multi- bit watermark for large language models

KiYoon Yoo, Wonhyuk Ahn, and Nojun Kwak. Advancing beyond identification: Multi- bit watermark for large language models. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4031–4055, 2024

2024

[29] [29]

OPT: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. OPT: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

Pith/arXiv arXiv 2022

[30] [30]

Weinberger, and Yoav Artzi

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT, 2020. URLhttps://arxiv.org/abs/1904.09675

Pith/arXiv arXiv 2020

[31] [31]

red/green

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. InProceedings of the 12th International Conference on Learning Representations (ICLR 2024), 2024. A Baseline Full Details A.1 Block-wise Designated-Codeword Watermarking This appendix summarizes the generic block-wise designated-codeword waterm...

arXiv 2024

[32] [32]

Justification: The paper does not involve crowdsourcing, user studies, surveys, or research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...