pith. sign in

arxiv: 2606.24163 · v1 · pith:RN6PLMLBnew · submitted 2026-06-23 · 💻 cs.CR · cs.CL

CORE-BREW: LLR-Based Soft Decoding for Robust Multi-Bit LLM Watermarking

Pith reviewed 2026-06-25 23:41 UTC · model grok-4.3

classification 💻 cs.CR cs.CL
keywords LLM watermarkingsoft decodinglog-likelihood ratiomulti-bit watermarkrobustnessfalse positive rateLLR
0
0 comments X

The pith

CORE-BREW derives closed-form per-token LLRs for soft-decision decoding in multi-bit LLM watermarks by targeting a fixed hit rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CORE-BREW as an extension to block-wise BREW that embeds watermarks while holding the hit rate at a chosen constant p-star. This calibration step produces exact log-likelihood ratios for each token, which soft-decision decoding can then use instead of discarding reliability information. The method offers two operating modes that keep false-positive rates under control even when text is edited or paraphrased. A reader would care because multi-bit watermarks need to survive real-world modifications if they are to serve as reliable provenance signals for LLM outputs.

Core claim

CORE-BREW calibrates the watermark channel by targeting a fixed hit rate p-star, yielding closed-form per-token log-likelihood ratios for principled soft-decision decoding. It supports Strict-Safe mode, which preserves the bounded-distance designated-codeword acceptance region, and FPR-Calibrated mode, which uses likelihood-based scoring together with lightweight list decoding. Experiments on open-source LLMs under token-level edits and paraphrasing show improved low-FPR discrimination and robustness over prior multi-bit watermarking baselines while maintaining comparable semantic quality.

What carries the argument

Constant-hit-rate embedding that produces per-token LLRs for soft decoding of the watermark.

If this is right

  • Token reliability information can be retained and used rather than thrown away by hard decisions.
  • Strict-Safe mode keeps the original acceptance region while FPR-Calibrated mode trades off detection thresholds explicitly.
  • Low false-positive performance improves under both local edits and global paraphrasing.
  • Semantic quality of the watermarked text stays comparable to unwatermarked baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fixed-hit-rate calibration step could be applied to other embedding families to obtain analogous LLR expressions.
  • Standard soft-decision algorithms from communication theory become directly usable once the LLRs are available.
  • List decoding in the FPR-Calibrated mode opens a path to message-length scaling without exponential growth in search cost.

Load-bearing premise

A fixed target hit rate p-star produces accurate closed-form LLRs that remain reliable under token-level edits and paraphrasing without further adjustments.

What would settle it

A test in which the measured hit rate after paraphrasing deviates enough from p-star that the LLR decoder no longer outperforms hard-decision baselines at the target false-positive rate.

Figures

Figures reproduced from arXiv: 2606.24163 by HoEun Kim, Joeun Kim, Young-Sik Kim.

Figure 1
Figure 1. Figure 1: Overview of CORE-BREW. Constant Hit-Rate embedding provides calibrated reliability [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: ROC curves under the clean setting. (a) Full ROC comparison. (b) Zoomed view of the low-FPR region. CORE-BREW variants and BREW [9] achieve strong discrimination, while MPAC [27] and Qu et al. [19] remain close to random guessing. 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate (FPR) 0.0 0.2 0.4 0.6 0.8 1.0 True Positive Rate (TPR) Ideal (FPR=0, TPR=1) C4 MPAC (AUC=0.580) Qu et al. (AUC=0.512) BREW (AUC=0.828)… view at source ↗
Figure 5
Figure 5. Figure 5: ROC curves under 10% token-level in￾sertion on C4 (left) and OpenGen (right). All methods degrade under insertion, but CORE￾BREW and BREW [9] maintain stronger perfor￾mance than MPAC [27] and Qu et al. [19]. 6.4 Paraphrasing and Semantic Editing (Q3) 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate (FPR) 0.0 0.2 0.4 0.6 0.8 1.0 True Positive Rate (TPR) Ideal (FPR=0, TPR=1) C4 MPAC (AUC=0.710) Qu et al. (AUC=0.4… view at source ↗
Figure 6
Figure 6. Figure 6: ROC curves under T5-based paraphras￾ing on C4 and OpenGen. CORE-BREW maintains strong low-FPR discrimination, while MPAC and Qu et al. [19] suffer from high FPR. We evaluate robustness against realistic para￾phrasing attacks that aim to remove watermark signals while preserving semantic content. Para￾phrases are generated using models distinct from the watermark generator, following prior work [23, 15, 11,… view at source ↗
Figure 7
Figure 7. Figure 7: Ablation on entropy-aware erasures under the clean setting at [PITH_FULL_IMAGE:figures/full_fig_p027_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Sensitivity of detection performance to the window-shift budget [PITH_FULL_IMAGE:figures/full_fig_p028_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: ROC curves under 10% token-level substitution attacks on C4 (left) and OpenGen (right). [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: ROC curves under 10% deletion attacks on C4 (left) and OpenGen (right). CORE-BREW [PITH_FULL_IMAGE:figures/full_fig_p029_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: ROC curves under 10% insertion attacks on C4 (left) and OpenGen (right). All methods [PITH_FULL_IMAGE:figures/full_fig_p030_11.png] view at source ↗
read the original abstract

Reliable provenance for LLM outputs requires multi-bit watermarks that remain robust under editing while maintaining strict false-positive control. Existing ECC-based LLM watermarks rely largely on hard-decision decoding, discarding token-level reliability information. We propose CORE-BREW, a Constant-hit-Rate Embedding extension of block-wise BREW for robust multi-bit watermarking. CORE-BREW calibrates the watermark channel by targeting a fixed hit rate p-star, yielding closed-form per-token log-likelihood ratios (LLRs) for principled soft-decision decoding. It supports two detection modes: Strict-Safe, which preserves the bounded-distance designated-codeword acceptance region, and FPR-Calibrated, which uses likelihood-based scoring and lightweight list decoding to characterize the FPR-TPR trade-off. Experiments on open-source LLMs under token-level edits and paraphrasing demonstrate improved low-FPR discrimination and robustness over prior multi-bit watermarking baselines while maintaining comparable semantic quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CORE-BREW, a constant-hit-rate extension of block-wise BREW watermarking for multi-bit LLM provenance. By targeting a fixed hit rate p-star during embedding, it derives closed-form per-token LLRs that enable soft-decision decoding. Two detection modes are supported: Strict-Safe (preserving bounded-distance acceptance) and FPR-Calibrated (likelihood scoring with list decoding). Experiments on open-source LLMs claim improved low-FPR discrimination and robustness to token edits and paraphrasing relative to prior multi-bit baselines, while preserving semantic quality.

Significance. If the LLR derivation remains valid under editing, the work supplies a principled soft-information channel model for multi-bit watermark detection that could tighten FPR-TPR trade-offs without sacrificing the designated-codeword guarantees of earlier hard-decision schemes.

major comments (2)
  1. [§3] §3 (CORE-BREW construction): the closed-form LLR expressions are obtained under the assumption that the per-token hit probability equals the design parameter p-star exactly and is stationary across tokens and contexts. Token-level edits and paraphrasing change both the token distribution and the effective embedding behavior, so the observed hit/miss statistics deviate from p-star; no explicit bound is supplied showing that the resulting LLR mismatch remains small enough to preserve the claimed FPR calibration.
  2. [§5] §5 (robustness experiments): the reported gains in low-FPR discrimination are shown only via aggregate ROC curves; the manuscript does not report the empirical per-token hit rates after each edit type, nor does it compare the LLRs computed from the nominal p-star against LLRs recomputed from the observed post-edit statistics, leaving open whether the soft-decision advantage survives the distribution shift.
minor comments (2)
  1. [Abstract] The abstract states that the scheme 'yields closed-form per-token log-likelihood ratios' but does not cite the equation number or subsection containing the derivation.
  2. [§4] Notation for the two detection modes (Strict-Safe vs. FPR-Calibrated) is introduced without a compact table summarizing the acceptance region, scoring function, and list-decoding parameters for each mode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the importance of validating the LLR model under distribution shift. We respond to each major comment below.

read point-by-point responses
  1. Referee: [§3] §3 (CORE-BREW construction): the closed-form LLR expressions are obtained under the assumption that the per-token hit probability equals the design parameter p-star exactly and is stationary across tokens and contexts. Token-level edits and paraphrasing change both the token distribution and the effective embedding behavior, so the observed hit/miss statistics deviate from p-star; no explicit bound is supplied showing that the resulting LLR mismatch remains small enough to preserve the claimed FPR calibration.

    Authors: The closed-form LLRs are indeed derived under the modeling assumption that the constant-hit-rate embedding achieves exactly p-star on average. CORE-BREW enforces this target by construction via per-context adjustment of the embedding probability; the Strict-Safe detector does not use the LLR values at all and therefore inherits the same bounded-distance guarantees as the original BREW scheme. For the FPR-Calibrated detector we treat the nominal LLRs as an approximation. The current manuscript does not supply a formal bound on the mismatch induced by edits. In revision we will add an explicit statement of the stationarity assumption in §3 together with a short discussion of its implications. revision: partial

  2. Referee: [§5] §5 (robustness experiments): the reported gains in low-FPR discrimination are shown only via aggregate ROC curves; the manuscript does not report the empirical per-token hit rates after each edit type, nor does it compare the LLRs computed from the nominal p-star against LLRs recomputed from the observed post-edit statistics, leaving open whether the soft-decision advantage survives the distribution shift.

    Authors: We agree that reporting the post-edit hit-rate statistics and a direct comparison of nominal versus observed LLRs would strengthen the robustness claims. In the revised manuscript we will add a table (or supplementary figure) that lists the empirical per-token hit rates measured after each edit type and paraphrasing attack, together with the corresponding ROC curves obtained when LLRs are recomputed from the observed statistics. This will allow readers to assess how much of the reported soft-decision gain persists under the observed distribution shift. revision: yes

Circularity Check

0 steps flagged

No circularity: LLRs derived from explicit modeling assumption, not from fitted evaluation data

full rationale

The abstract presents targeting a fixed hit rate p-star as a deliberate design choice that directly yields closed-form LLRs under the assumed stationary channel model. This is a standard modeling step (LLR = log(p*/(1-p*)) for hit/miss) rather than a statistical fit to the same data later used for robustness evaluation. No equations or claims in the provided text reduce a 'prediction' or central result to its own inputs by construction. Experimental claims rest on external tests under edits/paraphrasing, not on self-referential definitions or self-citation chains. The derivation chain is self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so the ledger is populated from the high-level description; p-star appears as a design parameter whose value is chosen rather than derived from external benchmarks.

free parameters (1)
  • p-star
    Target hit rate used to calibrate the watermark channel and obtain closed-form LLRs; its specific value is not stated in the abstract.

pith-pipeline@v0.9.1-grok · 5695 in / 1253 out tokens · 20309 ms · 2026-06-25T23:41:06.420239+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281, 2024

    Patrick Chao, Yan Sun, Edgar Dobriban, and Hamed Hassani. Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281, 2024

  2. [2]

    D. Chase. Class of algorithms for decoding block codes with channel measurement information. IEEE Transactions on Information Theory, 18(1):170–182, 1972. doi: 10.1109/TIT.1972. 1054746

  3. [3]

    Pseudorandom error-correcting codes

    Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. In Leonid Reyzin and Douglas Stebila, editors,Advances in Cryptology – CRYPTO 2024, pages 325–347, Cham,

  4. [4]

    Springer Nature Switzerland

  5. [5]

    Cambridge University Press, 2001

    Oded Goldreich.Foundations of Cryptography, Volume 1: Basic Tools. Cambridge University Press, 2001. ISBN 0-521-79172-3

  6. [6]

    Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963

    Wassily Hoeffding. Probability inequalities for sums of bounded random variables.Journal of the American Statistical Association, 58(301):13–30, 1963. URL https://doi.org/10. 1080/01621459.1963.10500830

  7. [7]

    The curious case of neural text degeneration

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. InInternational Conference on Learning Representations (ICLR 2020), 2020. URLhttps://openreview.net/forum?id=rygGQyrFvH

  8. [8]

    An upper bound for the probability of a union.Journal of Applied Probability, 13(3):597–603, 1976

    David Hunter. An upper bound for the probability of a union.Journal of Applied Probability, 13(3):597–603, 1976

  9. [9]

    Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7b, 2023. URL https://arxi...

  10. [10]

    Block-wise codeword embedding for reliable multi-bit text watermarking, 2026

    Joeun Kim, HoEun Kim, Dongsup Jin, and Young-Sik Kim. Block-wise codeword embedding for reliable multi-bit text watermarking, 2026. URL https://arxiv.org/abs/2605.00348. 10

  11. [11]

    A watermark for large language models

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InProceedings of the 40th International Conference on Machine Learning (ICML 2023), pages 17061–17084. PMLR, 2023

  12. [12]

    Paraphras- ing evades detectors of ai-generated text, but retrieval is an effective defense.Advances in Neural Information Processing Systems (NeurIPS 2023), 36:27469–27500, 2023

    Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphras- ing evades detectors of ai-generated text, but retrieval is an effective defense.Advances in Neural Information Processing Systems (NeurIPS 2023), 36:27469–27500, 2023

  13. [13]

    Proceedings of the 2018

    Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Eduardo Blanco and Wei Lu, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium, November 2018. Association for Co...

  14. [14]

    Costello.Error Control Coding: Fundamentals and Applications

    Shu Lin and Daniel J. Costello.Error Control Coding: Fundamentals and Applications. Prentice Hall, Upper Saddle River, NJ, 2 edition, 2004. ISBN 0-13-042672-6. URL https://www.pearson.com/en-us/subject-catalog/p/error-control-coding/ P200000003536/9780130426727

  15. [15]

    Manning, and Chelsea Finn

    Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. Detectgpt: Zero-shot machine-generated text detection using probability curvature. In Proceedings of the 40th International Conference on Machine Learning (ICML 2023), pages 24950–24962. PMLR, 2023

  16. [16]

    Morris, Eli Lifland, Jin Yong Yoo, and Yanjun Qi

    John X. Morris, Eli Lifland, Jin Yong Yoo, and Yanjun Qi. Textattack: A framework for adversarial attacks, data augmentation, and adversarial training in nlp. InProceedings of the 2020 conference on empirical methods in natural language processing: System demonstrations, pages 119–126, 2020

  17. [17]

    Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King, and Philip S. Yu. MarkLLM: An open-source toolkit for LLM watermarking. In Delia Irazu Hernandez Farias, Tom Hope, and Manling Li, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processin...

  18. [18]

    Bleu: a method for automatic evaluation of machine translation

    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Pierre Isabelle, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association...

  19. [19]

    PyTorch: An imperative style, high-performance deep learning library

    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, An- dreas Kopf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chil- amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-perf...

  20. [20]

    Provably robust multi-bit watermarking for {AI-generated} text

    Wenjie Qu, Wengrui Zheng, Tianyang Tao, Dong Yin, Yanze Jiang, Zhihua Tian, Wei Zou, Jinyuan Jia, and Jiaheng Zhang. Provably robust multi-bit watermarking for {AI-generated} text. In34th USENIX Security Symposium (USENIX Security 25), pages 201–220, 2025

  21. [21]

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer.Journal of machine learning research, 21(140):1–67, 2020. 11

  22. [22]

    Cambridge university press, 2008

    Tom Richardson and Rudiger Urbanke.Modern coding theory. Cambridge university press, 2008

  23. [23]

    Necessary and sufficient watermark for large language models.arXiv preprint arXiv:2310.00833, 2023

    Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, and Makoto Yamada. Necessary and sufficient watermark for large language models.arXiv preprint arXiv:2310.00833, 2023

  24. [24]

    ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations

    John Wieting and Kevin Gimpel. ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), pages 451–462, 2018. URL https: //arxiv.org/abs/1711.05732

  25. [25]

    Transformers: State- of-the-art natural language processing

    Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, et al. Transformers: State- of-the-art natural language processing. InProceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, pages 38–45, 2020

  26. [26]

    Attacking neural text detectors.arXiv preprint arXiv:2002.11768, 2020

    Max Wolff and Stuart Wolff. Attacking neural text detectors.arXiv preprint arXiv:2002.11768, 2020

  27. [27]

    A resilient and accessible distribution-preserving watermark for large language models

    Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang. A resilient and accessible distribution-preserving watermark for large language models. InProceedings of the 41st International Conference on Machine Learning (ICML 2024), volume 235 ofProceedings of Machine Learning Research, pages 53443–53470. PMLR, 2024

  28. [28]

    Advancing beyond identification: Multi- bit watermark for large language models

    KiYoon Yoo, Wonhyuk Ahn, and Nojun Kwak. Advancing beyond identification: Multi- bit watermark for large language models. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4031–4055, 2024

  29. [29]

    OPT: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

    Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. OPT: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

  30. [30]

    Weinberger, and Yoav Artzi

    Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT, 2020. URLhttps://arxiv.org/abs/1904.09675

  31. [31]

    red/green

    Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. InProceedings of the 12th International Conference on Learning Representations (ICLR 2024), 2024. A Baseline Full Details A.1 Block-wise Designated-Codeword Watermarking This appendix summarizes the generic block-wise designated-codeword waterm...

  32. [32]

    Justification: The paper does not involve crowdsourcing, user studies, surveys, or research with human subjects

    Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...