PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks
Pith reviewed 2026-05-13 01:09 UTC · model grok-4.3
The pith
PASA embeds watermarks in the LLM's semantic embedding space so that generated text remains detectable after paraphrasing, without distorting the output.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PASA constructs a distributional dependency between token sequences and auxiliary sequences by synchronizing randomness with a secret key and semantic history inside semantic clusters of the latent embedding space. This construction is derived from a theoretical characterization of jointly optimal embedding and detection functions that balance detection accuracy, robustness to semantic-invariant changes, and zero distortion. Experiments on multiple LLMs show the resulting watermark survives strong paraphrasing attacks at higher rates than vocabulary-space baselines while leaving text quality unchanged.
What carries the argument
Semantic clusters in the latent embedding space, combined with a shared-randomness distributional dependency synchronized by the secret key and semantic history; together these enable joint optimization of the embedding and detection functions.
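For intuition, here is a minimal sketch of such a shared-randomness primitive, assuming a SHA-256 keyed hash over the history of semantic-cluster ids; the function names and hashing scheme are illustrative stand-ins, not the paper's specification.

```python
import hashlib
import numpy as np

def synchronized_seed(secret_key: bytes, cluster_history: list[int]) -> int:
    """Derive a pseudorandom seed that embedder and detector can both
    recompute from the secret key and the semantic-cluster ids observed
    so far. Because cluster ids, not surface tokens, feed the hash, the
    seed survives paraphrases that preserve cluster assignments."""
    h = hashlib.sha256(secret_key)
    for cid in cluster_history:
        h.update(cid.to_bytes(4, "little"))
    return int.from_bytes(h.digest()[:8], "little")

def auxiliary_draw(secret_key: bytes, cluster_history: list[int]) -> float:
    """One shared uniform draw in [0, 1). An embedder would bias
    generation toward outcomes correlated with this draw; a detector
    tests for that correlation."""
    rng = np.random.default_rng(synchronized_seed(secret_key, cluster_history))
    return float(rng.random())
```

The point of the construction is that both sides of the channel recompute identical draws without exchanging tokens, so detection needs only the key and the (paraphrase-stable) semantic history.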
If this is right
- Detection accuracy stays high after semantic-preserving rewrites that defeat token-level methods.
- Generated text quality remains comparable to unwatermarked output because no token bias is introduced.
- The theoretical trade-off surface among accuracy, robustness, and distortion is achieved by the embedding-detection pair.
- Hyperparameter choices validated by ablation directly support the observed robustness without quality loss.
Where Pith is reading between the lines
- If the synchronization mechanism holds, similar semantic-level dependencies could be applied to other generative models where meaning must survive transformation.
- The approach implies that watermark verification can be performed on rewritten text without needing the original prompt or intermediate tokens.
- Success here would motivate checking whether the same cluster-and-dependency pattern reduces false positives when watermarking is combined with other detection signals.
Load-bearing premise
Semantic clusters can be formed reliably in the embedding space, and the synchronized randomness produces a distributional dependency that delivers the stated optimality and robustness without creating detectable artifacts or exploitable weaknesses.
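This premise is directly testable: embed each sentence and its paraphrase with the same encoder and check how often the nearest cluster centroid is unchanged. A minimal sketch, assuming precomputed embeddings and centroids (e.g., from k-means); this is an illustrative check, not the paper's protocol.

```python
import numpy as np

def cluster_agreement(orig_emb: np.ndarray, para_emb: np.ndarray,
                      centroids: np.ndarray) -> float:
    """Fraction of sentences whose nearest semantic cluster is unchanged
    after paraphrasing. orig_emb, para_emb: (n, d) sentence embeddings
    of original and paraphrased text; centroids: (k, d) cluster centers."""
    def assign(x: np.ndarray) -> np.ndarray:
        # nearest-centroid assignment by Euclidean distance
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
        return dists.argmin(axis=1)
    return float(np.mean(assign(orig_emb) == assign(para_emb)))
```

Agreement well below 1.0 at the chosen granularity would undercut both the synchronization mechanism and the robustness claim.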
What would settle it
Running the strongest paraphrasing attack described in the paper on PASA-watermarked text and finding detection accuracy no higher than that of a standard vocabulary-space watermark, or finding statistical patterns in the output that reveal the watermark without knowledge of the secret key.
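The second falsifier (key-free statistical patterns) admits a crude first probe: compare token-frequency profiles of watermarked and unwatermarked corpora. A sketch of one such test using a unigram chi-square; this is an illustrative attack, not one described in the paper.

```python
from collections import Counter
from scipy.stats import chisquare

def keyless_artifact_probe(watermarked: list[list[int]],
                           reference: list[list[int]]) -> float:
    """Return the p-value of a chi-square goodness-of-fit test comparing
    unigram token counts of a watermarked corpus against an unwatermarked
    reference. A very small p-value would hint that the watermark leaves
    frequency artifacts detectable without the secret key."""
    wm = Counter(tok for seq in watermarked for tok in seq)
    ref = Counter(tok for seq in reference for tok in seq)
    vocab = sorted(set(wm) | set(ref))
    obs = [wm.get(v, 0) + 1 for v in vocab]   # add-one smoothing
    exp = [ref.get(v, 0) + 1 for v in vocab]
    scale = sum(obs) / sum(exp)               # chisquare needs equal totals
    return float(chisquare(obs, f_exp=[e * scale for e in exp]).pvalue)
```

A distortion-free scheme should pass this probe by construction; higher-order (n-gram or embedding-space) probes would be the natural escalation.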
Original abstract
Watermarking for large language models (LLMs) is a promising approach for detecting LLM-generated text and enabling responsible deployment. However, existing watermarking methods are often vulnerable to semantic-invariant attacks, such as paraphrasing. We propose PASA, a principled, robust, and distortion-free watermarking algorithm that embeds and detects a watermark at the semantic level. PASA operates on semantic clusters in a latent embedding space and constructs a distributional dependency between token and auxiliary sequences via shared randomness synchronized by a secret key and semantic history. This design is grounded in our theoretical framework that characterizes a jointly optimal embedding-detection pair, achieving the fundamental trade-offs among detection accuracy, robustness, and distortion. Evaluations across multiple LLMs and semantic-invariant attacks demonstrate that PASA remains robust even under strong paraphrasing attacks while preserving high text quality, outperforming standard vocabulary-space baselines. Ablation studies further validate the effectiveness of our hyperparameter choices. Webpage: https://ai-kunkun.github.io/PASA_page/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PASA, a watermarking algorithm for LLM-generated text that embeds and detects watermarks in a latent embedding space using semantic clusters. It constructs distributional dependencies via shared randomness synchronized by a secret key and semantic history, grounded in a theoretical framework characterizing jointly optimal embedding-detection pairs that trade off detection accuracy, robustness, and distortion. Evaluations across LLMs and semantic-invariant attacks (including strong paraphrasing) claim superior robustness and text quality compared to vocabulary-space baselines, with ablations validating hyperparameter choices.
Significance. If the theoretical optimality derivation holds and the reported robustness metrics are reproducible under the described attack strengths, PASA would represent a meaningful advance in LLM watermarking by addressing the vulnerability of prior methods to paraphrasing and other semantic-preserving transformations. The embedding-space approach and explicit focus on joint optimality are strengths that could inform future designs.
major comments (2)
- [§3.1–3.3] Theoretical framework: the characterization of jointly optimal embedding-detection pairs relies on semantic cluster construction and randomness synchronization; the derivation should explicitly show whether optimality is parameter-free or reduces to choices of cluster granularity and history window length, as these appear among the free parameters.
- [§4.3] Experimental results on paraphrasing: the claim of remaining robust under strong paraphrasing requires quantitative attack details (e.g., semantic similarity thresholds, paraphrase model, number of rewrites) and effect sizes with error bars; without these, the outperformance over vocabulary baselines cannot be fully assessed as load-bearing evidence.
minor comments (2)
- [Abstract] Quantitative metrics, specific LLMs tested, and attack strengths are referenced but not summarized; adding one sentence with key numbers would improve clarity.
- [§5] Ablations: ensure all tested hyperparameter ranges and the exact cluster construction algorithm (including any embedding model) are listed for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate to improve clarity and completeness.
Point-by-point responses
-
Referee: [§3.1–3.3] Theoretical framework: the characterization of jointly optimal embedding-detection pairs relies on semantic cluster construction and randomness synchronization; the derivation should explicitly show whether optimality is parameter-free or reduces to choices of cluster granularity and history window length, as these appear among the free parameters.
Authors: We appreciate the referee highlighting this aspect of the theoretical framework. The derivation of jointly optimal embedding-detection pairs is performed conditionally on a fixed semantic cluster granularity and history window length; these are treated as design hyperparameters that set the resolution of the semantic partitioning and the extent of distributional dependence. The optimality result characterizes the fundamental trade-offs for any given choice of these parameters rather than claiming parameter-free optimality. In the revised manuscript we will add an explicit statement in §3 clarifying this conditional nature and include a brief discussion of how varying cluster granularity and window length affects the achievable accuracy-robustness-distortion frontier. revision: yes
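To make the conditional nature concrete, consider a toy count-based detector: cluster granularity and window length never appear in the statistic itself, yet both shape the per-trial match probability and the number of usable trials. A hedged illustration, not PASA's actual detection rule.

```python
import numpy as np

def detection_z(hits: int, n: int, gamma: float) -> float:
    """z-statistic for n watermark trials of which `hits` matched the
    synchronized auxiliary draw; gamma is the per-trial match probability
    under the no-watermark null. Granularity and history window enter
    only through gamma and the effective n, so any optimality statement
    is conditional on those choices."""
    return (hits - gamma * n) / np.sqrt(gamma * (1.0 - gamma) * n)

# e.g., detection_z(620, 1000, 0.5) ≈ 7.6: strong evidence of a watermark
```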
-
Referee: [§4.3] Experimental results on paraphrasing: the claim of remaining robust under strong paraphrasing requires quantitative attack details (e.g., semantic similarity thresholds, paraphrase model, number of rewrites) and effect sizes with error bars; without these, the outperformance over vocabulary baselines cannot be fully assessed as load-bearing evidence.
Authors: We agree that additional quantitative details are required to make the robustness claims fully reproducible and to allow readers to assess the strength of the reported outperformance. The current manuscript describes the paraphrasing attacks at a high level but does not enumerate the exact paraphrase model, similarity thresholds, number of rewrites, or report error bars. In the revision we will expand §4.3 (and the experimental setup subsection) to specify the paraphrase model, the semantic similarity thresholds employed, the number of rewrites applied, and to present all detection metrics with error bars computed across multiple independent runs. These additions will enable direct evaluation of the evidence. revision: yes
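For the promised error bars, a normal-approximation confidence interval across independent runs is the standard minimal form; a sketch follows (the run values in the usage comment are hypothetical).

```python
import numpy as np

def mean_with_ci(run_accuracies: list[float], z: float = 1.96) -> tuple[float, float]:
    """Mean detection accuracy and 95% CI half-width across independent
    runs, one accuracy value per run (normal approximation)."""
    a = np.asarray(run_accuracies, dtype=float)
    half_width = z * a.std(ddof=1) / np.sqrt(len(a))
    return float(a.mean()), float(half_width)

# five hypothetical attack runs:
# mean_with_ci([0.93, 0.95, 0.91, 0.94, 0.92])  ->  (0.93, ~0.014)
```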
Circularity Check
No significant circularity; theoretical framework presented as independent grounding
full rationale
The abstract grounds the PASA design in a theoretical framework characterizing jointly optimal embedding-detection pairs and trade-offs among accuracy, robustness, and distortion. No equations or self-citations are supplied in the given material that would reduce this framework to a redefinition of the algorithm's own cluster-construction or randomness parameters. Evaluations on multiple LLMs and attacks are described as external validation, with no indication that predictions reduce by construction to fitted inputs or prior self-citations. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- semantic cluster construction parameters
- randomness synchronization threshold or history window
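For concreteness, the two free parameters (plus the decision threshold any detector needs) could be gathered in one configuration object; the field names and default values below are hypothetical, not the paper's notation.

```python
from dataclasses import dataclass

@dataclass
class PasaConfig:
    """Illustrative container for the ledger's free parameters."""
    n_clusters: int = 256             # semantic cluster granularity
    history_window: int = 4           # cluster ids hashed into the sync seed
    detection_threshold: float = 4.0  # z-score cutoff at detection time
```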
axioms (2)
- domain assumption: Semantic clusters in the latent embedding space exist and can be reliably identified across paraphrases
- domain assumption: A jointly optimal embedding-detection pair exists and is characterized by the theoretical framework
Reference graph
Works this paper leans on
- [1] Aaronson, S. Watermarking of large language models. https://simons.berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17, 2023. Accessed 2023-08.
- [2] Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 technical report. https://arxiv.org/abs/2303.08774, 2024.
- [3] Balalle, H. and Pannilage, S. Reassessing academic integrity in the age of AI: A systematic literature review on AI and academic integrity. Social Sciences & Humanities Open, 11:101299, 2025.
- [4] Black, S., Biderman, S., Hallahan, E., Anthony, Q., Gao, L., Golding, L., He, H., Leahy, C., McDonell, K., Phang, J., et al. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of BigScience Episode #5: Workshop on Challenges & Perspectives in Creating Large Language Models, 2022.
- [5] Cai, Z., Liu, S., Wang, H., Zhong, H., and Li, X. Towards better statistical understanding of watermarking LLMs. arXiv preprint arXiv:2403.13027, 2024.
- [6] Dathathri, S., See, A., Ghaisas, S., Huang, P.-S., McAdam, R., Welbl, J., Bachani, V., Kaskasoli, A., Stanforth, R., Matejovicova, T., et al. Scalable watermarking for identifying large language model outputs. Nature, 2024.
- [7] Feng, S., Wang, S., Ouyang, S., Kong, L., Song, Z., Zhu, J., Wang, H., and Wang, X. Can MLLMs guide me home? A benchmark study on fine-grained visual reasoning from transit maps. arXiv preprint arXiv:2505.18675, 2025.
- [8] Fu, J., Zhao, X., Yang, R., Zhang, Y., Chen, J., and Xiao, Y. GumbelSoft: Diversified language model watermarking via the Gumbel-Max trick. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024a.
- [9] Fu, Y., Xiong, D., and Dong, Y. Watermarking conditional text generation for AI detection: Unveiling challenges and a semantic-aware watermark remedy. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024.
- [10] Giboulot, E. and Furon, T. WaterMax: Breaking the LLM watermark detectability-robustness-quality trade-off. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems, 2024.
- [11] Gu, C., Li, X. L., Liang, P., and Hashimoto, T. On the learnability of watermarks for language models. In The Twelfth International Conference on Learning Representations, 2024.
- [12] Gumbel, E. J. Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures, volume 33. US Government Printing Office, 1954.
- [13] Guo, Y., Tian, Z., Song, Y., Liu, T., Ding, L., and Li, D. Context-aware watermark with semantic balanced green-red lists for large language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.
- [14] Hazell, J. Spear phishing with large language models. arXiv preprint arXiv:2305.06972, 2023.
- [15] He, H., Liu, Y., Wang, Z., Mao, Y., and Bu, Y. Theoretically grounded framework for LLM watermarking: A distribution-adaptive approach. In The 1st Workshop on GenAI Watermarking, 2025.
- [16] He, Z., Zhou, B., Hao, H., Liu, A., Wang, X., Tu, Z., Zhang, Z., and Wang, R. Can watermarks survive translation? On the cross-lingual consistency of text watermark for large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024.
- [17] Hou, A., Zhang, J., He, T., Wang, Y., Chuang, Y.-S., Wang, H., Shen, L., Van Durme, B., Khashabi, D., and Tsvetkov, Y. SemStamp: A semantic watermark with paraphrastic robustness for text generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2024a.
- [18] Hou, A., Zhang, J., Wang, Y., Khashabi, D., and He, T. k-SemStamp: A clustering-based semantic watermark for detection of machine-generated text. In Findings of the Association for Computational Linguistics: ACL 2024, 2024b.
- [19] Huang, B., Zhu, B., Zhu, H., Lee, J. D., Jiao, J., and Jordan, M. I. Towards optimal statistical watermarking. arXiv preprint arXiv:2312.07930, 2023.
- [20] Jiang, D., Liu, Y., Liu, S., Zhao, J., Zhang, H., Gao, Z., Zhang, X., Li, J., and Xiong, H. From CLIP to DINO: Visual encoders shout in multi-modal large language models. arXiv preprint arXiv:2310.08825, 2023.
- [21] Jin, X., Li, S., Jian, S., Yu, K., and Wang, H. Mergemix: A unified augmentation paradigm for visual and multi-modal understanding. arXiv preprint arXiv:2510.23479, 2025.
- [22] Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., and Goldstein, T. A watermark for large language models. In Proceedings of the 40th International Conference on Machine Learning, 2023.
- [23] Kirchenbauer, J., Geiping, J., Wen, Y., Shu, M., Saifullah, K., Kong, K., Fernando, K., Saha, A., Goldblum, M., and Goldstein, T. On the reliability of watermarks for large language models. In The Twelfth International Conference on Learning Representations, 2024.
- [24] Krishna, K., Song, Y., Karpinska, M., Wieting, J., and Iyyer, M. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. Advances in Neural Information Processing Systems, 36, 2023.
- [25] Kuditipudi, R., Thickstun, J., Hashimoto, T., and Liang, P. Robust distortion-free watermarks for language models. Transactions on Machine Learning Research, 2024.
- [26] Li, X., Ruan, F., Wang, H., Long, Q., and Su, W. J. A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules. The Annals of Statistics, 53(1):322–351, 2025.
- [27] Li, Z., Zhang, X., Zhang, Y., Long, D., Xie, P., and Zhang, M. Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281, 2023.
- [28] Liu, A., Pan, L., Hu, X., Meng, S., and Wen, L. A semantic invariant robust watermark for large language models. In The Twelfth International Conference on Learning Representations, 2024a.
- [29] Liu, A., Pan, L., Hu, X., Meng, S., and Wen, L. A semantic invariant robust watermark for large language models. In International Conference on Learning Representations, 2024b.
- [30] Liu, A., Pan, L., Lu, Y., Li, J., Hu, X., Zhang, X., Wen, L., King, I., Xiong, H., and Yu, P. A survey of text watermarking in the era of large language models. ACM Computing Surveys, 57(2), 2024c.
- [31] Liu, Y. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
- [32]
- [33] Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
- [34] Mirsky, Y., Demontis, A., Kotak, J., Shankar, R., Gelei, D., Yang, L., Zhang, X., Pintor, M., Lee, W., Elovici, Y., et al. The threat of offensive AI to organizations. Computers & Security, 124:103006, 2023.
- [35] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
- [36] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(1), 2020.
- [37] Shen, H., Huang, B., and Wan, X. Enhancing LLM watermark resilience against both scrubbing and spoofing attacks. In The Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025.
- [38] Takezawa, Y., Sato, R., Bao, H., Niwa, K., and Yamada, M. Necessary and sufficient watermark for large language models. arXiv preprint arXiv:2310.00833, 2023.
- [39] Tao, K., Zheng, Y., Xu, J., Du, W., Shao, K., Wang, H., Chen, X., Jin, X., Zhu, J., Yu, B., et al. Lvomnibench: Pioneering long audio-video understanding evaluation for omnimodal LLMs. arXiv preprint arXiv:2603.19217, 2026.
- [40] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., and Lample, G. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- [41] Vykopal, I., Pikuliak, M., Srba, I., Moro, R., Macko, D., and Bielikova, M. Disinformation capabilities of large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 14830–14847, 2024.
- [42] Wouters, B. Optimizing watermarks for large language models. In International Conference on Machine Learning, pp. 53251–53269. PMLR, 2024.
- [43] Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al. Qwen3 technical report. https://arxiv.org/abs/2505.09388, 2025a.
- [44] Yang, Z., Zhao, G., and Wu, H. Watermarking for large language models: A survey. Mathematics, 13(9), 2025b.
- [45] Zhang, J., Liu, S., Liu, A., Gao, Y., Li, J., Gu, X., and Hu, X. Cohemark: A novel sentence-level watermark for enhanced text quality. In The 1st Workshop on GenAI Watermarking, 2025a.
- [46] Zhang, K., Tao, K., Tang, J., and Wang, H. Poison as cure: Visual noise for mitigating object hallucinations in LVMs. In NeurIPS, 2025b.
- [47] Zhang, P., Zeng, G., Wang, T., and Lu, W. TinyLlama: An open-source small language model. arXiv preprint arXiv:2401.02385, 2024a.
- [48] Zhang, R., Hussain, S. S., Neekhara, P., and Koushanfar, F. REMARK-LLM: A robust and efficient watermarking framework for generative large language models. In 33rd USENIX Security Symposium (USENIX Security 24), 2024b.
- [49] Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., Mihaylov, T., Ott, M., Shleifer, S., Shuster, K., Simig, D., Koura, P. S., Sridhar, A., Wang, T., and Zettlemoyer, L. OPT: Open pre-trained transformer language models, 2022.
- [50] Zhao, X., Ananth, P. V., Li, L., and Wang, Y.-X. Provable robust watermarking for AI-generated text. In The Twelfth International Conference on Learning Representations, 2024.
- [51] Zhu, J., Wang, H., Su, M., Wang, Z., and Wang, H. Obs-diff: Accurate pruning for diffusion models in one-shot. arXiv preprint arXiv:2510.06751, 2025a.
- [52] Zhu, X., Zhou, J.-Z., Feng, K., Qu, C., Wang, Y., Zhou, L., and Liu, J. Does the manipulation process matter? RITA: Reasoning composite image manipulations via reversely-ordered incremental-transition autoregression. arXiv preprint arXiv:2509.20006, 2025b.