arXiv: 2606.09854 · v1 · pith:UBYHVKEHnew · submitted 2026-05-20 · cs.CL · cs.AI · cs.CY · cs.LG

Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis

Pith reviewed 2026-06-30 17:40 UTC · model grok-4.3

classification: cs.CL · cs.AI · cs.CY · cs.LG
keywords: stylometric fingerprinting · multi-agent LLMs · model attribution · anonymization · political text analysis · statement-disjoint cross-validation · T5 classifier

The pith

LLMs can identify the originating model family of anonymized political texts via stylometric signals that survive prompt anonymization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether prompt-level anonymization prevents multi-agent LLMs from recognizing peer models in political statement analysis. It evaluates zero-shot, few-shot, and fine-tuned classifiers on a five-class attribution task using outputs from four commercial LLM families plus an unknown class. A statement-disjoint cross-validation protocol ensures no content overlap between training and test data. The fine-tuned T5 model reaches Macro F1 of 0.991 under this protocol and 0.978 on fully held-out statements, showing that stylometric patterns persist independently of semantic content. This result indicates that simple anonymization does not remove model identity cues in role-constrained political writing.
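Because the headline numbers are Macro F1 scores, it may help to recall how the metric is computed: F1 is calculated per class and then averaged with equal weight, so the open-world 'unknown' class counts as much as any model family. A minimal plain-Python sketch follows; the labels and predictions are hypothetical toy values, not data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical five-class toy example (class names are placeholders).
y_true = ["A", "A", "B", "B", "C", "D", "unknown"]
y_pred = ["A", "B", "B", "B", "C", "D", "unknown"]
score = macro_f1(y_true, y_pred)
```

In this toy case a single confusion between classes A and B is enough to pull the average down noticeably, which is why a Macro F1 near 0.99 implies that every class, not just the majority, is attributed almost perfectly.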

Core claim

A fine-tuned T5-base classifier attributes anonymized political analysis texts to their source LLM family with Macro F1 of 0.991 under statement-disjoint cross-validation and F1 of 0.978 on 24 completely held-out statements. Performance remains high despite a statistically significant 2.1 times increase in content distance between training and test sets compared with a run-disjoint baseline, confirming that attribution relies on stylometric generalization rather than content leakage.
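One way to read the content-distance comparison is as a mean nearest-neighbour distance: for each test text, find the closest training text and average those distances. The sketch below illustrates that reading under explicit assumptions: it uses raw bag-of-words counts as a stand-in for the paper's embeddings, and the function names and toy texts are hypothetical. Only the two reported means (0.767 and 0.366) come from the paper.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def mean_nn_distance(train_texts, test_texts):
    """Average, over test texts, of the distance to the closest training text."""
    # Assumption: whitespace tokens and raw counts stand in for real embeddings.
    train_vecs = [Counter(t.lower().split()) for t in train_texts]
    dists = []
    for text in test_texts:
        v = Counter(text.lower().split())
        dists.append(min(cosine_distance(v, tv) for tv in train_vecs))
    return sum(dists) / len(dists)


# Hypothetical toy texts, not taken from the paper's dataset.
train = ["raise the minimum wage", "cut income taxes now"]
d_near = mean_nn_distance(train, ["raise the minimum wage today"])  # shared content
d_far = mean_nn_distance(train, ["expand rural broadband access"])  # unseen content

# Means reported in the paper: SD-CV 0.767 vs. RD-CV 0.366.
ratio = 0.767 / 0.366
```

The final line simply reproduces the arithmetic behind the reported 2.1 times increase; a larger mean distance under the statement-disjoint split is what one would expect when test statements share no content with training statements.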

What carries the argument

The argument rests on the statement-disjoint cross-validation (SD-CV) protocol, which partitions the data so that no individual statement appears in both training and validation folds while preserving role-constrained output style.
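The core idea of a statement-disjoint split can be sketched as grouped cross-validation: folds are assigned per statement, not per text, so every output generated from a given statement lands on the same side of the split. The sketch below is an illustration under assumptions, not the paper's Section 3.5 procedure; the function name, the round-robin fold assignment, and the toy corpus are hypothetical. scikit-learn's `GroupKFold` offers a comparable grouping mechanism.

```python
import random


def statement_disjoint_folds(statement_ids, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs so that no statement ID spans both sides."""
    unique_ids = sorted(set(statement_ids))
    random.Random(seed).shuffle(unique_ids)
    # Assumption: simple round-robin assignment of shuffled statements to folds.
    fold_of = {sid: i % n_folds for i, sid in enumerate(unique_ids)}
    for k in range(n_folds):
        train = [i for i, sid in enumerate(statement_ids) if fold_of[sid] != k]
        val = [i for i, sid in enumerate(statement_ids) if fold_of[sid] == k]
        yield train, val


# Hypothetical corpus: 10 statements, each analysed by 4 model families.
statement_ids = [s for s in range(10) for _ in range(4)]
folds = list(statement_disjoint_folds(statement_ids))
```

A run-disjoint split, by contrast, would only guarantee that individual generation runs are separated, so the same statement could still appear on both sides.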

If this is right

  • Prompt-level anonymization alone leaves model identity detectable in role-constrained political outputs.
  • Multi-agent LLM pipelines for political analysis remain exposed to identity-dependent scoring distortions.
  • Compliance with transparency and oversight rules such as EU AI Act Articles 13, 14 and 26 requires measures beyond anonymization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar stylometric attribution may succeed in other constrained domains such as legal or medical report generation.
  • System designers could add explicit style-suppression objectives during fine-tuning to reduce fingerprint strength.
  • Validation protocols for multi-agent deployments might routinely include stylometric hold-out tests before production use.

Load-bearing premise

The statement-disjoint cross-validation protocol fully removes any possibility that high attribution scores come from shared content rather than writing style.

What would settle it

Retraining the T5 classifier on a new collection of political statements drawn from entirely different topics and sources while keeping the same anonymization prompts, then measuring whether Macro F1 remains above 0.95.

Figures

Figures reproduced from arXiv: 2606.09854 by Juergen Dietrich.

Figure 1: Training data sufficiency analysis (FracXVal). (a) Run-Disjoint Cross-Validation (RD-CV): overall and Gemini-specific learning curves across training fractions. The 80% data point is based on a single run (no SD available). (b) Statement-Disjoint Cross-Validation (SD-CV): same metrics under content-disjoint evaluation. In both panels, filled markers show mean ± SD across 5 folds; stars (⋆) denote held-out…
Figure 2: TF-IDF embedding similarity analysis. Left: Similarity distributions (intra-statement vs. inter-statement vs. inter-class). Centre: Box plots by pair type confirming refutation of H5 (null hypothesis that intra-statement similarity ≈ inter-statement similarity; see Section 4.5); ∆ = +0.484, p < 0.001. Right: Train-test nearest-neighbour distance distributions for RD-CV (µ=0.366) and SD-CV (µ=0.767), confi…
Figure 3: Per-model stylometric fingerprint analysis. Left: Intra-statement vs. inter-statement cosine similarity by model. Centre: Fingerprint strength (∆ intra−inter) vs. T5 held-out F1; Gemini shows lowest ∆ and lowest F1, consistent with H7 (exploratory: contextual fingerprint). Right: Intra-statement similarity distributions per model (lower µ = more variable style). Abbreviations: H7: exploratory hypothesis (G…
Original abstract

Multi-agent large language model (LLM) pipelines for political statement analysis are vulnerable to peer-preservation bias: models tend to protect peer models from deactivation and show identity-dependent scoring distortions. Prompt-level anonymization was proposed as a mitigation, but prior work simultaneously documented that stylometric fingerprints survive anonymization in role-constrained outputs - raising the question of whether this mitigation is sufficient. This paper provides the first systematic investigation of whether LLMs can identify the model family behind political analysis texts under anonymization conditions. We evaluate three classifier approaches - LLM zero-shot and few-shot (Claude Sonnet 4.6 and Llama-3.3-70B) and a fine-tuned T5-base model - on a five-class attribution task covering four commercial LLM families and an open-world 'unknown' class. We introduce a statement-disjoint cross-validation protocol (SD-CV; defined in Section 3.5) that guarantees no content overlap between training and validation data, and contrast it with a run-disjoint baseline (RD-CV). T5 achieves Macro F1 = 0.991 (±0.008) under SD-CV and F1 = 0.978 on 24 completely held-out statements - robust despite a 2.1x increase in train-test content distance versus RD-CV (0.767 vs. 0.366, p<0.001), demonstrating genuine stylometric generalization. A fractional SD-CV analysis identifies a performance knee at 40% of training data (~440 texts). Our findings confirm that prompt-level anonymization alone cannot neutralize model identity signals, with direct implications for EU AI Act compliance (Articles 13, 14, 26) and for computer system validation (CSV) in quality-critical multi-agent deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that fine-tuned T5-base achieves Macro F1 of 0.991 (±0.008) under statement-disjoint cross-validation (SD-CV) and 0.978 on 24 held-out statements for five-class attribution of anonymized LLM-generated political texts, remaining robust despite a 2.1× increase in train-test content distance (0.767 vs. 0.366) relative to run-disjoint CV; this is taken to demonstrate genuine stylometric generalization beyond content leakage, with implications for multi-agent system design and EU AI Act compliance.

Significance. If the SD-CV protocol and distance metric adequately isolate stylometry, the result would establish that prompt-level anonymization is insufficient to remove model-family signals in role-constrained political analysis, directly affecting validation practices in quality-critical deployments. The fractional SD-CV analysis identifying a performance knee at ~40% training data (~440 texts) and the use of error bars with statistical significance testing add empirical value by quantifying data efficiency and robustness.
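A fractional analysis of this kind trains on progressively larger subsets of each training fold and tracks how the score changes. The sketch below shows one plausible way to draw such subsets while keeping class proportions intact. It is an assumption-laden illustration: the function name, the fraction grid, and the toy fold are hypothetical, and the paper's exact subsampling procedure is not described in this summary.

```python
import random
from collections import defaultdict


def stratified_fraction(train_idx, labels, fraction, seed=0):
    """Keep roughly `fraction` of the training indices within every class."""
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    rng = random.Random(seed)
    kept = []
    for cls in sorted(by_class):
        members = by_class[cls][:]
        rng.shuffle(members)
        n_keep = max(1, round(fraction * len(members)))  # at least one per class
        kept.extend(members[:n_keep])
    return sorted(kept)


# Hypothetical training fold: 5 classes with 20 texts each.
labels = [c for c in range(5) for _ in range(20)]
train_idx = list(range(len(labels)))
# Assumed fraction grid; the source only mentions the 40% and 80% points.
subsets = {f: stratified_fraction(train_idx, labels, f) for f in (0.2, 0.4, 0.6, 0.8, 1.0)}
```

Training one classifier per subset and plotting its validation score against the fraction yields the learning curve on which a knee, such as the reported one at 40%, can be read off.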

major comments (2)
  1. [Section 3.5] Section 3.5: The central claim that SD-CV produces 'genuine stylometric generalization' rests on the content distance metric showing a 2.1× increase while T5 retains high F1; however, the manuscript does not validate that this metric (whatever its exact formulation) is a tight proxy for semantic features an attribution model could exploit, such as shared argument structures, entity distributions, or topic-specific political phrasing across LLM outputs.
  2. [Section 3.5] Section 3.5 and results on held-out statements: The 24 completely held-out statements yielding F1=0.978 are load-bearing for the out-of-distribution generalization claim, yet no details are provided on whether their generation prompts, role constraints, or anonymization exactly match the SD-CV training distribution or whether any filtering/selection occurred post-generation.
minor comments (2)
  1. The abstract and Section 3.5 introduce SD-CV and RD-CV without an early, self-contained definition of the content distance metric itself, forcing readers to infer its construction from the reported values.
  2. The four commercial LLM families are referenced but not enumerated in the abstract or early methods overview, which would improve readability for the five-class task description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major comment below and indicate where revisions will be made to improve the manuscript.

Point-by-point responses
  1. Referee: [Section 3.5] Section 3.5: The central claim that SD-CV produces 'genuine stylometric generalization' rests on the content distance metric showing a 2.1× increase while T5 retains high F1; however, the manuscript does not validate that this metric (whatever its exact formulation) is a tight proxy for semantic features an attribution model could exploit, such as shared argument structures, entity distributions, or topic-specific political phrasing across LLM outputs.

    Authors: We acknowledge the point that the content distance metric requires clearer justification as a proxy. The metric is defined in Section 3.5 as the average cosine distance between sentence embeddings (from a fixed pre-trained encoder) of train and test statements. While this captures broad semantic divergence and correlates with the observed performance gap between RD-CV and SD-CV, it does not explicitly test for argument structures or entity distributions. In revision we will (a) provide the precise embedding model and aggregation formula, (b) add a short analysis comparing the metric to n-gram overlap and entity Jaccard similarity on the same splits, and (c) include a limitations paragraph noting that residual content leakage via higher-order features cannot be ruled out by distance alone. These changes clarify rather than overstate the evidence for stylometric generalization.

    Revision: partial

  2. Referee: [Section 3.5] Section 3.5 and results on held-out statements: The 24 completely held-out statements yielding F1=0.978 are load-bearing for the out-of-distribution generalization claim, yet no details are provided on whether their generation prompts, role constraints, or anonymization exactly match the SD-CV training distribution or whether any filtering/selection occurred post-generation.

    Authors: The 24 held-out statements were produced with identical prompt templates, role constraints, and anonymization instructions as the SD-CV training data; generation occurred in a single additional run with no post-hoc filtering or selection. We will insert these procedural details into Section 3.5 and the dataset description to make the distributional match explicit.

    Revision: yes
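The n-gram overlap comparison proposed in the first response could look roughly like the sketch below: for each test text, compute the highest Jaccard overlap of word bigrams with any training text. This is an illustration only; the function names, the choice of bigrams, and the toy texts are hypothetical, and the rebuttal does not specify how the overlap would be computed.

```python
def ngrams(text, n=2):
    """Set of word n-grams from a whitespace-tokenised, lower-cased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_train_overlap(test_text, train_texts, n=2):
    """Highest n-gram Jaccard overlap between one test text and any training text."""
    test_grams = ngrams(test_text, n)
    return max(jaccard(test_grams, ngrams(t, n)) for t in train_texts)


# Hypothetical toy texts, not taken from the paper's dataset.
train = ["the policy would reduce poverty", "tax cuts increase growth"]
overlap = max_train_overlap("the policy would reduce inequality", train)
```

Low overlap values on the statement-disjoint splits would support the claim that surface content is not shared between training and test data, although, as the response itself notes, they could not rule out leakage through higher-order features.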

Circularity Check

0 steps flagged

No significant circularity; evaluation uses explicit held-out splits

Full rationale

The paper reports classifier performance (Macro F1 0.991 under SD-CV, 0.978 on held-out statements) measured on statement-disjoint data with an independent content-distance metric. No step reduces a claimed prediction to a fitted parameter by construction, nor relies on self-citation chains or ansatzes imported from prior author work. The SD-CV protocol and distance comparison are defined externally to the model outputs, so the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions in supervised text classification that stylometric features generalize across content-disjoint splits and that the generated political statements are representative of real deployment conditions.

axioms (1)
  • Domain assumption: Stylometric signals in LLM-generated text are consistent enough to support family-level attribution even after anonymization prompts.
    Invoked as the basis for the five-class attribution task and the reported generalization under SD-CV.


