Hierarchical Certified Semantic Commitment for Byzantine-Resilient LLM-Agent Collaboration

Haoran Xu; Iadh Ounis; Lei Zhang; Xianbin Wang

arxiv: 2606.07316 · v1 · pith:27WLEHMOnew · submitted 2026-06-05 · 💻 cs.MA · cs.AI · cs.DC



Pith reviewed 2026-06-27 20:19 UTC · model grok-4.3

classification 💻 cs.MA · cs.AI · cs.DC

keywords Byzantine fault tolerance · LLM agents · semantic commitment · multi-agent systems · typed finality · Byzantine-resilient collaboration · embedding signals


The pith

H-CSC turns embedding signals on verdict-conditioned LLM proposals into typed semantic commits, verdict commits, or explicit aborts under Byzantine faults.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Hierarchical Certified Semantic Commitment as a way for groups of LLM agents to reach finality decisions when some agents may be faulty or adversarial. It processes stochastic natural-language proposals by first grouping them by verdict and then applying embedding-based signals to determine whether a round supports a semantic core, only a verdict-level margin, or an abort with a stated reason. This matters because classical BFT assumes byte-identical messages, which LLM outputs do not provide, while simple aggregation conceals whether agreement rests on aligned semantics or merely on aligned verdicts. If the approach holds, multi-agent systems gain provenance that distinguishes deep semantic agreement from shallower consensus while preserving safety thresholds. Experiments on a poisoning diagnostic and a claim-verification benchmark show that the protocol aborts every round that violates BFT feasibility and emits semantic digests on roughly three-quarters of benchmark rounds.

Core claim

H-CSC is a BFT-inspired protocol that converts embedding-derived finality signals over verdict-conditioned proposal groups into one of three typed outcomes: a semantic_commit when a 2f+1 within-verdict semantic core backs the verdict, emitting a parameter-bound digest over the quantised aggregate; a verdict_commit when the verdict margin is strong but the semantic rationale is dispersed, emitting a verdict-level certificate; or an explicit abort with a typed reason. The central claim is that this typed finality, rather than raw accuracy, supplies the needed control primitive for Byzantine LLM-agent collaboration. On BCS_v1 the method commits with 0.31 to 2.04 degrees of angular deviation on BFT-feasible buckets and aborts all beyond-BFT rounds (n<3f+1).
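The summary does not say how the "parameter-bound digest over the quantised aggregate" is built. A minimal sketch, assuming the aggregate is the unit-normalised mean of the core's embeddings, quantised onto a fixed grid, and hashed together with the protocol parameters so that the same vector under different parameters yields a different digest. The function name `semantic_digest`, the grid size `quant_step`, and the parameter keys (`encoder_id`, `f`, `tau`) are all hypothetical.

```python
import hashlib
import json

import numpy as np


def semantic_digest(core: np.ndarray, params: dict, quant_step: float = 1e-3) -> str:
    """Hypothetical parameter-bound digest over a quantised aggregate.

    core:   (k, d) array of embeddings in the within-verdict semantic core.
    params: protocol parameters bound into the digest (assumed names).
    """
    agg = core.mean(axis=0)
    agg = agg / np.linalg.norm(agg)  # keep only the aggregate direction
    # Quantise onto a fixed grid; fix little-endian int64 so bytes are portable.
    q = np.round(agg / quant_step).astype("<i8")
    # Canonical JSON header binds the parameters (and grid size) to the digest.
    header = json.dumps({**params, "quant_step": quant_step}, sort_keys=True)
    h = hashlib.sha256()
    h.update(header.encode("utf-8"))
    h.update(q.tobytes())
    return h.hexdigest()


rng = np.random.default_rng(0)
core = rng.normal(size=(5, 8))
params = {"encoder_id": "enc-v1", "f": 1, "tau": 0.8}
digest = semantic_digest(core, params)
```

Binding the parameters into the hashed header means a verifier cannot accept a digest computed under a different encoder or threshold; quantisation means that floating-point noise below the grid size usually leaves the digest unchanged.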

What carries the argument

Hierarchical Certified Semantic Commitment (H-CSC), a protocol that classifies embedding-derived finality signals over verdict-conditioned groups into semantic_commit, verdict_commit, or typed abort outcomes.
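The decision order can be sketched as a small round classifier. This is a toy illustration under stated assumptions, not the authors' algorithm. Three stand-ins are used. First, "strong verdict margin" is approximated as the leading verdict having a 2f+1 quorum and leading the runner-up by more than f. Second, the "semantic core" is approximated as the members of the leading verdict group whose cosine similarity to the group's mean direction is at least a hypothetical threshold `tau`. Third, the abort labels `bft_infeasible` and `weak_verdict_margin` are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np


@dataclass
class Outcome:
    kind: str                     # "semantic_commit", "verdict_commit" or "abort"
    verdict: Optional[str]        # committed verdict, None on abort
    reason: Optional[str] = None  # typed abort reason (hypothetical labels)


def classify_round(verdicts: Sequence[str], embeddings, f: int, tau: float = 0.8) -> Outcome:
    """Toy H-CSC-style round classifier; thresholds and rules are assumptions."""
    n = len(verdicts)
    if n < 3 * f + 1:  # beyond-BFT round: abort regardless of content
        return Outcome("abort", None, "bft_infeasible")
    ranked = Counter(verdicts).most_common()
    top, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Stand-in for "strong verdict margin": a 2f+1 quorum that leads by more than f.
    if top_count < 2 * f + 1 or top_count - runner_up <= f:
        return Outcome("abort", None, "weak_verdict_margin")
    group = np.array([e for v, e in zip(verdicts, embeddings) if v == top], dtype=float)
    unit = group / np.linalg.norm(group, axis=1, keepdims=True)
    centre = unit.mean(axis=0)
    norm = np.linalg.norm(centre)
    if norm > 0:
        # Stand-in semantic core: members within cosine tau of the group direction.
        core_size = int(np.sum(unit @ (centre / norm) >= tau))
        if core_size >= 2 * f + 1:
            return Outcome("semantic_commit", top)
    # Verdict is well supported but rationale is dispersed: verdict-level certificate.
    return Outcome("verdict_commit", top)
```

The ordering mirrors the hierarchy described in the summary: the BFT feasibility check comes first and is independent of embeddings, the verdict check second, and the semantic core is consulted only inside the winning verdict group.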

Load-bearing premise

Embedding-derived finality signals can be turned into reliable typed outcomes without the embeddings themselves being manipulable by Byzantine agents or introducing systematic bias in semantic grouping.

What would settle it

Run the BCS_v1 diagnostic while allowing Byzantine agents to craft proposals that deliberately shift embeddings to create false semantic cores meeting the 2f+1 count, then check whether the protocol still emits a semantic_commit with low angular deviation or begins to emit incorrect typed outcomes.
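The counting argument behind the 2f+1 threshold shows why this test matters. With at most f Byzantine agents, any 2f+1 core contains at least f+1 honest members. Byzantine proposals therefore cannot fabricate a core on their own, but they can complete one around a small honest cluster. The toy simulation below illustrates this. It assumes a cosine-threshold core rule and synthetic orthogonal 2-D embeddings, and the helper names are hypothetical. The f Byzantine embeddings are placed exactly on one honest rationale, and the simulation checks whether that rationale reaches 2f+1.

```python
import numpy as np


def core_size_around(anchor, embeddings, tau):
    """Count embeddings whose cosine similarity to `anchor` is at least `tau`."""
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    a = np.asarray(anchor, dtype=float)
    a = a / np.linalg.norm(a)
    return int(np.sum(e @ a >= tau))


def byzantine_completes_core(n, f, target_honest, tau=0.9):
    """Toy attack: `target_honest` honest agents share rationale B, the other
    honest agents share rationale A, and all f Byzantine agents copy B exactly."""
    honest = n - f
    assert 0 <= target_honest <= honest
    rationale_a, rationale_b = [1.0, 0.0], [0.0, 1.0]  # orthogonal toy embeddings
    embeddings = ([rationale_a] * (honest - target_honest)
                  + [rationale_b] * target_honest
                  + [rationale_b] * f)
    return core_size_around(rationale_b, embeddings, tau) >= 2 * f + 1


for f in range(1, 4):
    n = 3 * f + 1
    print(f, byzantine_completes_core(n, f, f + 1), byzantine_completes_core(n, f, f))
```

In this toy the attack succeeds exactly when at least f+1 honest agents share the amplified rationale. It also assumes Byzantine text embeds exactly where it is placed. Whether a real encoder lets semantically different text land inside an honest cluster is what the proposed experiment would measure.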

Figures

Figures reproduced from arXiv: 2606.07316 by Haoran Xu, Iadh Ounis, Lei Zhang, Xianbin Wang.

**Figure 2.** Instantiation of the encoder primitive (CRSE). The contrastive training objective minimises intra-honest distance and …

**Figure 3.** MVR-50 strip plot for the six certificate-emitting methods (H-CSC, B3, V1, B0, B2, B1): one method×mode per row, one metric per column. Bars are 95% task-level bootstrap CIs (10 000 resamples, seed 42). Naive baselines (no certificate) omitted; see …
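Figure 3's bars are described as 95% task-level bootstrap CIs with 10 000 resamples and seed 42. Below is a minimal percentile-bootstrap sketch of that kind of interval. It assumes one number per task (for example 1 if the round committed, else 0) and resamples tasks with replacement. The paper's exact estimator and random generator are not stated here, so this will not reproduce its figures.

```python
import numpy as np


def task_bootstrap_ci(per_task, n_resamples=10_000, seed=42, level=0.95):
    """Percentile bootstrap CI for the mean of a per-task metric."""
    x = np.asarray(per_task, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap replicate: len(x) task indices drawn with replacement.
    idx = rng.integers(0, len(x), size=(n_resamples, len(x)))
    means = x[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(means, [alpha, 1.0 - alpha])
    return float(x.mean()), float(lo), float(hi)


# Illustrative input only: 50 tasks with a 0.90 commit rate.
mean, lo, hi = task_bootstrap_ci([1] * 45 + [0] * 5)
```

Resampling whole tasks, rather than individual rounds or agents, keeps correlated outcomes within a task together, which is what "task-level" suggests.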

Original abstract

Byzantine collaboration among large-language-model agents requires a finality-control primitive: given delivered stochastic, structured natural-language proposals, the protocol must decide whether the round supports a commit, what kind of commit, or a typed safe abort. Naive aggregation hides this choice behind a single verdict; classical Byzantine fault tolerance hides it behind byte-identity that LLM proposals do not satisfy. We introduce Hierarchical Certified Semantic Commitment (H-CSC), a BFT-inspired protocol that converts embedding-derived finality signals over verdict-conditioned proposal groups into one of three typed outcomes: a semantic_commit (a 2f+1 within-verdict semantic core backs the verdict, emitting a parameter-bound digest over the quantised aggregate), a verdict_commit (strong verdict margin but dispersed semantic rationale, emitting a verdict-level certificate without claiming a semantic aggregate), or an explicit abort with a typed reason. The contribution is typed finality, not raw commit accuracy. On a controlled semantic-poisoning diagnostic (BCS_v1, 120 episodes), H-CSC commits with low angular deviation on BFT-feasible buckets (0.31 to 2.04 degrees) and aborts 100% of beyond-BFT rounds (n<3f+1) as intended. On a real LLM-agent claim-verification benchmark (MVR-50, 50 tasks) under paired static and rushing Byzantine attacks, H-CSC commits 0.90/0.92 with honest-reference-invalid rates of 0.02/0.00, statistically matching a strong certificate-emitting verdict-only baseline. Unlike that baseline, H-CSC also emits an embedding-backed semantic_commit digest on 74%/72% of rounds, supplying typed provenance. A strict-semantic ablation commits only 0.54/0.48, showing the verdict-level fallback is necessary for coverage (+0.36/+0.44) at the same <=0.04 safety floor; a 100-task cross-model check across four LLMs preserves invalid_hmaj within 0.00 to 0.03.
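The BCS_v1 figures (0.31 to 2.04 degrees) are angles between embedding vectors. A minimal sketch of how such an angular deviation can be computed, assuming it is measured between the committed aggregate and an honest reference direction such as the mean of honest embeddings. The abstract does not spell out the reference, and the function name is hypothetical.

```python
import numpy as np


def angular_deviation_deg(aggregate, reference):
    """Angle in degrees between a committed aggregate and a reference vector."""
    a = np.asarray(aggregate, dtype=float)
    r = np.asarray(reference, dtype=float)
    cos = a @ r / (np.linalg.norm(a) * np.linalg.norm(r))
    # Clip guards against floating-point drift slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


t = np.radians(2.0)
dev = angular_deviation_deg([1.0, 0.0], [np.cos(t), np.sin(t)])  # about 2 degrees
```

Because the measure depends only on direction, it is insensitive to the norm of the aggregate, which suits unit-normalised sentence embeddings.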

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

H-CSC adds typed semantic versus verdict commits for LLM agents but the embedding step looks open to manipulation.


The one thing to know is that this paper defines a BFT-inspired protocol that turns embedding signals over verdict groups into one of three outcomes: a semantic commit with a parameter-bound digest, a verdict commit without the semantic claim, or a typed abort. On BCS_v1 it aborts 100% of beyond-BFT rounds (n<3f+1) and commits with low angular deviation on feasible ones, while on MVR-50 it keeps invalid rates at 0.02/0.00 and still produces semantic commits on roughly three-quarters of rounds.

The work does a decent job of separating the finality types and showing that the verdict fallback improves coverage (+0.36/+0.44 over the strict-semantic ablation) while holding the invalid rate at or below 0.04. The ablation and cross-model check give some evidence that the numbers are not tied to one setup. That is a practical distinction from plain majority voting or byte-identity BFT.

The soft spot is the embedding assumption. The protocol needs embeddings to reliably identify a 2f+1 semantic core inside a verdict bucket, yet the abstract gives no account of how embeddings are generated, whether the embedding model is trusted, or any binding between raw text and the vector. The reported attacks target proposals, not the embedding step, so the stress-test concern stands: a Byzantine agent could potentially shift distances or fabricate a core that meets the numerical threshold without reflecting genuine agreement. Without those details, the numeric results are hard to read as strong support.

This is for researchers building multi-agent LLM systems who need more granular finality than current options. A reader who wants a concrete primitive with initial benchmark numbers would find it worth looking at. It deserves peer review because the typed outcomes are distinct and the experiments are at least directionally consistent, even though the embedding provenance and full methods will need to be supplied.

Referee Report

2 major / 1 minor

Summary. The paper introduces Hierarchical Certified Semantic Commitment (H-CSC), a BFT-inspired protocol for LLM-agent collaboration that converts embedding-derived finality signals over verdict-conditioned proposal groups into one of three typed outcomes: semantic_commit (requiring a 2f+1 within-verdict semantic core and emitting a parameter-bound digest), verdict_commit (strong verdict margin but dispersed semantics), or a typed abort. It reports empirical results on the BCS_v1 semantic-poisoning diagnostic (120 episodes) showing low angular deviation (0.31–2.04°) on BFT-feasible buckets and 100% abort on n<3f+1 rounds, and on the MVR-50 claim-verification benchmark (50 tasks) under static/rushing attacks showing commit rates of 0.90/0.92 with honest-reference-invalid rates of 0.02/0.00 while emitting semantic digests on 74%/72% of rounds; an ablation and cross-model check are also presented.

Significance. If the embedding provenance and non-manipulability assumptions hold, the work provides a useful primitive for typed finality in stochastic natural-language multi-agent settings, distinguishing semantic-core agreement from verdict-level agreement and supplying provenance-backed digests. The empirical evaluation on a controlled diagnostic and a real benchmark shows commit rates statistically matching a verdict-only baseline, and large coverage gains over a strict-semantic ablation, at comparable safety levels.

major comments (2)

[Abstract] The central security claim, that embedding-derived signals reliably yield typed outcomes without spurious semantic_commit digests, rests on the unstated assumption that the embedding model and proposal-to-embedding mapping cannot be adversarially manipulated by Byzantine agents. The reported 100% abort rate on n<3f+1 rounds and low angular deviations on feasible buckets would not demonstrate resilience if Byzantine-controlled proposals could shift angular distances to fabricate a 2f+1 semantic core inside a verdict bucket; no description of embedding provenance, honesty assumptions, or cryptographic binding between text and embedding is provided.
[Abstract, MVR-50 results] The claim that H-CSC statistically matches a strong certificate-emitting verdict-only baseline while additionally emitting semantic digests on 74%/72% of rounds presupposes that the semantic-core detection step cannot be gamed to produce semantically invalid yet numerically qualifying commits. The strict-semantic ablation (0.54/0.48) shows the verdict fallback is load-bearing for coverage, yet no analysis addresses whether Byzantine agents controlling proposal generation could bias verdict-conditioned grouping to trigger invalid semantic_commit without violating the n<3f+1 abort condition.
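The text-to-embedding binding the referee asks for can be illustrated with a recompute-and-check rule. Each proposal carries its text and its claimed embedding, and the aggregator discards any proposal whose embedding does not match what a fixed, trusted encoder produces from that text. The sketch uses a hypothetical hash-seeded toy encoder, `toy_encode`, purely as a stand-in for a real model. It shows the check, not a secure encoder. It also leaves open the referee's deeper point that adversarially chosen text can still move a faithfully computed embedding.

```python
import hashlib

import numpy as np


def toy_encode(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical deterministic stand-in encoder: seeds an RNG from the text hash."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "little")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)


def bound_proposals(proposals, atol=1e-6):
    """Keep only (text, embedding) pairs whose embedding matches the recomputed one."""
    kept = []
    for text, claimed in proposals:
        if np.allclose(np.asarray(claimed, dtype=float), toy_encode(text), atol=atol):
            kept.append((text, claimed))
    return kept


honest = ("claim is supported", toy_encode("claim is supported"))
forged = ("claim is refuted", toy_encode("claim is supported"))  # mismatched pair
kept = bound_proposals([honest, forged])
```

Recomputing rather than trusting submitted vectors removes one attack surface: Byzantine agents can no longer report arbitrary vectors, only choose text.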

minor comments (1)

[Abstract] The abstract reports numeric results (angular deviations, commit rates, invalid rates) but does not define the embedding model, quantization procedure, or exact criteria for 'within-verdict semantic core'; adding these definitions would improve verifiability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the embedding assumptions and potential for adversarial manipulation in semantic detection. We address each point below and will update the manuscript accordingly.


Referee: [Abstract] The central security claim, that embedding-derived signals reliably yield typed outcomes without spurious semantic_commit digests, rests on the unstated assumption that the embedding model and proposal-to-embedding mapping cannot be adversarially manipulated by Byzantine agents. The reported 100% abort rate on n<3f+1 rounds and low angular deviations on feasible buckets would not demonstrate resilience if Byzantine-controlled proposals could shift angular distances to fabricate a 2f+1 semantic core inside a verdict bucket; no description of embedding provenance, honesty assumptions, or cryptographic binding between text and embedding is provided.

Authors: The protocol is designed under the assumption that the embedding function is a trusted, fixed component outside the control of Byzantine agents, who can influence only the content of their natural-language proposals. This parallels standard BFT models, in which shared primitives such as signature schemes are trusted even though participants may be Byzantine. We did not include cryptographic binding because the focus is on semantic rather than cryptographic identity. We will add an 'Assumptions' paragraph to the introduction and a dedicated subsection in the protocol description that explicitly state the trust assumption on the embedding model and discuss provenance at the system level, making clear that the security claims hold under these assumptions. revision: yes
Referee: [Abstract, MVR-50 results] The claim that H-CSC statistically matches a strong certificate-emitting verdict-only baseline while additionally emitting semantic digests on 74%/72% of rounds presupposes that the semantic-core detection step cannot be gamed to produce semantically invalid yet numerically qualifying commits. The strict-semantic ablation (0.54/0.48) shows the verdict fallback is load-bearing for coverage, yet no analysis addresses whether Byzantine agents controlling proposal generation could bias verdict-conditioned grouping to trigger invalid semantic_commit without violating the n<3f+1 abort condition.

Authors: The evaluation on MVR-50 is performed with Byzantine agents controlling proposal generation under both static and rushing attack models. The resulting honest-reference-invalid rates remain low (0.02 and 0.00), providing empirical evidence against successful gaming in the tested conditions. The n<3f+1 abort is independent of semantic distances. We agree that additional analysis on potential bias in grouping would be valuable. We will expand the 'Discussion' section with a paragraph analyzing the hierarchical protection (verdict margin first, then semantic core) and the role of the ablation in showing coverage without sacrificing the safety floor. A full formal game-theoretic analysis of embedding manipulation is beyond the current scope but noted as future work. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical protocol evaluation on benchmarks


The paper defines H-CSC as a BFT-inspired protocol that maps embedding-derived signals over verdict groups to typed outcomes (semantic_commit, verdict_commit, or abort). Reported results consist of direct empirical measurements—angular deviations (0.31–2.04°), commit rates (0.90/0.92), abort rates (100% beyond n<3f+1), and invalid rates (0.02/0.00)—on the named BCS_v1 and MVR-50 benchmarks under specified attacks. No equations, fitted parameters, or self-citations are shown that reduce any claimed outcome to an input by construction. The protocol steps are presented as definitional design choices, not derived from prior self-referential results. The central contribution (typed finality with provenance) is validated by ablation and cross-model checks rather than by renaming or self-definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the protocol description implies standard BFT thresholds (2f+1) and embedding similarity but does not detail any fitted values or new postulated objects.
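For reference, here is the classical quorum-intersection arithmetic behind the 2f+1 threshold. This is textbook BFT, not a derivation taken from this paper, whose quorum sizes for n > 3f+1 are not given in the abstract. The setting is n agents, at most f of them Byzantine, and quorums Q_1, Q_2 of size n-f.

$$
\begin{aligned}
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n = 2(n-f) - n = n - 2f,\\
n - 2f \ge f + 1 &\iff n \ge 3f + 1, \qquad n = 3f+1 \;\Rightarrow\; n - f = 2f + 1.
\end{aligned}
$$

The guaranteed overlap exceeds f, so any two quorums share at least one honest agent, precisely when n ≥ 3f+1. At n = 3f+1 the quorum size is exactly 2f+1. When n < 3f+1, two quorums may overlap only in Byzantine agents, which is consistent with H-CSC aborting those rounds.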

pith-pipeline@v0.9.1-grok · 5919 in / 1243 out tokens · 30851 ms · 2026-06-27T20:19:06.776171+00:00 · methodology
