SGTO-MAS: Secure Gorilla Troops Optimization for Multi-Agent LLM Systems

Saeid Jamshidi

arxiv: 2606.07940 · v1 · pith:LNBORONHnew · submitted 2026-06-06 · 💻 cs.CR

Pith reviewed 2026-06-27 19:53 UTC · model grok-4.3

classification 💻 cs.CR

keywords multi-agent LLM systems, secure coordination, gorilla troops optimization, trust modeling, risk-aware evaluation, constrained optimization, swarm intelligence, agent selection

The pith

Adapting Gorilla Troops Optimization with trust and risk modeling solves constrained multi-agent LLM coordination problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper formulates multi-agent LLM coordination as a constrained optimization problem to balance performance, security, and computational cost. It proposes SGTO-MAS, which adapts Gorilla Troops Optimization to incorporate trust modeling, risk-aware evaluation, and collective intelligence for adaptive agent selection under threats. Experiments across 500 runs report a stable performance score of 0.5281, consensus of 0.8764, risk held at 0.3000, and average selection of 4.04 agents. A sympathetic reader would care because current heuristic approaches lack mechanisms to manage error propagation and resource waste in adversarial settings. If the claim holds, structured swarm optimization offers a repeatable way to achieve efficient coordination without excessive overhead.

Core claim

The paper claims that multi-agent LLM coordination reduces to a constrained optimization problem whose solution is found by a security-aware variant of Gorilla Troops Optimization; this variant unifies trust modeling, risk-aware evaluation, and collective intelligence inside a single objective, yielding adaptive agent subsets that maintain stable performance, high consensus, and bounded risk across independent runs.

What carries the argument

Security-aware Gorilla Troops Optimization (SGTO) for multi-agent selection, which embeds trust and risk terms into the swarm update rules to jointly optimize performance, security, and subset size.
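To make the idea of folding trust and risk terms into one objective concrete, here is a minimal sketch. It is not the paper's formulation: the abstract does not give the objective, so the weights, the penalty form, the risk budget, and every name (`Agent`, `fitness`, `risk_budget`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Agent:
    # All four fields are hypothetical per-agent estimates in [0, 1].
    quality: float
    trust: float
    risk: float
    cost: float


def fitness(agents: Sequence[Agent], mask: Sequence[int],
            w_perf: float = 0.5, w_trust: float = 0.3, w_cost: float = 0.2,
            risk_budget: float = 0.3, penalty: float = 10.0) -> float:
    """Score a binary selection mask; higher is better (assumed objective)."""
    chosen = [a for a, m in zip(agents, mask) if m]
    if not chosen:
        return float("-inf")  # an empty subset cannot answer anything
    n = len(chosen)
    mean_quality = sum(a.quality for a in chosen) / n
    mean_trust = sum(a.trust for a in chosen) / n
    mean_risk = sum(a.risk for a in chosen) / n
    # Cost grows with subset size, which favors compact selections.
    cost_share = sum(a.cost for a in chosen) / len(agents)
    score = w_perf * mean_quality + w_trust * mean_trust - w_cost * cost_share
    # The risk constraint is handled as a penalty on budget violations.
    violation = max(0.0, mean_risk - risk_budget)
    return score - penalty * violation
```

A penalty term is only one way to handle the constraint; repairing infeasible masks or rejecting them outright would be equally plausible readings of "constrained optimization".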

If this is right

Agent subsets remain compact (around four members) while consensus stays above 0.87 and risk stays at 0.3.
Optimization finishes in roughly 24 seconds per run with score standard deviation below 0.02.
Performance degrades by 2.5 percent under agent removal and 5.3 percent under consensus disruption.
The same framework produces stable results across 500 independent trials under controlled threat variation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same trust-plus-risk formulation could be swapped into other swarm algorithms to test whether GTO is essential or whether any population-based optimizer suffices.
Extending the objective to include latency or energy cost would show whether the current compact-subset result generalizes to resource-constrained deployments.
The reported graceful degradation suggests the method could be combined with dynamic agent addition protocols without resetting the optimization state.
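The first extension in the list above, testing whether GTO is essential, needs a harness in which the optimizer is separate from the objective. The sketch below is a deliberately simple population search over binary agent masks; it is not Gorilla Troops Optimization and not the paper's algorithm. The function name, parameters, and update rule are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def population_search(fitness: Callable[[Sequence[int]], float],
                      n_agents: int, pop_size: int = 20,
                      iterations: int = 50, flip_prob: float = 0.1,
                      seed: int = 0) -> Tuple[List[int], List[float]]:
    """Generic population search; returns the best mask and its fitness history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_agents)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(iterations):
        next_population = []
        for member in population:
            # Copy each bit from the current best with probability 0.5.
            candidate = [b if rng.random() < 0.5 else m
                         for m, b in zip(member, best)]
            # Flip each bit with a small probability to keep exploring.
            candidate = [1 - bit if rng.random() < flip_prob else bit
                         for bit in candidate]
            # Greedy replacement: keep the candidate only if it is no worse.
            keep = fitness(candidate) >= fitness(member)
            next_population.append(candidate if keep else member)
        population = next_population
        # Including the old best makes the history non-decreasing.
        best = max(population + [best], key=fitness)
        history.append(fitness(best))
    return best, history
```

Because any callable that maps a mask to a number can be passed as `fitness`, the same trust-plus-risk objective could be run through this baseline and through a GTO implementation, and the two result sets compared.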

Load-bearing premise

The adapted Gorilla Troops Optimization algorithm can reliably locate solutions to the constrained coordination problem under varying threats, and the reported metrics do not depend on post-hoc parameter adjustments.

What would settle it

Running the method on fresh threat models or LLM configurations and observing that average performance falls below 0.45, consensus falls below 0.80, or the selected agent count exceeds 8 while risk exceeds 0.40.
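The thresholds above can be written as an explicit check. This sketch encodes one reading of the sentence, in which the agent-count and risk conditions must hold together while the performance and consensus conditions each suffice alone. The `RunSummary` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    # Hypothetical container for metrics averaged over a batch of runs.
    mean_performance: float
    mean_consensus: float
    mean_agents_selected: float
    mean_risk: float


def claim_is_challenged(s: RunSummary) -> bool:
    """Return True if the averaged metrics cross any stated threshold."""
    # Assumed grouping: a large subset counts only when risk is also high.
    oversized_and_risky = s.mean_agents_selected > 8 and s.mean_risk > 0.40
    return (s.mean_performance < 0.45
            or s.mean_consensus < 0.80
            or oversized_and_risky)


# Metrics reported in the abstract; they do not trigger the check.
reported = RunSummary(0.5281, 0.8764, 4.04, 0.3000)
print(claim_is_challenged(reported))  # False
```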

Figures

Figures reproduced from arXiv: 2606.07940 by Saeid Jamshidi.

**Figure 1.** Overview of the proposed SGTO-MAS approach. The system performs threat-aware query analysis, followed by optimization-based agent selection [PITH_FULL_IMAGE:figures/full_fig_p004_1.png]

**Figure 2.** Relationship between diversity and total fitness across runs. Controlled [PITH_FULL_IMAGE:figures/full_fig_p011_2.png]

**Figure 3.** Distribution of output scores across all runs. The narrow spread indicates stable score behavior across repeated executions. [PITH_FULL_IMAGE:figures/full_fig_p013_3.png]

**Figure 4.** Optimization behavior across iterations. The convergence trend shows stable improvement in fitness, indicating a balanced interaction between [PITH_FULL_IMAGE:figures/full_fig_p013_4.png]

**Figure 5.** Relationship between trust and decision score across all runs. The monotonic trend indicates that SGTO-MAS favors higher-trust configurations during [PITH_FULL_IMAGE:figures/full_fig_p013_5.png]

**Figure 6.** Collective intelligence behavior across runs, illustrating the interaction between diversity and complementarity and their contribution to stable [PITH_FULL_IMAGE:figures/full_fig_p015_6.png]

**Figure 7.** Convergence trajectory of the optimization process, showing stable [PITH_FULL_IMAGE:figures/full_fig_p015_7.png]

**Figure 8.** Relationship between confidence and quality across outputs. Both [PITH_FULL_IMAGE:figures/full_fig_p015_8.png]

Original abstract

Multi-agent large language model (LLM) systems offer strong capabilities for complex reasoning and decision-making, yet coordination across agents introduces error propagation, security risks, and inefficient use of resources. Existing methods often rely on heuristic, static strategies and lack a principled mechanism for balancing performance, security, and computational cost. This paper formulates multi-agent LLM coordination as a constrained optimization problem and proposes a security-aware method for adaptive agent selection. The method integrates trust modeling, risk-aware evaluation, and collective intelligence within a unified optimization objective. To solve the problem efficiently, we use a swarm-intelligence strategy inspired by Gorilla Troops Optimization (GTO), enabling adaptive coordination under varying threat conditions. Controlled experiments across 500 independent runs demonstrate the effectiveness of the proposed method. The system achieves a stable average performance score of 0.5281, with high consensus (0.8764), controlled risk (0.3000), and compact agent subsets averaging 4.04 selected agents. The optimization process converges efficiently, with an average runtime of 24.09 seconds per run and low score variability (standard deviation = 0.0173). Robustness analysis indicates graceful degradation under perturbations, with performance drops limited to 2.5% under agent removal and 5.3% under consensus disruption. These results show that effective multi-agent coordination can be achieved through structured optimization that jointly manages performance, security, and efficiency. The proposed method provides a practical security-aware solution for coordinating multi-agent LLM systems in complex adversarial settings.
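The robustness and variability figures in the abstract can be turned into absolute numbers with a little arithmetic. The sketch below assumes the reported drops are relative to the mean performance score of 0.5281, which the abstract does not state explicitly; the helper names are hypothetical.

```python
def perturbed_score(baseline: float, relative_drop: float) -> float:
    """Score after a relative drop, e.g. 0.025 for a 2.5% drop."""
    return baseline * (1.0 - relative_drop)


def coefficient_of_variation(mean: float, std: float) -> float:
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean


baseline = 0.5281  # mean performance score from the abstract

# Assumption: both drops are measured against the mean score.
print(round(perturbed_score(baseline, 0.025), 4))  # agent removal: 0.5149
print(round(perturbed_score(baseline, 0.053), 4))  # consensus disruption: 0.5001
print(round(coefficient_of_variation(baseline, 0.0173), 4))  # 0.0328
```

Under that assumption the perturbed scores stay near 0.50 to 0.51, and the run-to-run spread is about 3.3 percent of the mean.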

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Applies an existing swarm optimizer to multi-agent LLM agent selection with trust and risk terms, but the effectiveness numbers stand alone without baselines or score definitions.

The core of this paper is taking Gorilla Troops Optimization, adding trust modeling and risk-aware terms, and using it to pick small subsets of agents in multi-agent LLM setups. They run 500 trials and report average performance 0.5281, consensus 0.8764, risk 0.3000, and about 4 agents selected, plus runtime and some perturbation tests.

What stands out is the concrete experimental output: they give standard deviation, convergence time, and two robustness checks (agent removal and consensus disruption). That level of reporting is better than many heuristic papers that stop at a single run.

The soft spot is exactly what the stress test flags. The abstract positions the method against heuristic and static strategies, yet only absolute internal numbers appear. There are no baseline algorithms, no definition of the unified performance score, and no task description that would let a reader judge whether 0.5281 is meaningful. Without those, the numbers cannot be read as evidence of improvement or graceful behavior under threat. The robustness drops (2.5% and 5.3%) are also hard to interpret without knowing the perturbation model or the original task.

The work is a straightforward application rather than a new derivation, so its value is mainly practical for people already building multi-agent LLM pipelines who need a ready optimization wrapper. It is coherent on its own terms and shows honest engagement with the experimental side, even if the comparisons are missing.

I would bring it to a reading group for the experimental setup discussion. I would not cite it myself. A serious editor should send it to review rather than desk-reject; the experiments are there and the gap is fixable with added baselines and clearer metric definitions.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes SGTO-MAS, a method that formulates multi-agent LLM coordination as a constrained optimization problem and solves it using a security-aware adaptation of Gorilla Troops Optimization that incorporates trust modeling and risk-aware evaluation. It reports results from 500 independent runs: an average performance score of 0.5281, consensus of 0.8764, risk of 0.3000, an average of 4.04 selected agents, an average runtime of 24.09 seconds per run, and small performance drops under perturbations.

Significance. The approach of using swarm intelligence for secure multi-agent LLM coordination could be significant for practical deployment in adversarial settings if the claims hold. The experimental design with 500 runs, reporting of standard deviation, and robustness analysis are strengths that provide some reproducibility. However, the absence of baseline comparisons limits the ability to assess the advance over existing methods.

major comments (3)

[Abstract] The performance score of 0.5281 is presented without definition or explanation of how it is computed from the unified objective of performance, security, and cost.
[Abstract] No comparisons to the heuristic or static strategies referenced in the introduction are reported, so the effectiveness claim relative to alternatives cannot be evaluated.
[Abstract] The specific tasks, threat models, and precise definitions of the consensus (0.8764) and risk (0.3000) metrics are absent, which is load-bearing for interpreting the 500-run results and robustness claims.

minor comments (1)

[Abstract] The phrase 'graceful degradation' is used without specifying the exact perturbation levels or experimental protocol for agent removal and consensus disruption.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each major point below and will revise the manuscript accordingly to enhance clarity and completeness.

Referee: [Abstract] The performance score of 0.5281 is presented without definition or explanation of how it is computed from the unified objective of performance, security, and cost.

Authors: We agree this definition is needed for interpretability in the abstract. The performance score is a normalized weighted combination drawn from the unified objective (performance + security - cost) as defined in Section 3. We will revise the abstract to include a concise explanation of this computation.

revision: yes

Referee: [Abstract] No comparisons to the heuristic or static strategies referenced in the introduction are reported, so the effectiveness claim relative to alternatives cannot be evaluated.

Authors: The experiments focus on absolute metrics and robustness across 500 runs rather than explicit baselines. We acknowledge the limitation for assessing relative advance and will add comparisons to the referenced heuristic and static strategies in the revised version.

revision: yes

Referee: [Abstract] The specific tasks, threat models, and precise definitions of the consensus (0.8764) and risk (0.3000) metrics are absent, which is load-bearing for interpreting the 500-run results and robustness claims.

Authors: We will revise the abstract to incorporate brief definitions of the consensus and risk metrics along with a high-level description of the tasks and threat models. Full formal definitions and experimental setup appear in Sections 4 and 5.

revision: yes

Circularity Check

0 steps flagged

No circularity; empirical validation independent of any derivation chain

The provided abstract formulates multi-agent coordination as a constrained optimization problem and adopts a swarm-intelligence strategy inspired by existing GTO, then reports performance metrics obtained from 500 independent experimental runs. No equations, fitted parameters, or self-citations appear that would reduce the reported scores (0.5281 performance, 0.8764 consensus, etc.) to algebraic identities or prior author work by construction. The results are presented as direct simulation outputs rather than predictions forced by the method's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies insufficient technical detail to enumerate free parameters, axioms, or invented entities; the optimization objective and GTO adaptation are described at a conceptual level only.

Reference graph

Works this paper leans on

20 extracted references · 6 canonical work pages · 3 internal anchors

[1] J. Huang, Y. Xu, Q. Wang, Q. C. Wang, X. Liang, F. Wang, Z. Zhang, W. Wei, B. Zhang, L. Huang et al., “Foundation models and intelligent decision-making: Progress, challenges, and perspectives,” The Innovation, vol. 6, no. 6, 2025.

[2] K. Valmeekam, M. Marquez, S. Sreedharan, and S. Kambhampati, “On the planning abilities of large language models - a critical investigation,” Advances in Neural Information Processing Systems, vol. 36, pp. 75993–76005, 2023.

[3] K. Tran, M. Nguyen et al., “Large language model based multi-agents: A survey,” arXiv preprint arXiv:2308.10848, 2023. (work page, internal anchor; indexed as “AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors”)

[4] Y. Sun et al., “LLM-based multi-agent decision-making: Challenges and future directions,” IEEE Robotics and Automation Letters, 2025.

[5] Y. Yang, H. Chai, S. Shao, Y. Song, S. Qi, R. Rui, and W. Zhang, “Agentnet: Decentralized evolutionary coordination for LLM-based multi-agent systems,” Advances in Neural Information Processing Systems, vol. 38, pp. 107309–107336, 2026.

[6] S. H. Shaikh, “LLM-based multi-agent systems: Frameworks, evaluation, open challenges, and research frontiers,” in International Joint Conference on Computational Intelligence. Springer, 2025, pp. 149–170.

[7] X. Li, S. Wang, S. Zeng, Y. Wu, and Y. Yang, “A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges,” Vicinagearth, vol. 1, no. 1, p. 9, 2024.

[8] X. Zhang et al., “On the risk of hallucination propagation in multi-agent LLM systems,” arXiv preprint arXiv:2402.00000, 2024. (work page)

[9] J. Liang et al., “Managing uncertainty in multi-agent large language model systems,” arXiv preprint arXiv:2405.00000, 2024. (work page)

[10] B. Ning, X. Zong, and K. He, “Malf: A multi-agent LLM framework for intelligent fuzzing of industrial control protocols,” arXiv preprint arXiv:2510.02694, 2025. (work page)

[11] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu et al., “Autogen: Enabling next-gen LLM applications via multi-agent conversations,” in First Conference on Language Modeling, 2024.

[12] G. Li, H. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, “Camel: Communicative agents for ‘mind’ exploration of large language model society,” Advances in Neural Information Processing Systems, vol. 36, pp. 51991–52008, 2023.

[13] M. A. Ferrag et al., “From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows,” Internet of Things and Cyber-Physical Systems, 2025.

[14] N. Van Thieu and L. Van Quan, “Artificial gorilla troops optimizer,” in Encyclopedia of Engineering Optimization and Heuristics. Springer, 2026, pp. 1–9.

[15] J. He, C. Treude, and D. Lo, “LLM-based multi-agent systems for software engineering: Literature review and vision,” ACM Computing Surveys, 2025.

[16] Q. Wu et al., “Autogen: Enabling next-gen LLM applications via multi-agent conversation,” arXiv preprint arXiv:2308.08155, 2023. (work page, internal anchor; indexed as “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation”)

[17] G. Li et al., “Camel: Communicative agents for mind exploration of LLM society,” in NeurIPS, 2023.

[18] Y. Du et al., “Improving factuality and reasoning in language models through multiagent debate,” arXiv preprint arXiv:2305.14325, 2023. (work page, internal anchor; indexed as “Improving Factuality and Reasoning in Language Models through Multiagent Debate”)

[19] G. Liang and Q. Tong, “LLM-powered AI agent systems and their applications in industry,” IEEE, 2025.

[20] A. G. Hussien, A. Bouaouda, A. Alzaqebah, S. Kumar, G. Hu, and H. Jia, “An in-depth survey of the artificial gorilla troops optimizer: outcomes, variations, and applications,” Artificial Intelligence Review, vol. 57, no. 9, 2024.
