pith. machine review for the scientific record.

arxiv: 2605.01147 · v1 · submitted 2026-05-01 · 💻 cs.AI

Recognition: unknown

Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment

Tanav Singh Bajaj, Nikhil Singh, Karan Anand, Eishkaran Singh

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 18:49 UTC · model grok-4.3

classification 💻 cs.AI
keywords agentic AI · interaction topology · AI safety · multi-agent systems · ordering instability · information cascades · functional collapse

The pith

Safety and fairness in multi-agent AI systems depend on how agents interact rather than on the scale or alignment of their models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper challenges the assumption that safe individual AI models will produce safe behavior when they interact as agents in high-stakes decisions. It argues that the structure of information flow and decision coupling among agents overrides any properties of the underlying models. The authors document three recurring failures that appear across model families and sizes: outcomes that flip with the order of agents, early judgments that spread unchecked, and systems that meet fairness scores while ignoring actual risk differences. Larger models make these failures more pronounced because they reach consensus faster. Standard alignment and evaluation methods that examine models in isolation therefore miss the dominant source of risk.

Core claim

In agentic AI, safety is determined by interaction topology, not model weights. When agents deliberate sequentially or aggregate via parallel voting with a judge, the structure of information flow and decision coupling dominates outcomes. Evidence across model families and scales reveals three persistent topology-driven pathologies: ordering instability, where system behavior depends primarily on agent sequence; information cascades, where early judgments propagate regardless of correctness; and functional collapse, where systems satisfy fairness metrics while abandoning meaningful risk discrimination. Contrary to intuition, scaling to more capable models strengthens these effects by increasing consensus formation and reducing the challenge of initial decisions.

What carries the argument

Interaction topology: the pattern of information flow and decision coupling among agents, which determines system-level safety and fairness independently of individual model scale or alignment.
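To make the distinction concrete, here is a minimal sketch of the two topologies the paper contrasts: sequential deliberation and parallel voting with a judge. The `Agent` callable, the role-free agent list, and the binary approve/decline output are illustrative assumptions, not the paper's exact prompts or agent definitions.

```python
# Minimal sketch of the two interaction topologies discussed above.
# The agents are placeholder callables (e.g. LLM calls); roles, prompts,
# and the approve/decline output format are illustrative assumptions.
from typing import Callable, List

# (case, visible_transcript) -> "approve" | "decline"
Agent = Callable[[str, List[str]], str]

def sequential_topology(agents: List[Agent], case: str) -> str:
    """Chain deliberation: each agent sees every earlier judgment,
    so the final decision can depend on agent ordering."""
    transcript: List[str] = []
    for agent in agents:
        transcript.append(agent(case, transcript))
    return transcript[-1]  # last agent's judgment is the system decision

def parallel_topology(agents: List[Agent], judge: Agent, case: str) -> str:
    """Parallel voting: agents evaluate the case independently and a
    judge aggregates their votes into one decision."""
    votes = [agent(case, []) for agent in agents]  # no cross-agent visibility
    return judge(case, votes)
```

Holding the agent list fixed and swapping only the wiring between these two functions is what it means to treat topology as an independent, testable variable.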

If this is right

  • Safety evaluations must test multi-agent systems under varied interaction architectures rather than isolated models (a minimal ordering-robustness check is sketched after this list).
  • Regulatory requirements for high-stakes agentic AI should include demonstrated robustness to changes in agent ordering and coupling.
  • Model scaling alone will not resolve these issues and may intensify consensus-driven failures.
  • Design of agent systems must treat topology as a controllable variable subject to explicit safety testing.
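As a concrete reading of the first two implications, the check below enumerates agent orderings for a fixed sequential pipeline and reports the spread in approval rates, in the spirit of Figures 4 to 7. `run_sequential` and the case set are assumed interfaces, and the 5-point acceptance threshold is purely illustrative.

```python
# Sketch of an ordering-robustness check over a sequential topology.
# run_sequential(ordering, case) -> "approve" | "decline" is assumed to exist.
from itertools import permutations
from statistics import mean

def ordering_spread(agents, cases, run_sequential):
    """Approval rate per agent ordering, plus the max-min spread.
    A large spread is the paper's 'ordering instability'."""
    rates = {}
    for ordering in permutations(agents):
        decisions = [run_sequential(list(ordering), case) for case in cases]
        rates[ordering] = mean(d == "approve" for d in decisions)
    spread = max(rates.values()) - min(rates.values())
    return rates, spread

# Illustrative acceptance criterion (not from the paper): flag the system
# if reordering alone moves approval rates by more than 5 points.
# rates, spread = ordering_spread(agents, held_out_cases, run_sequential)
# assert spread < 0.05, f"ordering-unstable: {spread:.1%} spread"
```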

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Topologies with randomized ordering or explicit dissent mechanisms could be tested as practical mitigations (a minimal randomized-ordering sketch follows this list).
  • The same topology effects may appear in human-AI collaborative decision systems.
  • Existing benchmarks for single-model alignment are structurally blind to these interaction failures.
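A minimal sketch of the randomized-ordering mitigation mentioned above: run the same sequential pipeline under several shuffled orderings and aggregate by majority vote, so that no single ordering fixes the outcome. `run_sequential(ordering, case)` is an assumed interface; the trial count is arbitrary.

```python
# Randomized-ordering mitigation sketch: majority vote over shuffled orderings.
import random
from collections import Counter

def randomized_ordering_decision(agents, case, run_sequential, trials=5, seed=0):
    """Aggregate decisions across independently shuffled agent orderings."""
    rng = random.Random(seed)
    decisions = []
    for _ in range(trials):
        ordering = agents[:]       # copy before shuffling
        rng.shuffle(ordering)
        decisions.append(run_sequential(ordering, case))
    return Counter(decisions).most_common(1)[0][0]
```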

Load-bearing premise

The three observed pathologies are general consequences of any sequential or voting-based interaction structure rather than artifacts of the particular agent implementations or evaluation tasks used.

What would settle it

An experiment showing that reordering agents or changing the aggregation rule eliminates ordering instability, information cascades, and functional collapse while holding model weights fixed, or conversely that the pathologies persist in non-agentic systems with the same models.
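One way that experiment could be instrumented, as a hedged sketch: hold the underlying model fixed, vary only the topology passed to `decide`, and probe for functional collapse by comparing a coarse group-fairness gap with risk discrimination across credit tiers. The `decide` interface, the case fields `group` and `risk_tier`, and both metrics are assumptions for illustration, not the paper's definitions.

```python
# Sketch of the settling experiment: model weights stay fixed, only the
# interaction structure varies. decide(topology, case) -> "approve"/"decline"
# is an assumed interface; the two metrics are simple illustrative proxies.
from statistics import mean

def approval_rate(decisions):
    return mean(d == "approve" for d in decisions)

def collapse_probe(topology, cases, decide):
    """Return (group fairness gap, risk discrimination across credit tiers).
    Functional collapse: the first number looks acceptable while the second
    collapses toward zero, i.e. approvals no longer track risk."""
    by_group, by_tier = {}, {}
    for case in cases:
        decision = decide(topology, case)
        by_group.setdefault(case["group"], []).append(decision)
        by_tier.setdefault(case["risk_tier"], []).append(decision)
    group_rates = [approval_rate(d) for d in by_group.values()]
    tier_rates = [approval_rate(d) for d in by_tier.values()]
    return max(group_rates) - min(group_rates), max(tier_rates) - min(tier_rates)

# Same model, different wiring:
# for topology in ("single_agent", "sequential", "parallel_with_judge"):
#     gap, discrimination = collapse_probe(topology, cases, decide)
#     print(topology, f"fairness gap {gap:.2f}, risk discrimination {discrimination:.2f}")
```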

Figures

Figures reproduced from arXiv: 2605.01147 by Eishkaran Singh, Karan Anand, Nikhil Singh, Tanav Singh Bajaj.

Figure 1
Figure 1. Baseline: a single case is evaluated by one agent to produce an Approve or Decline decision. view at source ↗
Figure 3
Figure 3. Parallel topology: four agents evaluate the same case; all outputs feed into the Business Decision Agent, which produces the final decision. view at source ↗
Figure 4
Figure 4. Llama 3.2 3B: Approval rates by agent position across all 24 orderings. Overall average approval rate: 81.4%. The model exhibits extreme ordering sensitivity, with approval rates ranging from 36.4% to 97.8% depending solely on agent sequence. Notice the dramatic color variation across columns: orderings with Risk Manager or Regulator in early positions (left side, darker) produce substantially lower approv… view at source ↗
Figure 5
Figure 5. Llama 3.1 70B: Approval rates by agent position across all 24 orderings. Overall average approval rate: 83.1%. Despite being a 23× larger model with substantially improved reasoning capabilities, ordering sensitivity persists. Approval rates range from 71.5% to 93.2%, a 21.7 percentage point spread. The heatmap shows more uniformity within columns (darker green throughout) compared to Llama 3B, reflecting … view at source ↗
Figure 6
Figure 6. Qwen 2.5 72B: Approval rates by agent position across 8 representative orderings. Overall average approval rate: 80.7%. Even with a limited sample of orderings selected to maximize role-position diversity, we observe a 21.5 percentage point range (71.5% to 93.0%). The consistency of topological sensitivity across three model families spanning a 24× scale range demonstrates that ordering instability is a st… view at source ↗
Figure 7
Figure 7. Distribution of approval rates across sequential orderings by model family. The violin plots reveal that ordering instability is not a minor perturbation but a first-order determinant of system behavior. Llama 3B exhibits a bimodal distribution spanning a 59.0 percentage point range, with the majority of orderings clustered in two regions: high approval (∼85–95%) and moderate approval (∼60–80%), depending … view at source ↗
Figure 8
Figure 8. Approval rates stratified by credit score tier, revealing how parallel topology alters the relationship between applicant risk and system decisions. The comparison demonstrates that parallel aggregation introduces a distinct pathology: the system satisfies aggregate fairness metrics while losing the capacity to perform its core function. view at source ↗
read the original abstract

As large language models are increasingly deployed as interacting agents in high-stakes decisions, the AI safety community assumes that safety properties of individual models will compose into safe multi-agent behavior. This position paper argues that this assumption is fundamentally mistaken. In agentic AI, safety is determined by interaction topology, not model weights. When agents deliberate sequentially or aggregate via parallel voting with a judge, the structure of information flow and decision coupling dominates outcomes. Evidence across model families and scales reveals three persistent topology-driven pathologies: ordering instability, where system behavior depends primarily on agent sequence; information cascades, where early judgments propagate regardless of correctness; and functional collapse, where systems satisfy fairness metrics while abandoning meaningful risk discrimination. Contrary to intuition, scaling to more capable models strengthens these effects by increasing consensus formation and reducing the challenge of initial decisions. These failure modes are invisible to model-centric evaluation and alignment procedures. We argue that agentic AI must be treated as a dynamical system rather than a collection of aligned components. Interaction topology must become a primary target of safety evaluation and regulation, with systems required to demonstrate robustness across architectural variations before deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript is a position paper arguing that safety and fairness properties in agentic AI systems are determined by interaction topology (e.g., sequential deliberation or parallel voting with a judge) rather than by model scale or alignment. It identifies three topology-driven pathologies—ordering instability, information cascades, and functional collapse—that dominate outcomes and are strengthened by scaling, rendering them invisible to model-centric evaluations. The paper concludes that agentic AI must be analyzed as dynamical systems, with interaction topology becoming a primary target for safety evaluation and regulation.

Significance. If the central claims hold and the pathologies prove robust across setups, the position would meaningfully reorient AI safety research away from individual model alignment toward system-level interaction design. This could affect evaluation protocols and regulatory requirements for deployed multi-agent systems in high-stakes domains.

major comments (2)
  1. [Abstract] The claim that 'evidence across model families and scales reveals' the three pathologies (ordering instability, information cascades, functional collapse) is presented without any description of experimental protocols, agent definitions, deliberation prompts, voting rules, fairness metrics, model scales tested, or quantitative results. This absence prevents assessment of whether the effects are invariant properties of topology or artifacts of specific implementations and metric choices.
  2. [Abstract] The assertion that 'scaling to more capable models strengthens these effects by increasing consensus formation' is stated without supporting analysis, data, or controls, yet this counterintuitive strengthening is load-bearing for the argument that model-centric approaches are insufficient.
minor comments (1)
  1. [Abstract] The abstract introduces the three pathologies without concise definitions or illustrative examples, which reduces clarity for readers new to the concepts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our position paper. We address the two major comments on the abstract below. As a position paper, the abstract summarizes the core argument while the body provides the supporting experimental details; we will revise the abstract for greater self-containment without exceeding length constraints.

read point-by-point responses
  1. Referee: [Abstract] The claim that 'evidence across model families and scales reveals' the three pathologies (ordering instability, information cascades, functional collapse) is presented without any description of experimental protocols, agent definitions, deliberation prompts, voting rules, fairness metrics, model scales tested, or quantitative results. This absence prevents assessment of whether the effects are invariant properties of topology or artifacts of specific implementations and metric choices.

    Authors: We agree that the abstract, due to length limits, does not enumerate protocols. The full manuscript contains dedicated experimental sections that define the agent architectures, deliberation and voting procedures, fairness metrics (including risk discrimination and consensus measures), model families and scales tested, and quantitative results across configurations. These setups were chosen to isolate topology while varying scale and family, demonstrating the pathologies as topology-driven. To address the concern, we will revise the abstract to include a concise high-level description of the experimental approach and the key invariance findings. revision: partial

  2. Referee: [Abstract] The assertion that 'scaling to more capable models strengthens these effects by increasing consensus formation' is stated without supporting analysis, data, or controls, yet this counterintuitive strengthening is load-bearing for the argument that model-centric approaches are insufficient.

    Authors: The claim is grounded in the manuscript's comparative experiments, which hold topology, prompts, and evaluation metrics fixed while varying model scale and capability. Larger models show measurably higher early consensus rates, which in turn amplify ordering instability and cascades while accelerating functional collapse. We will expand the revised manuscript with an explicit subsection on the scaling analysis, including controls, quantitative comparisons, and discussion of why this strengthening occurs, to make the supporting evidence more transparent. revision: yes

Circularity Check

0 steps flagged

No circularity; declarative position without self-referential derivations or equations

full rationale

The paper is a position statement asserting that safety and fairness in agentic AI are governed by interaction topology rather than model scale or alignment. It invokes three named pathologies (ordering instability, information cascades, functional collapse) as 'persistent topology-driven' effects 'revealed' by evidence across models, but supplies no equations, fitted parameters, predictions derived from inputs, or self-citations that function as load-bearing uniqueness theorems. No step reduces by construction to a prior definition or fit within the paper itself; the argument remains declarative and external to any closed loop of the form 'X is defined via Y which is measured from X'. Absence of empirical protocols is a supportability issue, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that individual-model safety properties are expected to compose into multi-agent safety, which it then challenges. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: safety properties of individual models will compose into safe multi-agent behavior
    This is the assumption the position paper explicitly states the AI safety community holds and then argues is mistaken.

pith-pipeline@v0.9.0 · 5507 in / 1205 out tokens · 43524 ms · 2026-05-09T18:49:09.123432+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

35 extracted references · 11 canonical work pages · 5 internal anchors

  1. [1] Measuring Massive Multitask Language Understanding. International Conference on Learning Representations.
  2. [2] On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv:2108.07258.
  3. [3] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems.
  4. [4] Multi-Agent Risks from Advanced AI. 2025.
  5. [5] Bayesian Learning in Social Networks. Review of Economic Studies.
  6. [6] Mixing Beliefs Among Interacting Agents. Advances in Complex Systems.
  7. [7] Naive Learning in Social Networks and the Wisdom of Crowds. American Economic Journal: Microeconomics.
  8. [8] Continuous Opinion Dynamics under Bounded Confidence: A Survey. Journal of Artificial Societies and Social Simulation.
  9. [9] Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions. arXiv preprint arXiv:2512.02682.
  10. [10] Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361.
  11. [11] Training Compute-Optimal Large Language Models. Advances in Neural Information Processing Systems.
  12. [12] Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making. arXiv preprint arXiv:2505.21503.
  13. [13] On the Principles behind Opinion Dynamics in Multi-Agent Systems of Large Language Models. arXiv preprint arXiv:2406.15492.
  14. [14] A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades. Journal of Political Economy.
  15. [15] A Simple Model of Herd Behavior. Quarterly Journal of Economics.
  16. [16] Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv preprint arXiv:2306.05685.
  17. [17] Can LLMs Critique and Iterate? Evaluation of LLM-as-a-Judge. arXiv preprint arXiv:2403.02101.
  18. [18] Generative Agents: Interactive Simulacra of Human Behavior. ACM UIST.
  19. [19] A Survey on Multi-Agent LLM Systems. arXiv preprint arXiv:2402.01234.
  20. [20] Consensus and Cooperation in Networked Multi-Agent Systems. Proceedings of the IEEE.
  21. [21] Graph Theoretic Methods in Multiagent Networks.
  22. [22] Reaching a Consensus. Journal of the American Statistical Association.
  23. [23] Distributed Consensus in Multi-vehicle Cooperative Control.
  24. [24] A Survey on LLM-as-a-Judge. arXiv preprint arXiv:2411.15594.
  25. [25] Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems.
  26. [26] Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073.
  27. [27] Emergent Abilities of Large Language Models. Transactions on Machine Learning Research.
  28. [28] Fair Public Decision Making. Proceedings of the 2017 ACM Conference on Economics and Computation, 2017.
  29. [29] Learning Fair Divisions in the Generalized Cake Cutting Model. Artificial Intelligence, 2024.
  30. [30] Deep Reinforcement Learning from Human Preferences. Advances in Neural Information Processing Systems.
  31. [31] Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 2020.
  32. [32] Improving Multi-Agent Debate with Sparse Communication Topology. arXiv preprint arXiv:2406.11776.
  33. [33] Holistic Evaluation of Language Models. Transactions on Machine Learning Research.
  34. [34] Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models. Transactions on Machine Learning Research.
  35. [35] Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019.