pith. machine review for the scientific record.

arxiv: 2604.27132 · v1 · submitted 2026-04-29 · 💻 cs.AI

Recognition: unknown

TRUST: A Framework for Decentralized AI Service v.0.1

Authors on Pith · no claims yet

Pith reviewed 2026-05-07 08:15 UTC · model grok-4.3

classification 💻 cs.AI
keywords decentralized AI auditing · trustworthy AI · multi-agent systems · chain-of-thought verification · consensus mechanisms · causal interaction graphs · root-cause attribution · safety-profitability theorem

The pith

TRUST decentralizes auditing of AI reasoning chains by breaking them into hierarchical graphs and aligning incentives so honest participants profit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TRUST as a framework for verifying reasoning in large models and multi-agent systems without relying on a central authority. It decomposes chain-of-thought steps into five levels of hierarchical directed acyclic graphs for parallel distributed checks, projects agent interactions onto causal graphs via the DAAN protocol for fault tracing, and runs a multi-tier voting system with stake weighting among automated checkers, language-model evaluators, and humans. A Safety-Profitability Theorem is established to show that honest auditors gain while malicious ones lose under up to 30 percent adversarial participation. Experiments report 72.4 percent accuracy across benchmarks, resilience to 20 percent corruption, and 70 percent root-cause attribution with token savings, plus human validation of the design. The approach targets high-stakes AI use cases by removing single points of failure and protecting proprietary reasoning traces through on-chain records and privacy segmentation.
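For orientation, a minimal sketch (in Python) of what a five-level HDAG decomposition might look like; the level names, node fields, and the toy integration example are illustrative assumptions, since the paper's actual schema is not reproduced here.

    # Illustrative sketch only: the level names and node fields below are assumptions,
    # not the paper's published schema.
    from dataclasses import dataclass, field

    LEVELS = ["goal", "strategy", "derivation", "calculation", "verification"]  # assumed 5 levels

    @dataclass
    class Step:
        node_id: str
        level: int                                    # index into LEVELS (0..4)
        text: str                                     # chain-of-thought fragment carried by this node
        parents: list = field(default_factory=list)   # ids of steps this one depends on

    def group_by_level(steps):
        """Bucket HDAG nodes by abstraction level so each bucket can be audited in parallel."""
        buckets = {i: [] for i in range(len(LEVELS))}
        for s in steps:
            buckets[s.level].append(s)
        return buckets

    # Toy chain of thought decomposed into an HDAG.
    hdag = [
        Step("g1", 0, "Integrate x*exp(x) dx"),
        Step("s1", 1, "Use integration by parts", parents=["g1"]),
        Step("d1", 2, "u = x, dv = exp(x) dx  =>  du = dx, v = exp(x)", parents=["s1"]),
        Step("c1", 3, "x*exp(x) - integral(exp(x)) dx = x*exp(x) - exp(x) + C", parents=["d1"]),
        Step("v1", 4, "Differentiate the result and recover x*exp(x)", parents=["c1"]),
    ]

    for lvl, nodes in group_by_level(hdag).items():
        print(LEVELS[lvl], "->", [n.node_id for n in nodes])

Each level's bucket can be handed to a different auditor, which is the sense in which the decomposition enables parallel distributed checks.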

Core claim

TRUST decomposes Chain-of-Thought reasoning into five abstraction levels via Hierarchical Directed Acyclic Graphs for distributed auditing, projects multi-agent interactions into Causal Interaction Graphs through the DAAN protocol for deterministic root-cause attribution, and employs a multi-tier consensus with stake-weighted voting among computational checkers, LLM evaluators, and human experts to guarantee correctness under 30 percent adversarial participation. The Safety-Profitability Theorem ensures honest auditors profit while malicious actors incur losses, with all decisions recorded on-chain and privacy preserved by segmentation. Empirical tests show 72.4 percent accuracy, resilience to 20 percent corruption, and 70 percent root-cause attribution with 60 percent token savings.

What carries the argument

Hierarchical Directed Acyclic Graphs (HDAGs) for decomposing reasoning, the DAAN protocol for mapping to Causal Interaction Graphs, and the multi-tier stake-weighted consensus among checkers, evaluators, and experts.

Load-bearing premise

The multi-tier consensus with stake-weighted voting guarantees correctness under 30 percent adversarial participation and the DAAN protocol projects interactions to causal graphs without loss of information for root-cause attribution.
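To make the premise concrete, a hedged sketch of a stake-weighted, multi-tier acceptance rule; the tier weights and the two-thirds acceptance threshold are assumptions for illustration, since the abstract only states that correctness holds under at most 30 percent adversarial participation.

    # Minimal sketch of a stake- and tier-weighted acceptance rule (assumed weights and threshold).
    def weighted_verdict(votes, tier_weight, accept_threshold=2/3):
        """votes: list of (tier, stake, approve). Accept when the weighted
        approval fraction clears the threshold."""
        total = sum(tier_weight[t] * s for t, s, _ in votes)
        approve = sum(tier_weight[t] * s for t, s, ok in votes if ok)
        return total > 0 and approve / total >= accept_threshold

    tier_weight = {"checker": 1.0, "llm_evaluator": 1.5, "human_expert": 2.0}  # assumed

    votes = [
        ("checker", 10, True), ("checker", 10, True),
        ("llm_evaluator", 8, True), ("llm_evaluator", 8, False),   # one dissenting evaluator
        ("human_expert", 5, True),
    ]
    print(weighted_verdict(votes, tier_weight))  # True: weighted approval is about 0.78

Whether such a rule actually guarantees correctness at 30 percent adversarial stake is exactly what the missing adversary model would have to establish.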

What would settle it

An experiment in which 35 percent of participants act adversarially and the system either drops below 60 percent accuracy or allows a malicious actor to profit, violating the Safety-Profitability Theorem.
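A rough sketch of how that falsification run could be simulated. The reward and slashing rules (winners split a fixed reward, losers forfeit 10 percent of stake) and the 90 percent honest-auditor accuracy are assumptions; none of these numbers come from the paper.

    # Hypothetical falsification harness: 35% adversarial stake, simple stake-weighted majority,
    # assumed payoff rules. Measures system accuracy and total adversarial payoff.
    import random

    def run_round(n=100, adversarial_frac=0.35, honest_acc=0.9, slash=0.1, reward=10.0):
        auditors = [{"adv": i < int(n * adversarial_frac), "stake": 1.0, "payoff": 0.0}
                    for i in range(n)]
        truth = True  # ground-truth verdict for the audited reasoning step
        votes = [(not truth) if a["adv"] else
                 (truth if random.random() < honest_acc else not truth)
                 for a in auditors]
        yes_stake = sum(a["stake"] for a, v in zip(auditors, votes) if v)
        verdict = yes_stake / sum(a["stake"] for a in auditors) >= 0.5
        winners = [a for a, v in zip(auditors, votes) if v == verdict]
        for a, v in zip(auditors, votes):
            a["payoff"] += reward / len(winners) if v == verdict else -slash * a["stake"]
        return verdict == truth, sum(a["payoff"] for a in auditors if a["adv"])

    random.seed(0)
    runs = [run_round() for _ in range(1000)]
    print("accuracy:", sum(ok for ok, _ in runs) / len(runs))
    print("mean adversarial payoff:", sum(p for _, p in runs) / len(runs))

If the Safety-Profitability claim held past its stated bound, accuracy would stay high and the adversarial payoff would remain negative; the experiment described above asks whether either breaks at 35 percent.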

Figures

Figures reproduced from arXiv: 2604.27132 by Mohan Zhang, Pingzhi Li, Tianlong Chen, Yu-Chao Huang, Zhen Tan, Zhuo Zhang.

Figure 1. Comparison of sound clinical reasoning versus flawed reasoning that produces the correct …
Figure 2. The dual decomposition engine of TRUST. Linear Chain-of-Thought reasoning is mapped to Hierarchical Directed Acyclic Graphs (a), while Multi-Agent interactions are projected into Causal Interaction Graphs (b). Both representations enable parallel, privacy-preserving verification by distributed auditor networks.
Figure 3. Example HDAG for a mathematical integration problem. Node color indicates the assigned …
read the original abstract

Large Reasoning Models (LRMs) and Multi-Agent Systems (MAS) in high-stakes domains demand reliable verification, yet centralized approaches suffer four limitations: (1) Robustness, with single points of failure vulnerable to attacks and bias; (2) Scalability, as reasoning complexity creates bottlenecks; (3) Opacity, as hidden auditing erodes trust; and (4) Privacy, as exposed reasoning traces risk model theft. We introduce TRUST (Transparent, Robust, and Unified Services for Trustworthy AI), a decentralized framework with three innovations: (i) Hierarchical Directed Acyclic Graphs (HDAGs) that decompose Chain-of-Thought reasoning into five abstraction levels for parallel distributed auditing; (ii) the DAAN protocol, which projects multi-agent interactions into Causal Interaction Graphs (CIGs) for deterministic root-cause attribution; and (iii) a multi-tier consensus mechanism among computational checkers, LLM evaluators, and human experts with stake-weighted voting that guarantees correctness under 30% adversarial participation. We prove a Safety-Profitability Theorem ensuring honest auditors profit while malicious actors incur losses. All decisions are recorded on-chain, while privacy-by-design segmentation prevents reconstruction of proprietary logic. Across multiple LLMs and benchmarks, TRUST attains 72.4% accuracy (4-18% above baselines) and remains resilient against 20% corruption. DAAN reaches 70% root-cause attribution (vs. 54-63% for standard methods) with 60% token savings. Human studies validate the design (F1 = 0.89, Brier = 0.074). The framework supports (A1) decentralized auditing, (A2) tamper-proof leaderboards, (A3) trustless data annotation, and (A4) governed autonomous agents, pioneering decentralized AI auditing for safe, accountable deployment of reasoning-capable systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper introduces TRUST, a decentralized framework for auditing Large Reasoning Models and Multi-Agent Systems. It proposes three main innovations: Hierarchical Directed Acyclic Graphs (HDAGs) to decompose Chain-of-Thought reasoning into five abstraction levels for distributed auditing, the DAAN protocol that projects multi-agent interactions into Causal Interaction Graphs (CIGs) for root-cause attribution, and a multi-tier consensus mechanism involving computational checkers, LLM evaluators, and human experts using stake-weighted voting. The authors claim to prove a Safety-Profitability Theorem ensuring that honest auditors profit while malicious actors lose, and report empirical results of 72.4% accuracy (4-18% above baselines), 70% root-cause attribution (vs. 54-63% for baselines), 60% token savings, and resilience to 20% corruption, with human validation metrics (F1=0.89, Brier=0.074). The framework is positioned to support decentralized auditing, tamper-proof leaderboards, trustless annotation, and governed agents.

Significance. If the Safety-Profitability Theorem were rigorously derived and the performance claims validated with reproducible experiments, the work would offer a novel approach to addressing robustness, scalability, opacity, and privacy issues in AI auditing through decentralization and on-chain recording. The combination of HDAG decomposition with CIG-based attribution and hybrid consensus could advance trustworthy AI deployment in high-stakes domains. However, the manuscript provides no formal derivations, experimental protocols, or data, so the potential impact cannot be assessed at present.

major comments (4)
  1. [Abstract] Abstract: The Safety-Profitability Theorem is asserted as proved, ensuring honest auditors profit under 30% adversarial participation, but the manuscript contains no proof sketch, lemmas, utility functions for participants, or analysis of false-positive/negative rates under collusion or sybil attacks. This is load-bearing for the central claim.
  2. [Abstract] Abstract: Performance claims (72.4% accuracy, 70% root-cause attribution, resilience against 20% corruption, 60% token savings) are stated without any experimental setup, datasets, baselines, number of runs, error bars, or statistical tests. These numbers cannot be evaluated or reproduced from the given text.
  3. [Framework Description] Framework / DAAN protocol: The claim that projection of multi-agent traces to Causal Interaction Graphs enables deterministic root-cause attribution without loss of information is presented as an axiom, but no formal argument, information-theoretic bound, or ablation study shows that the five-level HDAG decomposition preserves causal information.
  4. [Consensus Mechanism] Consensus mechanism: The multi-tier stake-weighted voting is said to guarantee correctness under 30% adversarial participation, yet there is no explicit adversary model, game-theoretic analysis, or bounds on detection rates when adversaries control up to 30% of stake.
minor comments (2)
  1. [Abstract] Abstract: The human studies are summarized only by aggregate metrics (F1 = 0.89, Brier = 0.074) with no details on study design, participant expertise, or task description.
  2. [Notation and Definitions] Notation: The definitions of HDAGs, the five abstraction levels, and CIGs would benefit from explicit mathematical notation or pseudocode to support reproducibility.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the detailed and constructive review of our manuscript on the TRUST framework. We appreciate the recognition of its potential to address key challenges in decentralized AI auditing. We agree that the current version requires additional formal details and experimental documentation to substantiate the central claims. Below we respond point-by-point to the major comments and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The Safety-Profitability Theorem is asserted as proved, ensuring honest auditors profit under 30% adversarial participation, but the manuscript contains no proof sketch, lemmas, utility functions for participants, or analysis of false-positive/negative rates under collusion or sybil attacks. This is load-bearing for the central claim.

    Authors: We acknowledge that the Safety-Profitability Theorem is central to the framework and that the current manuscript states the theorem and its high-level implications without including the full derivation. This omission stems from length constraints in the initial v0.1 submission. In the revised manuscript we will add a dedicated Theoretical Analysis section containing a complete proof sketch, the utility functions for honest and malicious auditors, supporting lemmas on incentive compatibility, and an explicit analysis of false-positive and false-negative rates under collusion and Sybil attacks, all within the 30% adversarial participation bound. revision: yes

  2. Referee: [Abstract] Abstract: Performance claims (72.4% accuracy, 70% root-cause attribution, resilience against 20% corruption, 60% token savings) are stated without any experimental setup, datasets, baselines, number of runs, error bars, or statistical tests. These numbers cannot be evaluated or reproduced from the given text.

    Authors: We agree that the reported performance figures require full experimental documentation for reproducibility and evaluation. The numbers derive from our internal experiments across multiple LLMs and standard benchmarks, but the manuscript omits the protocol details. In the revision we will insert a comprehensive Experiments section that specifies the datasets, baselines, number of independent runs, error bars, and statistical tests. We will also release the associated code and data artifacts to enable independent verification. revision: yes

  3. Referee: [Framework Description] Framework / DAAN protocol: The claim that projection of multi-agent traces to Causal Interaction Graphs enables deterministic root-cause attribution without loss of information is presented as an axiom, but no formal argument, information-theoretic bound, or ablation study shows that the five-level HDAG decomposition preserves causal information.

    Authors: The DAAN protocol is designed so that the projection onto Causal Interaction Graphs preserves the necessary causal structure via the five-level HDAG decomposition. While the current text presents this property concisely, we accept that a formal justification is needed. We will expand the Framework Description section with a formal argument, including an information-theoretic bound demonstrating preservation of causal information, together with ablation studies that quantify the contribution of each HDAG level to root-cause attribution accuracy. revision: yes

  4. Referee: [Consensus Mechanism] Consensus mechanism: The multi-tier stake-weighted voting is said to guarantee correctness under 30% adversarial participation, yet there is no explicit adversary model, game-theoretic analysis, or bounds on detection rates when adversaries control up to 30% of stake.

    Authors: We recognize that the current description of the multi-tier consensus mechanism is high-level and lacks an explicit adversary model. In the revised manuscript we will augment the Consensus Mechanism section with a formal adversary model, a game-theoretic analysis of the stake-weighted voting incentives, and derived bounds on detection rates and overall correctness under adversarial control of up to 30% of the stake, building on Byzantine fault tolerance principles adapted to the hybrid checker-LLM-human setting. revision: yes

Circularity Check

1 step flagged

Safety-Profitability Theorem reduces to the framework's own 30% adversarial consensus assumptions by construction

specific steps
  1. self-definitional [Abstract]
    "We prove a Safety-Profitability Theorem ensuring honest auditors profit while malicious actors incur losses. ... a multi-tier consensus mechanism among computational checkers, LLM evaluators, and human experts with stake-weighted voting that guarantees correctness under 30% adversarial participation."

    The theorem's conclusion (honest profit, malicious loss) is identical to the built-in 'guarantees correctness' property of the stake-weighted voting rule at the 30% threshold. The claimed proof therefore reduces to the framework's own design assumptions without additional mathematical content or external adversary analysis.

full rationale

The paper's load-bearing theoretical result is the Safety-Profitability Theorem, which is asserted to follow from the multi-tier consensus mechanism. However, the mechanism is defined to 'guarantee correctness under 30% adversarial participation' via stake-weighted voting, and the theorem simply restates that honest participants profit while malicious ones lose under that same guarantee. No independent game-theoretic model, utility functions, or bounding lemmas are supplied; the profitability outcome is therefore entailed directly by the design choice rather than derived. Experimental accuracy and attribution figures are internal evaluations of the same unverified mechanism and do not break the circularity. This matches the self-definitional pattern exactly.
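For concreteness, a hedged sketch of the kind of utility model the audit says is missing. The reward r, slash s, and the probabilities are free parameters here; a non-circular proof would have to derive the detection and correctness probabilities from an explicit adversary model rather than assume them.

    # Hypothetical expected-utility model for the Safety-Profitability claim (all parameters assumed).
    def honest_expected_utility(r, s, p_consensus_correct):
        # Honest auditor earns the reward when the consensus verdict is correct,
        # and is slashed when it is not.
        return p_consensus_correct * r - (1 - p_consensus_correct) * s

    def malicious_expected_utility(r, s, p_detect):
        # Malicious auditor profits only when the attack goes undetected,
        # and is slashed when it is detected.
        return (1 - p_detect) * r - p_detect * s

    # Safety-Profitability would require, for the chosen parameters:
    #   honest_expected_utility > 0  and  malicious_expected_utility < 0
    print(honest_expected_utility(r=10, s=5, p_consensus_correct=0.95))   # 9.25
    print(malicious_expected_utility(r=10, s=5, p_detect=0.8))            # -2.0

The circularity objection is that the paper supplies the conclusion (the inequality) but not an independent derivation of the probabilities that make it true.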

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 3 invented entities

The central claims rest on several newly introduced structures and assumptions that lack independent evidence or prior validation in the abstract. The framework adds multiple free parameters and ad-hoc entities whose correctness is asserted rather than derived from external benchmarks.

free parameters (2)
  • 30% adversarial participation threshold
    Chosen value that the consensus mechanism and Safety-Profitability Theorem are stated to tolerate; directly affects all correctness and resilience claims.
  • stake-weighted voting parameters
    Weights assigned to computational checkers, LLM evaluators, and human experts; required for the multi-tier consensus to function as described.
axioms (2)
  • domain assumption Multi-tier consensus with stake-weighted voting guarantees correctness under up to 30% adversarial participation
    Invoked to support the Safety-Profitability Theorem and the 20% corruption resilience claim.
  • ad hoc to paper Projection of multi-agent interactions into Causal Interaction Graphs enables deterministic root-cause attribution
    Central assumption for the DAAN protocol's claimed 70% attribution accuracy and 60% token savings.
invented entities (3)
  • Hierarchical Directed Acyclic Graphs (HDAGs) no independent evidence
    purpose: Decompose Chain-of-Thought reasoning into five abstraction levels for parallel distributed auditing
    New data structure introduced by the paper with no prior independent evidence cited.
  • DAAN protocol no independent evidence
    purpose: Project multi-agent interactions into Causal Interaction Graphs for deterministic root-cause attribution
    Novel protocol defined in the paper without external validation.
  • Causal Interaction Graphs (CIGs) no independent evidence
    purpose: Represent multi-agent interactions to enable root-cause analysis
    New graph representation introduced to support DAAN.
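Because the DAAN projection is only asserted, the following is a generic stand-in for root-cause attribution on a causal interaction graph, not the paper's protocol: walk causal edges backward from a failed outcome and report faulty nodes that have no faulty ancestor of their own. The graph, node names, and fault flags are hypothetical.

    # Generic root-cause attribution on a causal interaction graph (hypothetical example).
    edges = {                                   # child -> causal parents
        "final_answer": ["agent_B_summary"],
        "agent_B_summary": ["agent_A_plan", "tool_call_1"],
        "tool_call_1": ["agent_A_plan"],
        "agent_A_plan": [],
    }
    faulty = {"final_answer", "tool_call_1"}    # nodes whose local audit failed (assumed)

    def ancestors(node, edges):
        """All transitive causal parents of node."""
        out, stack = set(), list(edges.get(node, []))
        while stack:
            p = stack.pop()
            if p not in out:
                out.add(p)
                stack.extend(edges.get(p, []))
        return out

    def root_causes(failed_node, edges, faulty):
        """Faulty nodes upstream of (or equal to) failed_node with no faulty ancestor."""
        candidates = (ancestors(failed_node, edges) | {failed_node}) & faulty
        return [n for n in candidates if not (ancestors(n, edges) & faulty)]

    print(root_causes("final_answer", edges, faulty))  # ['tool_call_1']

Whether a CIG built by DAAN preserves enough causal information for this kind of traversal to be deterministic and lossless is the open question the ledger flags.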

pith-pipeline@v0.9.0 · 5654 in / 2157 out tokens · 106755 ms · 2026-05-07T08:15:49.625027+00:00 · methodology


    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...