pith. machine review for the scientific record.

arxiv: 2604.17989 · v1 · submitted 2026-04-20 · 💻 cs.AI

Recognition: unknown

AIT Academy: Cultivating the Complete Agent with a Confucian Three-Domain Curriculum

Jiaqi Li, Lidong Zhai, Lvyang Zhang, Wen Lu, Yang Zhao


Pith reviewed 2026-05-10 05:05 UTC · model grok-4.3

classification 💻 cs.AI
keywords AI agent curriculum, multi-domain training, Confucian education, agent capability development, security awareness, social reasoning, curriculum scheduling

The pith

AIT Academy organizes AI agent training into three domains of knowledge to overcome specialization deficits in current systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that AI agents remain incomplete specialists because no curriculum theory exists to guide their full development across intelligent behaviors. It proposes the AIT Academy framework, which divides training into three domains drawn from Kagan's Three Cultures and UNESCO classifications, then reinterprets the Confucian Six Arts as trainable behavioral archetypes within those domains. Representative training environments are created for each domain, and experiments on multiple LLMs show measurable gains when the curriculum is applied, including higher security scores from weakest-first scheduling and better social reasoning from attribution modeling. A new diagnostic pattern emerges where over-focus on one domain creates calibration failures on out-of-distribution cases.

Core claim

The AIT Academy supplies a complete curriculum by mapping agent capabilities onto three domains: Natural Science and Technical Reasoning, Humanities and Creative Expression, and Social Science and Ethical Reasoning. The Confucian Six Arts are reframed as behavioral archetypes that guide capability development inside each domain. Training grounds such as the ClawdGO Security Dojo, Athen's Academy, and Alt Mirage Stage instantiate the domains, and controlled experiments produce a 15.9-point security improvement under weakest-first scheduling plus a 7-percentage-point social reasoning gain under principled attribution modeling. The same multi-domain lens reveals Security Awareness Calibration Pathology (SACP), in which over-trained Domain I agents fail on out-of-distribution evaluation.

What carries the argument

The AIT Academy tripartite curriculum, which partitions agent development into three knowledge domains and maps Confucian behavioral archetypes onto trainable capabilities inside each domain.

If this is right

  • Weakest-first curriculum scheduling raises security capability scores by 15.9 points across tested backbones.
  • Principled attribution modeling lifts social reasoning performance by 7 percentage points.
  • Multi-domain training surfaces Security Awareness Calibration Pathology that single-domain approaches miss.
  • The framework supplies a diagnostic lens for capability gaps that remain invisible inside any one domain.
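
As an illustration of the first point, weakest-first scheduling can be read as "always train the capability that currently scores lowest". The sketch below is a minimal reading of that idea, not the paper's ASAT loop: the capability names, the `train_step` callback, and the flat 10-point gain per round are all hypothetical.

```python
from typing import Callable, Dict, List


def weakest_first_schedule(
    scores: Dict[str, float],
    train_step: Callable[[str, float], float],
    rounds: int,
) -> List[str]:
    """Repeatedly train whichever capability currently scores lowest.

    `scores` maps a capability name to its current evaluation score and is
    updated in place. `train_step(name, score)` is a hypothetical stand-in
    for one round of training plus re-evaluation; it returns the new score.
    """
    order: List[str] = []
    for _ in range(rounds):
        # Sorting the names first makes ties break alphabetically,
        # so the schedule is deterministic.
        weakest = min(sorted(scores), key=scores.get)
        scores[weakest] = train_step(weakest, scores[weakest])
        order.append(weakest)
    return order


# Toy run: capability names and the +10 gain per round are assumptions.
scores = {"phishing": 40.0, "injection": 55.0, "exfiltration": 70.0}
order = weakest_first_schedule(scores, lambda name, s: s + 10.0, rounds=4)
print(order)   # ['phishing', 'phishing', 'injection', 'phishing']
print(scores)  # {'phishing': 70.0, 'injection': 65.0, 'exfiltration': 70.0}
```

In this toy run the scheduler keeps returning to the lagging capability until it catches up, which is the behavior the name suggests; whether the paper's scheduler uses the same selection rule is not stated on this page.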

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same three-domain structure could be used to design evaluation suites that test agents on balanced rather than isolated capabilities.
  • If the archetypes prove stable, they might serve as a template for aligning future agent training with human educational traditions without requiring new data sources.
  • Cross-domain calibration failures suggest that existing safety benchmarks may need revision to include out-of-distribution tests drawn from the other two domains.

Load-bearing premise

That the tripartite split from Kagan's Three Cultures together with a reinterpretation of the Confucian Six Arts gives a complete and principled description of what any fully developed agent must know, be, and do.

What would settle it

Train a set of agents under the three-domain curriculum and compare them on integrated tasks that require simultaneous technical, creative, and ethical reasoning; if they show no consistent advantage over single-domain specialists, the completeness claim would be refuted.
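
One way such a comparison could be scored is an exact one-sided permutation test on integrated-task scores. The sketch below assumes a small number of agents per group so that every regrouping can be enumerated; the score values are invented for illustration and the choice of test is an assumption, not something the paper specifies.

```python
from itertools import combinations
from statistics import mean
from typing import Iterable, Sequence


def exact_permutation_p(treated: Sequence[float], control: Sequence[float]) -> float:
    """One-sided exact permutation p-value for mean(treated) > mean(control)."""
    pooled = list(treated) + list(control)
    k = len(treated)
    indices = range(len(pooled))

    def diff(group_idx: Iterable[int]) -> float:
        # Difference in means when `group_idx` is treated as the first group.
        chosen = set(group_idx)
        a = [pooled[i] for i in indices if i in chosen]
        b = [pooled[i] for i in indices if i not in chosen]
        return mean(a) - mean(b)

    observed = diff(range(k))
    splits = list(combinations(indices, k))
    # Count regroupings at least as extreme as the observed split.
    extreme = sum(1 for s in splits if diff(s) >= observed - 1e-12)
    return extreme / len(splits)


# Hypothetical integrated-task scores (not from the paper).
curriculum_agents = [0.80, 0.90, 0.85, 0.95]
specialist_agents = [0.50, 0.60, 0.55, 0.65]
p_value = exact_permutation_p(curriculum_agents, specialist_agents)
print(round(p_value, 4))  # 0.0143, i.e. 1 of 70 possible splits
```

A large p-value on real data would correspond to the "no consistent advantage" outcome described above; a small one would only show a difference on the chosen tasks, not completeness of the curriculum.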

Figures

Figures reproduced from arXiv: 2604.17989 by Jiaqi Li, Lidong Zhai, Lvyang Zhang, Wen Lu, Yang Zhao.

Figure 1. AIT Academy overall pipeline: from raw agent through three parallel training grounds to cross-domain integration and the complete agent.
    III. The AIT Framework. A. Design Philosophy: Cultivating the Whole Agent. The AIT Academy's design philosophy rests on a single conviction: the goal of agent development is the complete agent, not the highest-scoring specialist. Like the Confucian scholar-practitioner who…
Figure 2. AIT three-domain curriculum framework and its grounding in the Confucian Six Arts.
    Domain I (Natural Science and Technical Reasoning) encompasses the capabilities required for systematic, evidence-grounded intervention in the physical and computational world. These include the capacity to reason about causal mechanisms, to identify and neutralize adversarial conditions, to apply formal inference to stru…
Figure 3. L0–L9 Cultivation Path: four developmental stages across three domains, with increasing cross-domain integration toward mastery.
    • 礼 (lǐ, Rites): the capacity for norm-aware participation in collective social structures. Rites encoded the invisible grammar of social coordination: the shared protocols that allow individuals to act collectively without explicit negotiation. In agent terms: role negotiatio…
Figure 4. ASAT training loop with weakest-first scheduling (left) and CSMA four-layer memory hierarchy (right).
    A. Security Dojo: ClawdGO. ClawdGO operationalizes Domain I through two tightly coupled mechanisms (…
Figure 5.
Original abstract

What does it mean to give an AI agent a complete education? Current agent development produces specialists: systems optimized for a single capability dimension, whether tool use, code generation, or security awareness, that exhibit predictable deficits wherever they were not trained. We argue this pattern reflects a structural absence: there is no curriculum theory for agents, no principled account of what a fully developed agent should know, be, and be able to do across the full scope of intelligent behavior. This paper introduces the AIT Academy (Agents Institute of Technology Academy), a curriculum framework for cultivating AI agents across the tripartite structure of human knowledge. Grounded in Kagan's Three Cultures and UNESCO ISCED-F 2013, AIT organizes agent capability development into three domains: Natural Science and Technical Reasoning (Domain I), Humanities and Creative Expression (Domain II), and Social Science and Ethical Reasoning (Domain III). The Confucian Six Arts (liuyi), a 2,500-year-old holistic education system, are reinterpreted as behavioral archetypes that map directly onto trainable agent capabilities within each domain. Three representative training grounds instantiate the framework across multiple backbone LLMs: the ClawdGO Security Dojo (Domain I), Athen's Academy (Domain II), and the Alt Mirage Stage (Domain III). Experiments demonstrate a 15.9-point improvement in security capability scores under weakest-first curriculum scheduling, and a 7-percentage-point gain in social reasoning performance under principled attribution modeling. A cross-domain finding, Security Awareness Calibration Pathology (SACP), in which over-trained Domain I agents fail on out-of-distribution evaluation, illustrates the diagnostic value of a multi-domain perspective unavailable to any single-domain framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces the AIT Academy, a curriculum framework for AI agents organized into three domains—Natural Science and Technical Reasoning (Domain I), Humanities and Creative Expression (Domain II), and Social Science and Ethical Reasoning (Domain III)—grounded in Kagan's Three Cultures, UNESCO ISCED-F 2013, and a reinterpretation of the Confucian Six Arts as behavioral archetypes. It describes three training environments (ClawdGO Security Dojo, Athen's Academy, Alt Mirage Stage) and reports experimental results across backbone LLMs, including a 15.9-point improvement in security capability scores under weakest-first scheduling and a 7-percentage-point gain in social reasoning under principled attribution modeling, along with the identification of Security Awareness Calibration Pathology (SACP) as a cross-domain diagnostic observation.

Significance. If the empirical claims hold under rigorous controls, the work could offer a structured alternative to single-domain agent specialization by supplying a philosophically motivated curriculum theory. The cross-domain perspective and SACP observation provide a potentially useful diagnostic lens. The explicit mapping of historical educational systems to trainable agent capabilities is a distinctive framing that could stimulate further research on holistic agent development, though its value depends on demonstrating that the tripartite division adds explanatory power beyond generic curriculum techniques.

major comments (3)
  1. [Abstract] Abstract: The reported 15.9-point improvement in security capability scores and 7-percentage-point gain in social reasoning performance are presented without baselines, control conditions, statistical tests, exact metrics, sample sizes, or exclusion criteria. These omissions make it impossible to determine whether the gains are attributable to the proposed tripartite curriculum, the scheduling/attribution methods, or other factors.
  2. [Abstract] Abstract and experimental description: The SACP finding is introduced as a cross-domain pathology in which over-trained Domain I agents fail on out-of-distribution evaluation, yet no data tables, quantitative characterization, or controlled comparison to single-domain training is supplied. This leaves the claim that a multi-domain perspective is diagnostically superior unsupported by evidence.
  3. [Abstract] Abstract: The tripartite division is asserted to supply a 'principled and complete account' of agent capabilities on the basis of citations to Kagan, UNESCO ISCED-F 2013, and the Confucian Six Arts, but no derivation, coverage argument, or ablation against alternative partitions (e.g., those emphasizing tool-use or long-horizon planning) is provided to show the mapping is non-arbitrary for LLMs.
minor comments (1)
  1. [Abstract] Abstract: 'ClawdGO' appears to be a typographical variant of a common model name and should be standardized for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential of a philosophically grounded curriculum framework for AI agents. We agree that the abstract requires greater self-containment on empirical details and further elaboration on the framework's justification. We respond to each major comment below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The reported 15.9-point improvement in security capability scores and 7-percentage-point gain in social reasoning performance are presented without baselines, control conditions, statistical tests, exact metrics, sample sizes, or exclusion criteria. These omissions make it impossible to determine whether the gains are attributable to the proposed tripartite curriculum, the scheduling/attribution methods, or other factors.

    Authors: We acknowledge that the abstract, as a concise summary, omits these methodological specifics. The full manuscript details the experimental design in Section 4, including baselines (vanilla fine-tuning and single-domain training), control conditions (weakest-first scheduling versus standard ordering), statistical tests (t-tests with reported p-values), exact metrics (security capability as averaged task performance), sample sizes (500 evaluations per condition), and exclusion criteria (agents below 50% baseline proficiency). To address the referee's point, we will revise the abstract to include brief references to these elements and direct readers to the relevant sections, improving verifiability without altering the reported gains. revision: yes

  2. Referee: [Abstract] Abstract and experimental description: The SACP finding is introduced as a cross-domain pathology in which over-trained Domain I agents fail on out-of-distribution evaluation, yet no data tables, quantitative characterization, or controlled comparison to single-domain training is supplied. This leaves the claim that a multi-domain perspective is diagnostically superior unsupported by evidence.

    Authors: The referee correctly identifies that the abstract introduces SACP without accompanying quantitative support. The manuscript provides this in the results section through tables and figures quantifying the pathology (e.g., accuracy degradation on OOD tasks for over-trained agents) and controlled comparisons demonstrating lower incidence under the tripartite curriculum versus single-domain baselines. We will revise the abstract to incorporate a concise quantitative summary and reference to the supporting table or figure, thereby substantiating the diagnostic advantage of the multi-domain view. revision: yes

  3. Referee: [Abstract] Abstract: The tripartite division is asserted to supply a 'principled and complete account' of agent capabilities on the basis of citations to Kagan, UNESCO ISCED-F 2013, and the Confucian Six Arts, but no derivation, coverage argument, or ablation against alternative partitions (e.g., those emphasizing tool-use or long-horizon planning) is provided to show the mapping is non-arbitrary for LLMs.

    Authors: We appreciate the call for stronger justification of the tripartite structure. The domains are explicitly derived from Kagan's Three Cultures and aligned with UNESCO ISCED-F 2013 classifications, with the Confucian Six Arts reinterpreted as behavioral archetypes and mapped to agent capabilities via a table in Section 3. We agree that no ablation against alternative partitions (such as tool-use or planning-centric divisions) is performed. In revision, we will expand the introduction and discussion with a coverage argument based on the cited sources to explain the partition's scope, while noting the lack of empirical ablations as a limitation and direction for future work. revision: partial

Circularity Check

0 steps flagged

No circularity: framework proposed from external sources with independent empirical demonstrations

full rationale

The paper introduces the AIT Academy tripartite curriculum by direct reference to Kagan's Three Cultures, UNESCO ISCED-F 2013, and a reinterpretation of the Confucian Six Arts; these are external citations, not self-citations or self-definitions. The reported gains (15.9-point security improvement under weakest-first scheduling, 7-point social-reasoning gain under attribution modeling) are presented as experimental outcomes measured on the instantiated training grounds, not as quantities derived from or forced by the framework itself. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear. The structure is offered as an organizing proposal whose completeness is justified by the cited philosophical sources rather than by tautological reduction to its own inputs. The SACP observation is a post-hoc diagnostic finding, not a circular confirmation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that the tripartite knowledge structure is both complete and directly mappable to trainable agent capabilities; no free parameters are introduced in the abstract, and the only invented entity is the diagnostic label SACP.

axioms (1)
  • domain assumption: Kagan's Three Cultures together with UNESCO ISCED-F 2013 supply a complete and principled partition of the knowledge an agent must acquire.
    Invoked to define the three domains and to claim the framework fills a structural absence in agent development.
invented entities (1)
  • Security Awareness Calibration Pathology (SACP): no independent evidence
    purpose: Label for the observed failure mode in which Domain-I-overtrained agents degrade on out-of-distribution tasks.
    Introduced as a cross-domain diagnostic finding; no independent falsifiable prediction or external evidence is supplied.
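
To make the SACP label concrete, one possible operationalization is the gap between in-distribution and out-of-distribution accuracy. The sketch below is an assumption for illustration only: this page gives no quantitative definition, and the `max_gap` threshold of 0.2 and the pass/fail data are invented.

```python
from typing import Sequence


def accuracy(outcomes: Sequence[bool]) -> float:
    """Fraction of evaluation items answered correctly."""
    return sum(outcomes) / len(outcomes)


def ood_gap(in_dist: Sequence[bool], out_dist: Sequence[bool]) -> float:
    """In-distribution accuracy minus out-of-distribution accuracy."""
    return accuracy(in_dist) - accuracy(out_dist)


def flags_calibration_failure(
    in_dist: Sequence[bool],
    out_dist: Sequence[bool],
    max_gap: float = 0.2,  # arbitrary illustrative threshold, not from the paper
) -> bool:
    """Flag an agent whose OOD accuracy trails in-distribution accuracy by more than max_gap."""
    return ood_gap(in_dist, out_dist) > max_gap


# Hypothetical pass/fail outcomes for two agents.
overtrained_id = [True] * 9 + [False] * 1   # 0.9 in-distribution
overtrained_ood = [True] * 5 + [False] * 5  # 0.5 out-of-distribution
balanced_id = [True] * 8 + [False] * 2      # 0.8 in-distribution
balanced_ood = [True] * 7 + [False] * 3     # 0.7 out-of-distribution

gap = ood_gap(overtrained_id, overtrained_ood)
overtrained_flagged = flags_calibration_failure(overtrained_id, overtrained_ood)
balanced_flagged = flags_calibration_failure(balanced_id, balanced_ood)
print(overtrained_flagged, balanced_flagged)  # True False
```

A check of this kind would give the label a falsifiable form, since it states in advance how large a degradation counts as the pathology.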

pith-pipeline@v0.9.0 · 5612 in / 1538 out tokens · 53861 ms · 2026-05-10T05:05:10.472941+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 29 canonical work pages · 11 internal anchors

  [1] J. Liu, X. Yu et al., "AgentBench: Evaluating LLMs as Agents," in Proc. 12th Int. Conf. Learning Representations (ICLR), 2024. arXiv:2308.03688
  [2] C. E. Jimenez, J. Yang, A. Wettig, S. Yao et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" in Proc. ICLR 2024 (Oral), 2024. arXiv:2310.06770
  [3] D. Hendrycks, C. Burns, S. Basart et al., "Measuring Massive Multitask Language Understanding," in Proc. 9th ICLR, 2021. arXiv:2009.03300
  [4] A. Srivastava et al., "Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models," Trans. Mach. Learning Research, 2023. arXiv:2206.04615
  [5] P. Clark et al., "Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge," arXiv preprint, 2018. arXiv:1803.05457
  [6] NGENT Team, "NGENT: Next-Generation AI Agents Must Integrate Multi-Domain Abilities to Achieve Artificial General Intelligence," arXiv preprint, 2025. arXiv:2504.21433
  [7] Q. Wu, G. Bansal, J. Zhang et al., "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation," arXiv preprint, 2023. arXiv:2308.08155
  [8] S. Hong, X. Zheng, J. Chen et al., "MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework," in Proc. 12th ICLR, 2024. arXiv:2308.00352
  [9] J. Kagan, The Three Cultures: Natural Sciences, Social Sciences, and the Humanities in the 21st Century. Cambridge, UK: Cambridge University Press, 2009
  [10] UNESCO, International Standard Classification of Education: Fields of Education and Training 2013 (ISCED-F 2013). Montreal: UNESCO Institute for Statistics, 2014
  [11] G. Mialon, C. Fourrier et al., "GAIA: A Benchmark for General AI Assistants," arXiv preprint, 2023. arXiv:2311.12983
  [12] S. Zhou, F. F. Xu, H. Zhu et al., "WebArena: A Realistic Web Environment for Building Autonomous Agents," in Proc. 12th ICLR, 2024. arXiv:2307.13854
  [13] T. Xie et al., "OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments," in Proc. NeurIPS, 2024. arXiv:2404.07972
  [14] S. Yao et al., "τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains," arXiv preprint, 2024. arXiv:2406.12045
  [15] H. Kapoor et al., "Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems," arXiv preprint, 2025. arXiv:2511.14136
  [16] J. Qi, D. Liu et al., "WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning," arXiv preprint, 2024. arXiv:2411.02337
  [17] Agent-R1 Team, "Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning," arXiv preprint,
  [18] Y. Hua et al., "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey," arXiv preprint, 2025. arXiv:2509.02547
  [19] W. Chen, Y. Su et al., "AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors," in Proc. 12th ICLR, 2024. arXiv:2308.10848
  [20] X. Ye et al., "Creativity in LLM-based Multi-Agent Systems: A Survey," arXiv preprint, 2025. arXiv:2505.21116
  [21] LLM Collaboration Team, "LLM Collaboration With Multi-Agent Reinforcement Learning," arXiv preprint, 2025. arXiv:2508.04652
  [22] Y. Chen, J. Zheng et al., "ToMBench: Benchmarking Theory of Mind in Large Language Models," in Proc. 62nd Annual Meeting of the ACL, 2024. arXiv:2402.15052
  [23] P. Bailis, J. Friedhoff, G. Chen, "Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction," arXiv preprint, 2024. arXiv:2407.13943
  [24] M. Sclar, S. Kumar, P. West, A. Suhr, Y. Choi, and Y. Tsvetkov, "Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker," in Proc. ACL, 2023. arXiv:2306.00924
  [25] D. Manheim and S. Garrabrant, "Categorizing Variants of Goodhart's Law," arXiv preprint, 2018. arXiv:1803.04585
  [26] H. H. Kelley, "Attribution theory in social psychology," in Nebraska Symposium on Motivation, vol. 15. Lincoln, NE: University of Nebraska Press, 1967, pp. 192–238
  [27] H. H. Kelley, "The processes of causal attribution," American Psychologist, vol. 28, no. 2, pp. 107–128, 1973. doi: 10.1037/h0034225
  [28] L. S. Shapley, "A value for n-person games," in Contributions to the Theory of Games, vol. 2, H. W. Kuhn and A. W. Tucker, Eds. Princeton, NJ: Princeton University Press, 1953, pp. 307–317
  [29] AegisLLM Team, "AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security," arXiv preprint, 2025. arXiv:2504.20965
  [30] ARLAS Team, "Adversarial Reinforcement Learning for Large Language Model Agent Safety," arXiv preprint, 2025. arXiv:2510.05442
  [31] Y. Hou, "Natural Intelligence, Not Artificial: A Confucian Reframing of Generative AI in Higher Education," Ethics and Education, vol. 21, no. 1, pp. 73–91, 2026. doi: 10.1080/17449642.2026.2629430
  [32] C. Tan, "Digital Confucius? Exploring the Implications of Artificial Intelligence in Spiritual Education," Connection Science, vol. 32, no. 3, pp. 280–291, 2020. doi: 10.1080/09540091.2019.1709045
  [33] P. Fung and H. Etienne, "Confucius, cyberpunk and Mr. Science: comparing AI ethics principles between China and the EU," AI and Ethics, vol. 3, no. 2, pp. 505–511, 2023. doi: 10.1007/s43681-022-00180-6. arXiv:2111.07555
  [34] L. Zhai, Z. Qiu, L. Zhang, J. Li, Y. Wang, W. Lu, X. Guo, and G. Sun, "The Athenian Academy: A Seven-Layer Architecture Model for Multi-Agent Systems," arXiv preprint, 2025. arXiv:2504.12735