Recognition: unknown
AIT Academy: Cultivating the Complete Agent with a Confucian Three-Domain Curriculum
Pith reviewed 2026-05-10 05:05 UTC · model grok-4.3
The pith
AIT Academy organizes AI agent training into three domains of knowledge to overcome specialization deficits in current systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The AIT Academy supplies a complete curriculum by mapping agent capabilities onto three domains: Natural Science and Technical Reasoning, Humanities and Creative Expression, and Social Science and Ethical Reasoning. The Confucian Six Arts are reframed as behavioral archetypes that guide capability development inside each domain. Training grounds such as the ClawdGO Security Dojo, Athen's Academy, and Alt Mirage Stage instantiate the domains, and controlled experiments produce a 15.9-point security improvement under weakest-first scheduling plus a 7-percentage-point social reasoning gain under principled attribution modeling. The same multi-domain lens reveals Security Awareness Calibration 1
What carries the argument
The AIT Academy tripartite curriculum, which partitions agent development into three knowledge domains and maps Confucian behavioral archetypes onto trainable capabilities inside each domain.
If this is right
- Weakest-first curriculum scheduling raises security capability scores by 15.9 points across tested backbones.
- Principled attribution modeling lifts social reasoning performance by 7 percentage points.
- Multi-domain training surfaces Security Awareness Calibration Pathology that single-domain approaches miss.
- The framework supplies a diagnostic lens for capability gaps that remain invisible inside any one domain.
Where Pith is reading between the lines
- The same three-domain structure could be used to design evaluation suites that test agents on balanced rather than isolated capabilities.
- If the archetypes prove stable, they might serve as a template for aligning future agent training with human educational traditions without requiring new data sources.
- Cross-domain calibration failures suggest that existing safety benchmarks may need revision to include out-of-distribution tests drawn from the other two domains.
Load-bearing premise
That the tripartite split from Kagan's Three Cultures together with a reinterpretation of the Confucian Six Arts gives a complete and principled description of what any fully developed agent must know, be, and do.
What would settle it
Train a set of agents under the three-domain curriculum and compare them on integrated tasks that require simultaneous technical, creative, and ethical reasoning; if they show no consistent advantage over single-domain specialists, the completeness claim would be refuted.
Figures
read the original abstract
What does it mean to give an AI agent a complete education? Current agent development produces specialists systems optimized for a single capability dimension, whether tool use, code generation, or security awareness that exhibit predictable deficits wherever they were not trained. We argue this pattern reflects a structural absence: there is no curriculum theory for agents, no principled account of what a fully developed agent should know, be, and be able to do across the full scope of intelligent behavior. This paper introduces the AIT Academy (Agents Institute of Technology Academy), a curriculum framework for cultivating AI agents across the tripartite structure of human knowledge. Grounded in Kagan's Three Cultures and UNESCO ISCED-F 2013, AIT organizes agent capability development into three domains: Natural Science and Technical Reasoning (Domain I), Humanities and Creative Expression (Domain II), and Social Science and Ethical Reasoning (Domain III). The Confucian Six Arts (liuyi) a 2,500-year-old holistic education system are reinterpreted as behavioral archetypes that map directly onto trainable agent capabilities within each domain. Three representative training grounds instantiate the framework across multiple backbone LLMs: the ClawdGO Security Dojo (Domain I), Athen's Academy (Domain II), and the Alt Mirage Stage (Domain III). Experiments demonstrate a 15.9-point improvement in security capability scores under weakest-first curriculum scheduling, and a 7-percentage-point gain in social reasoning performance under principled attribution modeling. A cross-domain finding Security Awareness Calibration Pathology (SACP), in which over-trained Domain I agents fail on out-of-distribution evaluation illustrates the diagnostic value of a multi-domain perspective unavailable to any single-domain framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the AIT Academy, a curriculum framework for AI agents organized into three domains—Natural Science and Technical Reasoning (Domain I), Humanities and Creative Expression (Domain II), and Social Science and Ethical Reasoning (Domain III)—grounded in Kagan's Three Cultures, UNESCO ISCED-F 2013, and a reinterpretation of the Confucian Six Arts as behavioral archetypes. It describes three training environments (ClawdGO Security Dojo, Athen's Academy, Alt Mirage Stage) and reports experimental results across backbone LLMs, including a 15.9-point improvement in security capability scores under weakest-first scheduling and a 7-percentage-point gain in social reasoning under principled attribution modeling, along with the identification of Security Awareness Calibration Pathology (SACP) as a cross-domain diagnostic observation.
Significance. If the empirical claims hold under rigorous controls, the work could offer a structured alternative to single-domain agent specialization by supplying a philosophically motivated curriculum theory. The cross-domain perspective and SACP observation provide a potentially useful diagnostic lens. The explicit mapping of historical educational systems to trainable agent capabilities is a distinctive framing that could stimulate further research on holistic agent development, though its value depends on demonstrating that the tripartite division adds explanatory power beyond generic curriculum techniques.
major comments (3)
- [Abstract] Abstract: The reported 15.9-point improvement in security capability scores and 7-percentage-point gain in social reasoning performance are presented without baselines, control conditions, statistical tests, exact metrics, sample sizes, or exclusion criteria. These omissions make it impossible to determine whether the gains are attributable to the proposed tripartite curriculum, the scheduling/attribution methods, or other factors.
- [Abstract] Abstract and experimental description: The SACP finding is introduced as a cross-domain pathology in which over-trained Domain I agents fail on out-of-distribution evaluation, yet no data tables, quantitative characterization, or controlled comparison to single-domain training is supplied. This leaves the claim that a multi-domain perspective is diagnostically superior unsupported by evidence.
- [Abstract] Abstract: The tripartite division is asserted to supply a 'principled and complete account' of agent capabilities on the basis of citations to Kagan, UNESCO ISCED-F 2013, and the Confucian Six Arts, but no derivation, coverage argument, or ablation against alternative partitions (e.g., those emphasizing tool-use or long-horizon planning) is provided to show the mapping is non-arbitrary for LLMs.
minor comments (1)
- [Abstract] Abstract: 'ClawdGO' appears to be a typographical variant of a common model name and should be standardized for clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for recognizing the potential of a philosophically grounded curriculum framework for AI agents. We agree that the abstract requires greater self-containment on empirical details and further elaboration on the framework's justification. We respond to each major comment below, indicating planned revisions where appropriate.
read point-by-point responses
-
Referee: [Abstract] Abstract: The reported 15.9-point improvement in security capability scores and 7-percentage-point gain in social reasoning performance are presented without baselines, control conditions, statistical tests, exact metrics, sample sizes, or exclusion criteria. These omissions make it impossible to determine whether the gains are attributable to the proposed tripartite curriculum, the scheduling/attribution methods, or other factors.
Authors: We acknowledge that the abstract, as a concise summary, omits these methodological specifics. The full manuscript details the experimental design in Section 4, including baselines (vanilla fine-tuning and single-domain training), control conditions (weakest-first scheduling versus standard ordering), statistical tests (t-tests with reported p-values), exact metrics (security capability as averaged task performance), sample sizes (500 evaluations per condition), and exclusion criteria (agents below 50% baseline proficiency). To address the referee's point, we will revise the abstract to include brief references to these elements and direct readers to the relevant sections, improving verifiability without altering the reported gains. revision: yes
-
Referee: [Abstract] Abstract and experimental description: The SACP finding is introduced as a cross-domain pathology in which over-trained Domain I agents fail on out-of-distribution evaluation, yet no data tables, quantitative characterization, or controlled comparison to single-domain training is supplied. This leaves the claim that a multi-domain perspective is diagnostically superior unsupported by evidence.
Authors: The referee correctly identifies that the abstract introduces SACP without accompanying quantitative support. The manuscript provides this in the results section through tables and figures quantifying the pathology (e.g., accuracy degradation on OOD tasks for over-trained agents) and controlled comparisons demonstrating lower incidence under the tripartite curriculum versus single-domain baselines. We will revise the abstract to incorporate a concise quantitative summary and reference to the supporting table or figure, thereby substantiating the diagnostic advantage of the multi-domain view. revision: yes
-
Referee: [Abstract] Abstract: The tripartite division is asserted to supply a 'principled and complete account' of agent capabilities on the basis of citations to Kagan, UNESCO ISCED-F 2013, and the Confucian Six Arts, but no derivation, coverage argument, or ablation against alternative partitions (e.g., those emphasizing tool-use or long-horizon planning) is provided to show the mapping is non-arbitrary for LLMs.
Authors: We appreciate the call for stronger justification of the tripartite structure. The domains are explicitly derived from Kagan's Three Cultures and aligned with UNESCO ISCED-F 2013 classifications, with the Confucian Six Arts reinterpreted as behavioral archetypes and mapped to agent capabilities via a table in Section 3. We agree that no ablation against alternative partitions (such as tool-use or planning-centric divisions) is performed. In revision, we will expand the introduction and discussion with a coverage argument based on the cited sources to explain the partition's scope, while noting the lack of empirical ablations as a limitation and direction for future work. revision: partial
Circularity Check
No circularity: framework proposed from external sources with independent empirical demonstrations
full rationale
The paper introduces the AIT Academy tripartite curriculum by direct reference to Kagan's Three Cultures, UNESCO ISCED-F 2013, and a reinterpretation of the Confucian Six Arts; these are external citations, not self-citations or self-definitions. The reported gains (15.9-point security improvement under weakest-first scheduling, 7-point social-reasoning gain under attribution modeling) are presented as experimental outcomes measured on the instantiated training grounds, not as quantities derived from or forced by the framework itself. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear. The structure is offered as an organizing proposal whose completeness is justified by the cited philosophical sources rather than by tautological reduction to its own inputs. The SACP observation is a post-hoc diagnostic finding, not a circular confirmation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Kagan's Three Cultures together with UNESCO ISCED-F 2013 supply a complete and principled partition of the knowledge an agent must acquire.
invented entities (1)
-
Security Awareness Calibration Pathology (SACP)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
J. Liu, X. Yu et al., ”AgentBench: Evaluating LLMs as Agents,” in Proc. 12th Int. Conf. Learning Representations (ICLR), 2024. arXiv:2308.03688
work page internal anchor Pith review arXiv 2024
-
[2]
C. E. Jimenez, J. Yang, A. Wettig, S. Yao et al., ”SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” in Proc. ICLR 2024 (Oral), 2024. arXiv:2310.06770
work page internal anchor Pith review arXiv 2024
-
[3]
Measuring Massive Multitask Language Understanding
D. Hendrycks, C. Burns, S. Basart et al., ”Measuring Massive Multitask Language Understanding,” in Proc. 9th ICLR, 2021. arXiv:2009.03300
work page internal anchor Pith review arXiv 2021
-
[4]
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
A. Srivastava et al., ”Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models,” Trans. Mach. Learning Research, 2023. arXiv:2206.04615
work page internal anchor Pith review arXiv 2023
-
[5]
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
P. Clark et al., ”Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge,” arXiv preprint, 2018. arXiv:1803.05457
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[6]
NGENT Team, ”NGENT: Next-Generation AI Agents Must Integrate Multi-Domain Abilities to Achieve Artificial General Intelligence,” arXiv preprint, 2025. arXiv:2504.21433
-
[7]
Q. Wu, G. Bansal, J. Zhang et al., ”AutoGen: Enabling Next- Gen LLM Applications via Multi-Agent Conversation,” arXiv preprint, 2023. arXiv:2308.08155
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[8]
S. Hong, X. Zheng, J. Chen et al., ”MetaGPT: Meta Program- ming for a Multi-Agent Collaborative Framework,” in Proc. 12th ICLR, 2024. arXiv:2308.00352
work page internal anchor Pith review arXiv 2024
-
[9]
Kagan, The Three Cultures: Natural Sciences, Social Sciences, and the Humanities in the 21st Century
J. Kagan, The Three Cultures: Natural Sciences, Social Sciences, and the Humanities in the 21st Century. Cambridge, UK: Cambridge University Press, 2009
2009
-
[10]
Mon- treal: UNESCO Institute for Statistics, 2014
UNESCO, International Standard Classification of Education: Fields of Education and Training 2013 (ISCED-F 2013). Mon- treal: UNESCO Institute for Statistics, 2014
2013
-
[11]
GAIA: a benchmark for General AI Assistants
G. Mialon, C. Fourrier et al., ”GAIA: A Benchmark for General AI Assistants,” arXiv preprint, 2023. arXiv:2311.12983
work page internal anchor Pith review arXiv 2023
-
[12]
S. Zhou, F. F. Xu, H. Zhu et al., ”WebArena: A Realistic Web Environment for Building Autonomous Agents,” in Proc. 12th ICLR, 2024. arXiv:2307.13854
work page Pith review arXiv 2024
-
[13]
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
T. Xie et al., ”OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments,” in Proc. NeurIPS, 2024. arXiv:2404.07972
work page internal anchor Pith review arXiv 2024
-
[14]
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
S. Yao et al., ” τ -bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains,” arXiv preprint, 2024. arXiv:2406.12045
work page internal anchor Pith review arXiv 2024
-
[15]
Beyond accuracy: A multi-dimensional framework for evaluating enterprise agentic AI systems
H. Kapoor et al., ”Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems,” arXiv preprint, 2025. arXiv:2511.14136
- [16]
-
[17]
Agent-R1 Team, ”Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning,” arXiv preprint,
-
[18]
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Y. Hua et al., ”The Landscape of Agentic Reinforce- ment Learning for LLMs: A Survey,” arXiv preprint, 2025. arXiv:2509.02547
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [19]
-
[20]
arXiv preprint arXiv:2505.21116 , year=
X. Ye et al., ”Creativity in LLM-based Multi-Agent Systems: A Survey,” arXiv preprint, 2025. arXiv:2505.21116
-
[21]
arXiv preprint arXiv:2508.04652 , year=
LLM Collaboration Team, ”LLM Collaboration With Multi- Agent Reinforcement Learning,” arXiv preprint, 2025. arXiv:2508.04652
- [22]
-
[23]
Werewolf arena: A case study in llm evaluation via social deduction, 2024
P. Bailis, J. Friedhoff, G. Chen, ”Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction,” arXiv preprint, 2024. arXiv:2407.13943
- [24]
-
[25]
D. Manheim and S. Garrabrant, ”Categorizing Variants of Goodhart’s Law,” arXiv preprint, 2018. arXiv:1803.04585
-
[26]
H. H. Kelley, ”Attribution theory in social psychology,” in Nebraska Symposium on Motivation, vol. 15. Lincoln, NE: University of Nebraska Press, 1967, pp. 192–238
1967
-
[27]
H. H. Kelley, ”The processes of causal attribution,” Amer- ican Psychologist, vol. 28, no. 2, pp. 107–128, 1973. doi: 10.1037/h0034225
-
[28]
L. S. Shapley, ”A value for n-person games,” in Contributions to the Theory of Games, vol. 2, H. W. Kuhn and A. W. Tucker, Eds. Princeton, NJ: Princeton University Press, 1953, pp. 307– 317
1953
-
[29]
AegisLLM Team, ”AegisLLM: Scaling Agentic Systems for Self- Reflective Defense in LLM Security,” arXiv preprint, 2025. arXiv:2504.20965
-
[30]
ARLAS: Adversarial reinforcement learning for LLM agent safety,
ARLAS Team, ”Adversarial Reinforcement Learning for Large Language Model Agent Safety,” arXiv preprint, 2025. arXiv:2510.05442
-
[31]
Y. Hou, ”Natural Intelligence, Not Artificial: A Confu- cian Reframing of Generative AI in Higher Education,” Ethics and Education, vol. 21, no. 1, pp. 73–91, 2026. doi: 10.1080/17449642.2026.2629430
-
[32]
C. Tan, ”Digital Confucius? Exploring the Implications of Artificial Intelligence in Spiritual Education,” Connec- tion Science, vol. 32, no. 3, pp. 280–291, 2020. doi: 10.1080/09540091.2019.1709045
-
[33]
P. Fung and H. Etienne, ”Confucius, cyberpunk and Mr. Science: comparing AI ethics principles between China and the EU,” AI and Ethics, vol. 3, no. 2, pp. 505–511, 2023. doi: 10.1007/s43681- 022-00180-6. arXiv:2111.07555
- [34]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.