pith. machine review for the scientific record.

arxiv: 2604.11364 · v1 · submitted 2026-04-13 · 💻 cs.AI

Recognition: unknown

The Missing Knowledge Layer in Cognitive Architectures for AI Agents


Pith reviewed 2026-05-10 15:27 UTC · model grok-4.3

classification 💻 cs.AI
keywords cognitive architectures · AI agents · knowledge layer · persistence semantics · memory systems · four-layer model · CoALA · JEPA

The pith

Cognitive architectures for AI agents lack a dedicated Knowledge layer, so factual knowledge inherits persistence rules designed for other kinds of information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that the influential CoALA and JEPA frameworks lack an explicit Knowledge layer with distinct persistence semantics. This omission leads systems to apply decay mechanisms meant for memories to factual knowledge, or to update facts and experiences under the same rules. The authors propose a four-layer structure: Knowledge with indefinite supersession, Memory with Ebbinghaus decay, Wisdom with evidence-gated revision, and Intelligence with ephemeral inference. They support the proposal with a survey of memory systems that reveals eight points of convergence on these gaps, and they provide working implementations in Python and Rust to demonstrate feasibility.

Core claim

The two most influential cognitive architecture frameworks for AI agents lack an explicit Knowledge layer with its own persistence semantics, producing a category error in which factual claims receive cognitive decay or facts and experiences share identical update mechanics. A survey of persistence semantics across existing systems identifies eight convergence points, all pointing to related gaps. The paper proposes a four-layer decomposition in which Knowledge, Memory, Wisdom, and Intelligence each carry fundamentally different persistence semantics: indefinite supersession, Ebbinghaus decay, evidence-gated revision, and ephemeral inference, respectively.

What carries the argument

The four-layer decomposition of cognitive architectures into Knowledge, Memory, Wisdom, and Intelligence, each governed by distinct persistence semantics tailored to their function.
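The four persistence regimes can be sketched as small stores with deliberately different update rules. This is a minimal illustration of the idea, not the paper's companion implementation; every class, method, and default here (the `gate=3` threshold, the seven-day stability constant) is a hypothetical stand-in.

```python
import math
import time

class KnowledgeStore:
    """Knowledge layer: facts persist indefinitely; a newer assertion
    supersedes the old one, which is kept for provenance, never decayed."""
    def __init__(self):
        self.current = {}   # key -> current value
        self.history = []   # (key, superseded value, when superseded)

    def assert_fact(self, key, value):
        if key in self.current:
            self.history.append((key, self.current[key], time.time()))
        self.current[key] = value  # indefinite supersession

class MemoryStore:
    """Memory layer: episodic traces decay on an Ebbinghaus-style
    curve R = exp(-t / S), where S is a stability constant."""
    def __init__(self, stability_days=7.0):
        self.stability = stability_days * 86400.0  # seconds

    def retention(self, stored_at, now):
        return math.exp(-(now - stored_at) / self.stability)

class WisdomStore:
    """Wisdom layer: a belief revises only once enough independent
    evidence for the new value has accumulated (evidence-gated)."""
    def __init__(self, gate=3):
        self.gate = gate
        self.beliefs = {}   # key -> accepted value
        self.pending = {}   # key -> {proposed value: evidence count}

    def propose(self, key, value):
        counts = self.pending.setdefault(key, {})
        counts[value] = counts.get(value, 0) + 1
        if counts[value] >= self.gate:
            self.beliefs[key] = value  # gate crossed: revise

# Intelligence layer: ephemeral inference. It reads the three stores
# above, computes an answer, and persists nothing of its own.
```

The point of the sketch is that the stores expose incompatible update operations: nothing can decay a KnowledgeStore entry, and no single observation can overwrite a WisdomStore belief.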

If this is right

  • Without a separate Knowledge layer, AI agents will misapply decay to established facts, reducing long-term accuracy.
  • The survey of existing systems shows that multiple approaches converge on the same persistence-related shortcomings.
  • Separate persistence semantics for each layer can be implemented practically, as shown by the Python and Rust code examples.
  • Distinctions between layers are engineering requirements based on update mechanics rather than direct copies of human cognition.
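The first point can be made concrete. If a store applies one decay curve to everything, as some surveyed systems do at the storage level, a verified fact falls below the retrieval threshold on the same schedule as a stale episode. A toy calculation, assuming exponential decay with a seven-day half-life and a 0.5 retrieval threshold (both numbers invented for illustration):

```python
def decayed_score(initial, age_days, half_life_days=7.0):
    """Uniform storage-level decay applied to every item, fact or episode."""
    return initial * 0.5 ** (age_days / half_life_days)

THRESHOLD = 0.5  # items scoring below this are no longer surfaced

# A verified fact stored 30 days ago with perfect initial confidence:
fact_score = decayed_score(1.0, age_days=30)
# 0.5 ** (30 / 7) is roughly 0.05, far below THRESHOLD: the fact has
# effectively vanished even though nothing superseded or contradicted it.
```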

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agents using this layered approach could maintain stable factual bases while still allowing memories to fade appropriately over time.
  • This separation might help resolve contradictions in knowledge by applying evidence-gated updates only to the Wisdom layer.
  • Testing the model on benchmarks with low contradiction-resolution scores could reveal performance gains from the architectural change.

Load-bearing premise

The gaps in CoALA and JEPA reflect deep architectural shortcomings that cannot be fixed by minor adjustments and instead require adopting the full four-layer redesign.

What would settle it

An experiment that builds two agents, one following the four-layer model and one based on CoALA or JEPA, then measures how well each retains verified facts over time versus how it handles transient experiences; better retention of facts without loss of adaptive memory would support the claim.
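A minimal harness for that comparison might look like the following sketch. The agent classes, thresholds, and simulated clock are all hypothetical stand-ins, not the paper's companion code or a real CoALA/JEPA implementation; the baseline merely mimics a single undifferentiated store.

```python
class SingleStoreAgent:
    """Baseline: one store, one uniform exponential decay for everything."""
    def __init__(self, half_life_days=7.0):
        self.items = {}  # item -> age in days
        self.half_life = half_life_days

    def store_fact(self, item):
        self.items[item] = 0.0

    store_episode = store_fact  # facts and episodes share update mechanics

    def advance_clock(self, days):
        for item in self.items:
            self.items[item] += days

    def recalls(self, item, threshold=0.5):
        return 0.5 ** (self.items[item] / self.half_life) >= threshold

class FourLayerAgent(SingleStoreAgent):
    """Facts route to a non-decaying Knowledge store; episodes still decay."""
    def __init__(self, half_life_days=7.0):
        super().__init__(half_life_days)
        self.knowledge = set()

    def store_fact(self, item):
        self.knowledge.add(item)  # indefinite persistence until superseded

    def recalls(self, item, threshold=0.5):
        return item in self.knowledge or super().recalls(item, threshold)

def evaluate(agent, facts, episodes, probe_after_days=30):
    """Fraction of facts and episodes still retrievable after the delay."""
    for f in facts:
        agent.store_fact(f)
    for e in episodes:
        agent.store_episode(e)
    agent.advance_clock(probe_after_days)
    fact_recall = sum(agent.recalls(f) for f in facts) / len(facts)
    episode_recall = sum(agent.recalls(e) for e in episodes) / len(episodes)
    return fact_recall, episode_recall
```

Under these assumptions the four-layer agent scores (1.0, 0.0) after 30 days, full fact retention with faded episodes, while the single-store baseline loses both.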

Figures

Figures reproduced from arXiv: 2604.11364 by Michaël Roynard (LAAS-OASIS).

Figure 1. Four-layer cognitive architecture. Solid arrows (top) denote working pipelines active during inference. Dashed arrows (bottom) denote offline consolidation pipelines. The layers are co-equal substrates with distinct persistence semantics, not a strict hierarchy. Knowledge is shared across agents.
Original abstract

The two most influential cognitive architecture frameworks for AI agents, CoALA [21] and JEPA [12], both lack an explicit Knowledge layer with its own persistence semantics. This gap produces a category error: systems apply cognitive decay to factual claims, or treat facts and experiences with identical update mechanics. We survey persistence semantics across existing memory systems and identify eight convergence points, from Karpathy's LLM Knowledge Base [10] to the BEAM benchmark's near-zero contradiction-resolution scores [22], all pointing to related architectural gaps. We propose a four-layer decomposition (Knowledge, Memory, Wisdom, Intelligence) where each layer has fundamentally different persistence semantics: indefinite supersession, Ebbinghaus decay, evidence-gated revision, and ephemeral inference respectively. Companion implementations in Python and Rust demonstrate the architectural separation is feasible. We borrow terminology from cognitive science as a useful analogy (the Knowledge/Memory distinction echoes Tulving's trichotomy), but our layers are engineering constructs justified by persistence-semantics requirements, not by neural architecture. We argue that these distinctions demand distinct persistence semantics in engineering implementations, and that no current framework or system provides this.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that the two most influential cognitive architecture frameworks for AI agents, CoALA and JEPA, both lack an explicit Knowledge layer with its own persistence semantics. This gap produces a category error in which systems apply cognitive decay to factual claims or treat facts and experiences with identical update mechanics. The paper surveys persistence semantics across existing memory systems, identifies eight convergence points (from Karpathy's LLM Knowledge Base to the BEAM benchmark's low contradiction-resolution scores), and proposes a four-layer decomposition (Knowledge with indefinite supersession, Memory with Ebbinghaus decay, Wisdom with evidence-gated revision, and Intelligence with ephemeral inference). The layers are presented as engineering constructs justified by persistence requirements rather than direct neural analogies, with companion Python and Rust implementations demonstrating feasibility.

Significance. If the identified gaps and the necessity of distinct persistence semantics hold, the work could usefully inform the design of more robust AI agent memory systems by separating indefinite factual retention from decaying episodic memory and evidence-based revision. The survey of eight convergence points across systems and benchmarks, together with the provision of reproducible companion code, are concrete strengths that enable empirical follow-up. The engineering framing (distinct from cognitive-science borrowing) helps ground the proposal in implementation needs.

major comments (2)
  1. [Analysis of CoALA and JEPA (Abstract and survey section)] The central claim that CoALA [21] and JEPA [12] produce a category error by applying cognitive decay to factual claims or identical update mechanics to facts and experiences rests on high-level descriptions rather than a detailed mapping of their persistence mechanisms. No section dissects CoALA's cognitive modules or JEPA's predictive memory to exhibit the specific error (e.g., Ebbinghaus-style decay applied to stored facts). This is load-bearing for the argument that a full four-layer redesign is required, as opposed to differentiated rules added inside existing layers.
  2. [Four-layer decomposition proposal] In the four-layer proposal, the separation between the Knowledge layer (indefinite supersession) and the Wisdom layer (evidence-gated revision) is introduced as a new construct, yet the manuscript provides limited concrete examples or pseudocode showing how their persistence semantics differ in practice from each other or from the Memory layer. This weakens the justification that all four layers are necessary rather than achievable through refinements to a three-layer structure.
minor comments (2)
  1. [Implementation section] The companion code is cited as demonstrating feasibility, but the manuscript would benefit from a brief table or section explicitly linking code modules to the four proposed layers and their persistence rules.
  2. [Terminology and analogies] Terminology note: the 'Wisdom layer' label may invite confusion with other AI literature; a short clarification of its engineering definition versus common usage would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, acknowledging where the manuscript can be strengthened through revision.

read point-by-point responses
  1. Referee: [Analysis of CoALA and JEPA (Abstract and survey section)] The central claim that CoALA [21] and JEPA [12] produce a category error by applying cognitive decay to factual claims or identical update mechanics to facts and experiences rests on high-level descriptions rather than a detailed mapping of their persistence mechanisms. No section dissects CoALA's cognitive modules or JEPA's predictive memory to exhibit the specific error (e.g., Ebbinghaus-style decay applied to stored facts). This is load-bearing for the argument that a full four-layer redesign is required, as opposed to differentiated rules added inside existing layers.

    Authors: We acknowledge that the current analysis of CoALA and JEPA relies primarily on the high-level descriptions in their source publications rather than a granular dissection of internal persistence rules. To address this directly, we will add a dedicated subsection to the survey that maps the persistence semantics of CoALA's cognitive modules and JEPA's predictive memory components, citing specific mechanisms from the original papers. This will explicitly identify cases where factual claims are subject to the same decay or undifferentiated update rules as episodic experiences, thereby reinforcing why a distinct Knowledge layer is required rather than refinements within existing layers. revision: yes

  2. Referee: [Four-layer decomposition proposal] In the four-layer proposal, the separation between the Knowledge layer (indefinite supersession) and the Wisdom layer (evidence-gated revision) is introduced as a new construct, yet the manuscript provides limited concrete examples or pseudocode showing how their persistence semantics differ in practice from each other or from the Memory layer. This weakens the justification that all four layers are necessary rather than achievable through refinements to a three-layer structure.

    Authors: We agree that the manuscript would be strengthened by more explicit illustrations of the layer distinctions. In revision, we will expand the four-layer decomposition section to include pseudocode examples that contrast the operations: indefinite supersession for Knowledge (fact retention without temporal decay), evidence-gated revision for Wisdom (conditional updates based on supporting data), and Ebbinghaus-style decay for Memory. We will also reference specific implementation details from the companion Python and Rust codebases to demonstrate practical separation, which should clarify why four layers are necessary beyond refinements to a three-layer design. revision: yes
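The contrast the response describes can be sketched as two update routines, one per layer. This is illustrative Python, not taken from the companion Python/Rust codebases; the `gate=3` evidence threshold and all names are invented placeholders.

```python
def update_knowledge(store, key, new_value, now):
    """Knowledge: a newer assertion supersedes unconditionally; the old
    value is retained for provenance rather than deleted or decayed."""
    store[key] = {"value": new_value,
                  "superseded": store.get(key),
                  "asserted_at": now}

def update_wisdom(store, pending, key, new_value, gate=3):
    """Wisdom: the same contradiction is held back until enough
    independent evidence accumulates (evidence-gated revision)."""
    pending[key] = pending.get(key, 0) + 1
    if pending[key] >= gate:
        store[key] = new_value
        return True   # belief revised
    return False      # contradiction noted, belief unchanged
```

The same incoming contradiction thus takes effect immediately in the Knowledge store but is deferred in the Wisdom store, which is the behavioral difference the memo argues cannot be expressed in a single undifferentiated layer.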

Circularity Check

0 steps flagged

No circularity: proposal grounded in external survey and engineering requirements

full rationale

The paper's derivation identifies the absence of an explicit Knowledge layer in CoALA [21] and JEPA [12] via direct reference to those external frameworks, then surveys eight other memory systems to list convergence points on persistence issues. It proposes the four-layer decomposition (Knowledge with indefinite supersession, Memory with Ebbinghaus decay, Wisdom with evidence-gated revision, Intelligence with ephemeral inference) as an engineering construct justified by the distinct semantics requirements, with an explicit note that the Tulving analogy is borrowed only for terminology and not as foundational justification. No equations, fitted parameters, self-citations as load-bearing premises, or reductions of the central claim to the authors' own prior outputs appear; the argument remains independent of its own proposed layers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The proposal introduces four new layers as engineering constructs justified by differing persistence requirements, without fitted parameters or external physical entities.

axioms (1)
  • domain assumption Persistence semantics must differ fundamentally across types of information (facts vs experiences) in cognitive architectures for AI agents.
    Invoked to justify the need for separate layers and to identify the category error in existing frameworks.
invented entities (2)
  • Knowledge layer no independent evidence
    purpose: Stores factual claims with indefinite supersession persistence.
    New engineering construct introduced to address the missing layer in CoALA and JEPA.
  • Wisdom layer no independent evidence
    purpose: Handles evidence-gated revision of higher-order beliefs.
    New engineering construct in the four-layer decomposition.

pith-pipeline@v0.9.0 · 5493 in / 1342 out tokens · 34592 ms · 2026-05-10T15:27:34.037263+00:00 · methodology


Reference graph

Works this paper leans on

38 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1] AlbericByte: ArqonDB: Unified KV+vector+graph engine for agent memory. GitHub (2026), https://github.com/AlbericByte/ArqonDB, Rust. Causal DAG with temporal edges (uni-temporal validity intervals), state branching, custom LSM-tree + HNSW. Still conflates all cognitive layers into a single “agent state” abstraction.
  2. [2] alibaizhanov: Mengram: Three-type memory extraction for LLMs. GitHub (2026). Independent convergence on Tulving’s trichotomy (semantic/episodic/procedural) with separate extraction pipelines per type.
  3. [3] Anthropic: Claude code memory architecture. Claude Code v2.1.59+ (2026). Six memory subsystems with 4-type taxonomy (user/feedback/project/reference). No supersession, no temporal validity, no structured facts. Most widely deployed AI memory system.
  4. [4] Bandaru, R.: Deep dive into Yann LeCun’s JEPA. GitHub Pages blog (2024), https://rohitbandaru.github.io/blog/2024/07/01/deep-dive-into-yann-lecuns-jepa.html. Identifies that JEPA’s critic is “trained from past states and subsequent intrinsic cost, retrieved from memory”; memory is load-bearing for the critic.
  5. [5] Bjork, R.A.: Retrieval inhibition as an adaptive mechanism in human memory. In: Roediger, H.L., Craik, F.I.M. (eds.) Varieties of Memory and Consciousness: Essays in Honour of Endel Tulving, pp. 309–330. Lawrence Erlbaum Associates (1989).
  6. [6] Burnell, R., Yamamori, Y., Firat, O., Olszewska, K., Hughes-Fitt, S., Kelly, O., Galatzer-Levy, I.R., Morris, M.R., Dafoe, A., Snyder, A.M., Goodman, N.D., Botvinick, M., Legg, S.: Measuring progress toward AGI: A cognitive framework. Tech. rep., Google DeepMind (March 2026), https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/measuring-pro...
  7. [7] Cheng, M., et al.: Sycophantic AI decreases prosocial intentions and promotes dependence. Science 391(6792) (2026). https://doi.org/10.1126/science.aec8352
  8. [8] Chhikara, P., et al.: Mem0: Building production-ready AI agents with scalable long-term memory. arXiv preprint arXiv:2504.19413 (2025).
  9. [9] Cohen, N.J., Squire, L.R.: Preserved learning and retention of pattern-analyzing skill in amnesia: Dissociation of knowing how and knowing that. Science 210(4466), 207–210 (1980).
  10. [10] fronalabs: Frona: Self-hosted personal AI assistant. https://github.com/fronalabs/frona (2026), Rust + SurrealDB. Two-tier memory: user-scoped + agent-scoped. First Rust peer-competitor.
  11. [11] Gulli, A., Sauco, M.: Agentic AI: Design Patterns and Production-Ready Strategies for Building Intelligent Agents. Springer (2025), 424 pages. Google Office of CTO. Two-tier memory (context + vector store). No temporal model, no forgetting, no wisdom layer, no consumer abstraction.
  12. [12] Hawksight-AI: Semantica: Multi-backend bi-temporal knowledge engine. https://github.com/Hawksight-AI/semantica (2026), RDF/OWL + property graphs + triple stores. W3C PROV-O provenance. Allen Interval Algebra for deterministic temporal consistency.
  13. [13] Hu, Y., Liu, S., Yue, Y., Zhang, G., et al.: Memory in the age of AI agents: A survey — forms, functions and dynamics (2025).
  14. [14] Jiang, D., Li, Y., Wei, S., Yang, J., Kishore, A., Zhao, A., Kang, D., Hu, X., Chen, F., Li, Q., Li, B.: Anatomy of agentic memory: Taxonomy and empirical analysis of evaluation and system limitations. arXiv preprint arXiv:2602.19320 (2026).
  15. [15] Karpathy, A.: LLM knowledge bases. X (Twitter) (April 2026), widely shared (millions of views). Six-component system: ingest, compilation, Q&A, output loop, linting. Independent convergence toward a Knowledge layer with no Memory, Wisdom, or temporal semantics.
  16. [16] Latimer, C., Boschi, N., Neeser, A., Bartholomew, C., Srivastava, G., Wang, X., Ramakrishnan, N.: Hindsight is 20/20: Building agent memory that retains, recalls, and reflects (2025). BEAM SOTA: 64.1% at 10M tokens. Five-level hierarchy. No forgetting, no bi-temporal, no supersession.
  17. [17] LeCun, Y.: A path towards autonomous machine intelligence. OpenReview (2022), https://openreview.net/pdf?id=BZ5a1r-kVsf
  18. [18] Lett, M.: Critical review of LeCun’s introductory JEPA paper. Medium blog (2024), https://medium.com/@malcolmlett/critical-review-of-lecuns-introductory-jepa-paper-f4e5e582caeb. Identifies: configurator undefined, System 1/2 mapping incorrect, H-JEPA incompatible with predictive coding.
  19. [19] Mastra: Mastra observational memory. GitHub (2026), https://github.com/mastra-ai/mastra. Compression-first memory: raw messages → observations → reflections. 94.87% on LongMemEval (GPT-5-mini), 84.23% with GPT-4o. Per-category: knowledge-update 96.2%, temporal-reasoning 95.5%, multi-session 87.2% (ceiling across all systems). No cross-session persistence, ...
  20. [20] McDaniel, M.A., Einstein, G.O.: Prospective memory: An overview and synthesis of an emerging field. Sage Publications (2007).
  21. [21] Minns-ai: MinnsDB: Agentic database with bi-temporal graph. https://github.com/Minns-ai/MinnsDB (2026), AGPL-3.0, Rust. Bi-temporal knowledge graph. MinnsQL with WHEN/AS OF. OWL vocabulary as metadata, not reasoning.
  22. [22] Morris, M.R., Altman, D., Belfield, H., Goemans, A., Iqbal, H., Burnell, R., Gabriel, I., Albanie, S., Dafoe, A.: Characterizing model jaggedness supports safety and usability. Tech. rep., Google DeepMind (January 2026). Monolithic scores hide jagged capability profiles; principled decomposition required.
  23. [23] Nelson, T.O.: Metamemory: A theoretical framework and new findings. In: Bower, G.H. (ed.) The Psychology of Learning and Motivation, vol. 26, pp. 125–173. Academic Press (1990).
  24. [24] niloproject: Signet: Persistent cognition layer for AI coding agents. GitHub (2026). Entity/aspect/attribute graph with partial supersession but uniform 0.95^days decay on all types.
  25. [25] NornicDB: NornicDB: Cognitive graph and vector database for AI agents. GitHub (2026). Three-tier cognitive decay: episodic 7-day, semantic 69-day, procedural 693-day half-life. Applies storage-level decay to factual knowledge.
  26. [26] Nous Research: Hermes agent. https://github.com/NousResearch/hermes-agent (2026). Five memory layers including Procedural Skill Documents following the agentskills.io open standard. Closes the experience-to-wisdom loop.
  27. [27] ori-community: Ori mnemos v0.4: Six-layer retrieval with graph-aware forgetting. GitHub (2026). Three-zone decay (identity 0.1x, knowledge 1.0x, ops 3.0x). ACT-R vitality, Tarjan structural protection. Still conflates all data types in one graph.
  28. [28] Packer, C., Wooders, S., Lin, K., Fang, V., Patil, S.G., Stoica, I., Gonzalez, J.E.: MemGPT: Towards LLMs as operating systems (2023). Virtual context management for LLMs via an OS-inspired memory hierarchy. Letta V1 rearchitecture (2025) introduces Context Repositories and tiered memory.
  29. [29] Papr-ai: Papr Python SDK: Schema-policy memory engine. https://github.com/Papr-ai/papr-pythonSDK (2026). Schema-policy DSL decorators for memory operations. First-class LLM callouts at constraint-firing time.
  30. [30] r/AIMemory community: Proposal: A real benchmark for long-term AI memory systems. https://www.reddit.com/r/AIMemory/comments/1sgvsxb/ (April 2026). MemPalace audit: reported 96.6% LongMemEval collapsed to 66.8%; LoCoMo answer key 6.4% wrong; LLM judge accepts 63% of incorrect answers.
  31. [31] Rasmussen, P., Paliychuk, P., Beauvais, T.: Zep: A temporal knowledge graph architecture for agent memory (2025). Bi-temporal model (4 timestamps per edge). Episode-to-fact extraction.
  32. [32] rohitg00: LLM wiki v2. https://gist.github.com/rohitg00/2067ab416f7bbe447c1977edaaa681e2 (2026). Fork of the Karpathy pattern. Confidence scoring with time-decay, supersession, Ebbinghaus forgetting, four-tier consolidation. Closest informal statement of the four-layer thesis.
  33. [33] Saboo, S.: Always-on memory agent. GitHub (GoogleCloudPlatform/generative-ai) (2026), https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/agents/always-on-memory-agent, MIT. No vector DB. LLM-driven memory organization in SQLite. 30-minute consolidation loops generating meta-insights. Independently validates the consolidation-as-first-...
  34. [34] Sumers, T.R., Yao, S., Narasimhan, K., Griffiths, T.L.: Cognitive architectures for language agents. Transactions on Machine Learning Research (TMLR) (2024), https://arxiv.org/abs/2309.02427
  35. [35] Tavakoli, M., Salemi, A., Ye, C., Abdalla, M., Zamani, H., Mitchell, J.R.: BEAM: Beyond a million tokens: Benchmarking and enhancing long-term memory in LLMs. arXiv preprint arXiv:2510.27246 (2026).
  36. [36] Tulving, E.: Episodic and semantic memory. In: Tulving, E., Donaldson, W. (eds.) Organization of Memory, pp. 381–403. Academic Press (1972).
  37. [37] Yu, Y., Yao, L., Xie, Y., Tan, Q., Feng, J., Li, Y., Wu, L.: Agentic memory: Learning unified long-term and short-term memory management for large language model agents (2026). Ebbinghaus forgetting curve adaptation for AI memory.
  38. [38] Zhang, W., et al.: A-mem: Agentic memory for LLM agents (2025).