
arxiv: 2606.00288 · v3 · pith:Q6ZXC3ZYnew · submitted 2026-05-29 · 💻 cs.AI

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Pith reviewed 2026-06-28 22:10 UTC · model grok-4.3

classification 💻 cs.AI
keywords model-native computing · intelligent computing architecture · LLM system design · computer architecture analogy · dual-plane architecture · agent frameworks · semantic locality · context management

The pith

LLM systems gain a six-layer architecture by mapping them to CPUs, caches, memory, and operating systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that engineering challenges in LLM systems, such as cache reuse, context limits, agent scheduling, and permission control, repeat classical computer-systems problems. Treating the LLM as a CPU, the KV cache as a processor cache, the context window as main memory, and the agent framework as an operating system allows established architecture principles to shape future model-native designs. It introduces the Intelligent Computing Architecture (ICA) as six functional layers equipped with interface contracts and design axioms. A dual-plane structure separates a probabilistic execution plane from a deterministic control plane, with every layer crossing both. Three Amdahl-style heuristics organize sizing decisions for semantic locality, context budget, and agent speedup.

Core claim

The central claim is that decades of computer-architecture experience can be transferred to model-native stacks through explicit mappings. The result is the Intelligent Computing Architecture: six layers with defined interfaces and axioms, unified by a dual-plane architecture in which a probabilistic execution plane handles what can be computed and a deterministic control plane handles what should be computed, with each layer crossing both planes to a graded degree.

What carries the argument

The dual-plane architecture that routes every layer through both a probabilistic execution plane (what can be computed) and a deterministic control plane (what should be computed) with graded crossovers.

If this is right

  • The three heuristics (Semantic Locality, Context Budget, and Agent Speedup) supply back-of-the-envelope models whose parameter ranges can be checked against published data.
  • Surveyed literature on memory management, tool protocols, multi-agent coordination, and safety governance maps onto distinct layers of the ICA.
  • Every layer of the architecture must pass through both the probabilistic and deterministic planes.
  • Analogy boundaries include differences such as non-deterministic execution in models versus fixed silicon behavior.
  • The principal open task is predictive validation of the heuristics against real deployments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The ICA could serve as a common reference for comparing how different agent frameworks implement the same layer functions.
  • The dual-plane separation might apply to hybrid codebases that combine model calls with conventional deterministic modules.
  • Validation experiments would need to track whether the heuristics correctly forecast performance or failure modes in production LLM workloads.
  • Strong deviations from statistical patterns in model outputs could mark the practical edge of the proposed mappings.

Load-bearing premise

The assumption that the probabilistic and non-deterministic character of model execution permits the same interface contracts and axioms used in deterministic silicon systems to be applied without fundamental revision.

What would settle it

A side-by-side implementation of two equivalent agent systems, one built according to the six-layer ICA with the three heuristics and one built without, that shows no measurable difference in scalability, error handling, or development effort.

Figures

Figures reproduced from arXiv: 2606.00288 by Hai Lin, Hai-Tao Zheng, Hoilam Pao, Shaoxiong Zhan.

Figure 1
Figure 1: Overview analogy mapping between computer architecture and model-native computing systems.
Figure 2
Figure 2: ISA: the stable hardware–software contract. Intel x86, ARM, and RISC-V each provide distinct microarchitectural implementations while exposing the same ISA to compilers, operating systems, and applications.
Figure 3
Figure 3: CPU microarchitecture: pipelined and superscalar execution. A pipeline decomposes instruction execution into stages (Fetch, Decode, Execute, Memory, Write-Back) so that multiple instructions overlap in flight simultaneously; a superscalar design further dispatches several instructions per cycle.
Figure 4
Figure 4: Cache hierarchy: exploiting temporal and spatial locality to bridge the processor–memory speed gap. Each successive tier (L1, L2, L3, DRAM) trades lower latency for higher capacity.
Figure 5
Figure 5: Virtual memory: the MMU and page tables map each process’s large virtual address space onto non-contiguous physical frames, providing isolation and the illusion of unbounded memory.
Figure 6
Figure 6: The operating system as a resource manager: the kernel mediates access to CPU cores, memory, storage, and network through scheduling, memory management, and a uniform system-call interface.
Figure 7
Figure 7: Distributed systems: the Raft consensus algorithm replicates a shared log across a leader and follower nodes, maintaining consistency despite individual node failures; the CAP theorem bounds what is simultaneously achievable under network partitions.
Amdahl’s Law. Amdahl’s Law establishes a theoretical upper bound on the speedup achievable through parallelization: S = 1/((1 − f) + f/p)   (1), where f is the f…
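As a quick numeric check of Amdahl's Law as quoted with Figure 7, here is a minimal Python sketch. It assumes the standard reading of the symbols (f is the parallelizable fraction of the work, p the number of processors), since the source text is cut off before it finishes defining f; the function names are illustrative only.

```python
import math


def amdahl_speedup(f: float, p: int) -> float:
    """Speedup bound S = 1 / ((1 - f) + f / p).

    f: fraction of the work that can run in parallel (assumed meaning).
    p: number of processors.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / ((1.0 - f) + f / p)


def amdahl_limit(f: float) -> float:
    """Limit of the bound as p grows without bound: 1 / (1 - f)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return math.inf if f == 1.0 else 1.0 / (1.0 - f)
```

The serial fraction (1 − f) caps the speedup no matter how many processors are added, which is the property the paper's Amdahl-style heuristics reuse.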
Figure 8
Figure 8: Model layer: the Transformer’s self-attention mechanism serves as the general-purpose inference core; Mixture-of-Experts (MoE) introduces sparse routing so that only a subset of parameters is activated per inference step, enabling capacity to scale without proportional compute cost.
Figure 9
Figure 9: Inference layer: PagedAttention organizes the KV cache into demand-allocated pages and enables sharing of prefix pages across requests; continuous batching allows new requests to join in-flight batches at each decoding step; FlashAttention tiles attention computation to reduce HBM traffic.
Figure 10
Figure 10: Memory layer: a three-tier architecture spanning the bounded context window (hot working memory), a warm retrieval store (vector database, accessed via RAG), and cold long-term archival storage; MemGPT automates swap-in and swap-out across tiers.
Figure 11
Figure 11: Agent layer: the ReAct loop interleaves reasoning with tool invocations inside a sandboxed runtime; sub-agents execute concurrently under permission control, and the orchestrator merges their results.
Figure 12
Figure 12: Coverage of ICA layers by existing works. Cells are classified as Focused (primary contribution), Touched (layer addressed but not central), or Not addressed. The Score column shows the weighted coverage percentage (2×Focused + Touched, out of a maximum of 12), color-coded as high (≥50%, green), medium (33–49%, amber), or low (<33%, red). The bottom row identifies the specific open gap in each layer that …
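The Score arithmetic described in the Figure 12 caption can be sketched in a few lines of Python. This is an illustration of the stated rule only (2×Focused + Touched over a maximum of 12, with the stated bands); the function names and the string labels for cells are assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Iterable

LAYERS = 6  # six ICA layers, so the maximum raw score is 2 * 6 = 12
VALID = {"focused", "touched", "none"}  # hypothetical cell labels


def coverage_score(cells: Iterable[str]) -> float:
    """Weighted coverage percentage for one surveyed work."""
    cells = list(cells)
    if len(cells) != LAYERS:
        raise ValueError(f"expected {LAYERS} cells, got {len(cells)}")
    counts = Counter(cells)
    unknown = set(counts) - VALID
    if unknown:
        raise ValueError(f"unknown cell labels: {sorted(unknown)}")
    raw = 2 * counts["focused"] + counts["touched"]
    return 100.0 * raw / (2 * LAYERS)


def coverage_band(pct: float) -> str:
    """Band thresholds as given in the caption."""
    if pct >= 50:
        return "high"    # green
    if pct >= 33:
        return "medium"  # amber
    return "low"         # red
```

Because the raw score is an integer out of 12, the possible percentages step by about 8.3 points, so no value falls between the 49% and 50% band edges.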
Figure 13
Figure 13: Dual-plane architecture illustrated through a concrete software-engineering scenario. When a developer submits “Refactor this function,” the deterministic control plane (blue, top) executes a fully auditable, policy-driven sequence (permission check, context loading, task decomposition, human-approval gate for file writes, and audit logging) while the probabilistic execution plane (orange, bottom) performs…
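To make the division of labor in Figure 13 concrete, here is a minimal Python sketch of a deterministic control plane gating proposals from a probabilistic executor. It covers only three of the steps named in the caption (permission check, approval gate for file writes, audit logging); the class, function, and action names are hypothetical, and the random sampler is a stand-in for a model call, not anything the paper specifies.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Proposal = Tuple[str, str]  # (action, target), e.g. ("write_file", "utils.py")


def sampled_executor(request: str, rng: random.Random) -> Proposal:
    """Stand-in for the probabilistic plane: samples an action instead of calling a model."""
    action = rng.choice(["read_file", "write_file", "delete_file"])
    return action, "utils.py"


@dataclass
class ControlPlane:
    """Deterministic plane: the same proposal and policy always yield the same verdict."""
    allowed_actions: Set[str]
    approve_write: Callable[[str], bool]  # human-approval gate, injected by the caller
    audit_log: List[str] = field(default_factory=list)

    def decide(self, proposal: Proposal) -> str:
        action, target = proposal
        if action not in self.allowed_actions:
            verdict = "denied"      # permission check failed
        elif action == "write_file" and not self.approve_write(target):
            verdict = "rejected"    # approval gate for file writes
        else:
            verdict = "allowed"
        self.audit_log.append(f"{verdict}: {action} {target}")  # audit logging
        return verdict


def handle(request: str, plane: ControlPlane, rng: random.Random) -> str:
    proposal = sampled_executor(request, rng)  # what can be computed
    return plane.decide(proposal)              # what should be computed
```

The design point is that all non-determinism is confined to `sampled_executor`; every verdict and log entry is a pure function of the proposal and the policy, which is what makes the control path auditable.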
Figure 14
Figure 14 illustrates the structural parallel between the CPU–ISA contract and the foundation model’s prompt-and-tool-schema interface.
Figure 15
Figure 15 illustrates the structural correspondence between the classical CPU cache hierarchy and the emerging LLM KV cache hierarchy.
Classical CPU cache: L1 Cache (∼1 ns, 64 KB); L2 Cache (∼4 ns, 512 KB); L3 Cache (∼12 ns, 8–32 MB); Main Memory (DRAM) (∼60 ns, 8–64 GB).
LLM KV cache hierarchy: Session KV Cache (<1 ms, GPU VRAM); Prefix / Shared Cache (∼1–5 ms, CPU Memory); Semantic Cache (∼10 ms, SSD / Object Store); External Memory (Vector DB) (∼50–200 ms, Unlimite…
Figure 16
Figure 16: The hot/warm/cold tiered context management hierarchy (left) and its classical virtual-memory analogy (right). Both exploit temporal locality and demand-loading to give consumers the illusion of abundant capacity. The critical distinction is that context management is lossy: summaries and semantic retrieval introduce irreversible information loss absent in classical paging.
Figure 17
Figure 17: Context management as virtual memory with semantics (§5.3). Both systems virtualize a limited physical resource (RAM / context window) into a larger logical space through tiered storage and demand loading. The critical difference is lossiness: OS paging swaps bytes exactly, whereas context management compresses via summarization, so information is inevitably distorted, not merely delayed.
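The lossiness contrast drawn in Figures 16 and 17 can be shown with a small Python sketch of a two-tier context store. It is a toy under stated assumptions: character counts stand in for tokens, the default summarizer simply truncates (a real system would call a model), and the class and method names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class TieredContext:
    """Hot tier with a size budget; evicted entries survive only as summaries."""

    def __init__(self, hot_budget: int,
                 summarize: Callable[[str], str] = lambda text: text[:20]):
        self.hot_budget = hot_budget      # max characters in the hot tier (token stand-in)
        self.hot: Deque[str] = deque()    # exact text, oldest first
        self.warm: List[str] = []         # lossy summaries of evicted entries
        self.summarize = summarize

    def _hot_size(self) -> int:
        return sum(len(text) for text in self.hot)

    def add(self, text: str) -> None:
        self.hot.append(text)
        # Evict oldest entries until the budget holds; always keep the newest entry,
        # even if it alone exceeds the budget.
        while self._hot_size() > self.hot_budget and len(self.hot) > 1:
            evicted = self.hot.popleft()
            self.warm.append(self.summarize(evicted))  # original text is discarded here
```

Unlike a page swapped to disk, an entry in `warm` cannot be restored byte-for-byte: only the summary remains, which is the irreversible loss the captions point to.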
Figure 18
Figure 18: Agent runtime as intelligent operating system (§5.4). Both take ownership of the same three architectural concerns: resource management, isolation, and standardized interfaces (system calls / MCP tool calls). The critical difference is predictability: an OS schedules deterministic processes whose execution time can be bounded, whereas an agent runtime schedules probabilistic reasoning whose behavior may d…
Figure 19
Figure 19: Tool bus and agent interconnection as I/O system (§5.5). Classical I/O evolved from per-device bespoke drivers through standardized buses (PCIe, USB) to a unified network stack (TCP/IP); agent I/O is undergoing the same transition, from hand-coded API adapters to MCP (vertical agent-to-tool bus) and A2A (horizontal agent-to-agent interconnect). The critical difference is side-effect semantics: read(fd) is…
Figure 20
Figure 20: Multi-agent collaboration as distributed computing with semantics (§5.6). Both domains face the same coordination topologies (hub-and-spoke, pipeline, peer-to-peer) and the same fundamental tension between consistency and availability (CAP theorem / semantic CAP: wait for all agents vs. respond fast). The critical difference is failure mode: distributed-node failure is binary (alive or dead), whereas agen…
Figure 21
Figure 21: ICA layered architecture for model-native computing. Top to bottom: L6 Application, L5 Orchestration, L4 Semantic Interface (Deterministic Control Plane, orange), dashed L3–L4 interface boundary, L3 Context Management, L2 Inference Serving (Model Core + Session KV + Prefix/Shared KV), L1 Physical Execution (Probabilistic Execution Plane, blue), and Distributed Substrate. Thick orange bypass arrows show …
Figure 22
Figure 22: Request flow through the model-native computing architecture as a dual-plane swimlane diagram. Blue nodes belong to the probabilistic execution plane; orange nodes belong to the deterministic control plane. The split-coloured diamond represents the control-plane decision that gates both paths. Grey boxes span both planes.
Figure 23
Figure 23: ICA inter-layer interface contracts. Each card specifies the API operations, invariant, and key metric for the corresponding layer boundary. Orange cards (L4–L6) belong to the deterministic control plane; blue cards (L1–L3) belong to the probabilistic execution plane; the amber card (L3–L4) marks the graded transition layer.
Figure 24
Figure 24: Applicability heatmap of the six ICA design axioms across the six architecture layers. Primary indicates the axiom directly governs design decisions at that layer; secondary indicates significant but indirect applicability; tertiary indicates contextual relevance. A1 (Locality) is most critical at the inference and KV-cache layers; A5 (Least Privilege) and A6 (Observability) dominate the interface and orc…
Figure 25
Figure 25: Two perspectives on the attention retention rate β(L). Left: Empirically observed U-shaped curve (Liu et al. [93]): information at the beginning and end of the context is recalled well, while the middle region suffers a sharp drop in retention. Right: Exponential decay model β(L) ≈ β0·e^(−λL/C) adopted in Heuristic II for analytical tractability; the shaded area represents the effective working set Weff = C …
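One way to turn the exponential-decay model in the Figure 25 caption into a number is sketched below in Python. It rests on two assumptions that the truncated caption does not confirm: that L is a position inside a window of capacity C, and that the mean retention β̄ used in Weff = C·β̄ (the form given with Figure 26) is the uniform average of β(L) over 0 ≤ L ≤ C. Under those assumptions the average has the closed form β̄ = β0·(1 − e^(−λ))/λ. Function names are illustrative.

```python
import math


def mean_retention(beta0: float, lam: float) -> float:
    """Uniform average of beta0 * exp(-lam * L / C) over 0 <= L <= C.

    The capacity C cancels out of the average, leaving beta0 * (1 - exp(-lam)) / lam.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if lam == 0:
        return beta0  # no decay: retention is constant
    return beta0 * (-math.expm1(-lam)) / lam  # -expm1(-x) == 1 - exp(-x)


def effective_working_set(capacity: float, beta0: float, lam: float) -> float:
    """W_eff = C * mean retention, under the averaging assumption above."""
    return capacity * mean_retention(beta0, lam)
```

The sketch says nothing about the empirically observed U-shaped curve in the left panel; it only evaluates the simplified decay model.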
Figure 26
Figure 26: Characteristic curves of three design heuristics. (a) Semantic Locality: S = 1/((1 − H) + H/α). (b) Context Budget: Weff = C·β̄ (representative exponential-decay envelope shown). (c) Agent Speedup: S = 1/((1 − F) + F/(N·E)) with F = 0.8. Shaded areas indicate typical operating ranges.
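The three expressions in the Figure 26 caption translate directly into code. The Python sketch below is a transcription for experimentation, not the authors' implementation. The symbol meanings are partly assumed: H is the KV cache hit rate and F and E are the parallelizable fraction and orchestration efficiency (as named in other captions), while α as the per-hit speedup factor and N as the number of agents are interpretations that this excerpt does not spell out.

```python
def semantic_locality_speedup(hit_rate: float, alpha: float) -> float:
    """Heuristic I: S = 1 / ((1 - H) + H / alpha)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie in [0, 1]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / ((1.0 - hit_rate) + hit_rate / alpha)


def context_budget(capacity: float, mean_retention: float) -> float:
    """Heuristic II: W_eff = C * mean retention."""
    if not 0.0 <= mean_retention <= 1.0:
        raise ValueError("mean_retention must lie in [0, 1]")
    return capacity * mean_retention


def agent_speedup(parallel_fraction: float, n_agents: int, efficiency: float) -> float:
    """Heuristic III: S = 1 / ((1 - F) + F / (N * E))."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_agents < 1 or efficiency <= 0:
        raise ValueError("n_agents must be >= 1 and efficiency must be positive")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / (n_agents * efficiency))
```

Heuristics I and III share Amdahl's form: the uncached fraction (1 − H) and the serial fraction (1 − F) bound the achievable speedup regardless of α or N.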
Figure 27
Figure 27: Agent framework evolution from single-turn reasoning to model-native operating systems. Top: representative frameworks with analogous OS milestones. Bottom: quantitative trajectory of parallelizable fraction (F), orchestration efficiency (E), and agent speedup (Sagent) per Heuristic III; green band marks the target regime (F > 0.8, E > 0.8).
Figure 28
Figure 28: Generation I agent frameworks versus 1950s batch-processing systems. The ReAct Think–Act–Observe loop is structurally analogous to a single-job batch system: one task executes to completion with no concurrency, no persistent state, and no resource isolation. The core advance was establishing the interleaved reasoning–action paradigm as the fundamental agent execution primitive.
Figure 29
Figure 29: Generation II agent frameworks versus 1960s multiprogramming systems. AutoGPT’s task queue, vector-database memory, and Voyager’s skill library correspond to a multiprogramming system’s job scheduler, shared memory, and program library. The defining advance is persistence and autonomous goal decomposition; the remaining gap is the absence of isolation and failure recovery.
Figure 30
Figure 30: Generation III agent frameworks versus time-sharing operating systems (Unix/Multics). AIOS’s Agent Scheduler, Context Manager, and Tool Manager correspond to a time-sharing OS’s process scheduler, MMU address isolation, and file permission system. The key advance: each agent becomes a schedulable process with an independent state space and resource budget.
Figure 31
Figure 31: Generation IV agent frameworks versus modern operating systems (Linux/Windows NT). Codex’s sub-agent spawning maps to fork(); its sandbox maps to chroot/container isolation; Claude Code’s hooks parallel Linux Security Module (LSM) hooks; and the approval gate corresponds to sudo privilege elevation. The key advance: confronting real-world codebases with irreversible side effects demands formal isolation …
Figure 32
Figure 32: Generation V agent frameworks versus trustworthy operating systems (SELinux/seL4). ArbiterOS’s probabilistic CPU and deterministic governor correspond to user-mode/kernel-mode separation; CaMeL’s one-time capability tokens correspond to POSIX fine-grained capabilities; and IronClaw’s behavioral modeling corresponds to seL4’s formal verification. The key advance: security is embedded as a first architectu…
Figure 33
Figure 33: Capability maturity matrix across the five generations of agent framework evolution. Each cell reflects the maturity level of a key architectural dimension: None (—) through High. The progressive darkening from Generation I to Generation V illustrates the systematic increase in both the parallelisable fraction F and the orchestration efficiency E captured by Heuristic III.
Figure 34
Figure 34: Illustrative quantitative trajectory of agent framework parameters across five generations, estimated from Heuristic III (Sagent = 1/((1 − F) + F/(N·E)) with representative N values per generation). (a) Parallelizable fraction F grows from near zero (Gen I serial loop) to ≈0.72 (Gen V governed OS). (b) Orchestration efficiency E initially near 1 (trivial single-agent overhead) but dips in Gen IV due to s…
Figure 35
Figure 35 illustrates the three-way trade-off. Any two of the three objectives can be jointly optimised, but improving all three simultaneously is structurally infeasible.
Figure labels: Low Latency (fast response); Low Cost (efficient resource); High Throughput (batch efficiency); small batch, dedicated GPU; more GPUs, higher cost; large batch, queue longer; Interactive agent; Batch pipeline; RT serving; Classical analogy: response time …
Figure 36
Figure 36 illustrates the lifecycle and ICA layer mapping of these three state types.
Ephemeral State. Scope: single inference step. Persistence: none required. Analogy: CPU registers. Examples: KV activations, intermediate logits.
Session State. Scope: within-task agent chain. Consistency: causal ordering. Analogy: distributed message passing (Raft). Examples: agent handoff state, intermediate results.
Committed State. Scope…
Figure 37
Figure 37: Cross-session state management in an AI agent system. A three-day project illustrates the three categories of persistent state: ephemeral state (current inference step only, like CPU registers), session state with causal ordering (task history requiring causal consistency), and committed state (permanent changes to files or databases, requiring full audit and rollback support). The Day 2 concurrent scena…
Figure 38
Figure 38 maps each challenge to the ICA layers it impacts most, showing primary, secondary, and tertiary impact levels across the full stack.
Columns: L1 Physical, L2 Inference, L3 Context, L4 Semantic, L5 Orchestration, L6 Application (grouped under Probabilistic and Deterministic).
Rows: Latency–Throughput–Cost Trilemma; State Management; Interface Drift; Security & Privacy; Governance & Accountability.
Cell values as extracted: Primary, Primary, Tertiary, Secondary, –, –, –, Primary, Pri…
Figure 39
Figure 39: Governance as a structural division of labor in the dual-plane architecture. The probabilistic execution plane (left) handles what the agent can reason about: understanding user intent, generating action sequences, interpreting results, and producing outputs. The deterministic control plane (right) enforces what should happen: checking capability permissions, approving or rejecting tool calls, maintaining…
Figure 40
Figure 40: Structural parallels between CPU and LLM system evolution. Curved arrows connect milestones that share the same architectural driver: Dennard scaling collapse and the inference energy wall both arise from a Power Wall; the multi-core and MoE transitions reflect a Parallelism Shift; big.LITTLE and heterogeneous model orchestration embody Heterogeneity; x86 ISA and the token/tool interface represent ISA Abs…
Figure 41
Figure 41: Three-stage evolution of heterogeneous scheduling, mapped between classical ARM big.LITTLE (top half of each panel) and LLM model orchestration (bottom half). Stage 1 (cluster migration / single model) activates only one resource type at a time. Stage 2 (HMP / plan-execute routing) runs both simultaneously with coarse-grained task assignment. Stage 3 (EAS / cost-capability routing) uses real-time telemetr…
Figure 42
Figure 42: The megahertz myth (left) and its LLM analog (right). In the 1990s, CPU clock speed was marketed as the definitive performance metric, yet SPEC benchmarks revealed that real-workload performance depended on pipeline depth, cache size, and IPC, not clock rate alone. LLM evaluation faces the same maturation: MMLU leaderboard averages are the “megahertz” of intelligent systems [48], obscuring the fact that a …
Figure 43
Figure 43: OS-level heterogeneous scheduling: Linux Energy Aware Scheduling (EAS) versus the intelligent OS Capability-Cost Model. EAS (Linux 5.0, 2019) uses an Energy Model to route each task to the most energy-efficient capable CPU and falls back to load balancing under overutilization. The Capability-Cost Model is the direct analog: a Task Director routes each subtask to the optimal compute core (large model, sma…
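The routing idea in the Figure 43 caption can be sketched as "cheapest core that is capable enough", mirroring how EAS picks the most energy-efficient capable CPU. The Python below is an illustration only: the `Core` fields, the scalar capability score, the fallback to the most capable core, and all numbers are assumptions, since the caption is truncated before it describes the actual Capability-Cost Model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Core:
    name: str          # hypothetical label, e.g. "large-model"
    capability: float  # assumed scalar score in [0, 1]
    cost: float        # assumed cost per call, arbitrary units


def route(required_capability: float, cores: Sequence[Core]) -> Core:
    """Pick the cheapest core whose capability meets the subtask's requirement.

    Fallback (an assumption, not from the source): if no core is capable enough,
    use the most capable one available.
    """
    if not cores:
        raise ValueError("no cores available")
    capable = [c for c in cores if c.capability >= required_capability]
    if not capable:
        return max(cores, key=lambda c: c.capability)
    return min(capable, key=lambda c: c.cost)
```

For example, with a small core (capability 0.4, cost 1) and a large core (capability 0.9, cost 10), an easy subtask is routed to the small core and a hard one to the large core.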
Figure 44
Figure 44: Parallel disaggregation trajectories: semiconductor industry (top) versus the emerging AI industry (bottom). The semiconductor industry evolved through four stages over 40 years: vertically integrated IDMs (Intel, TI) → pure-play foundry (TSMC, 1987) → fabless design (NVIDIA, Qualcomm) → IP licensing (ARM). The AI industry currently occupies the IDM phase (NVIDIA controls hardware, software, and ecosyste…
Figure 45
Figure 45: Three strategies for protecting model weights from copyability, arranged along a physicalization spectrum. TEE encryption (NVIDIA H100 Confidential Computing) decrypts weights only inside a hardware-verified enclave. PUF-based compute-in-memory binds correct inference to a device-specific silicon fingerprint. Silicon burning (Taalas HC1) permanently embeds weights into Mask ROM, creating an inseparable mo…
Figure 46
Figure 46: Predicted three-layer Fabless-AI industry structure (right) and its semiconductor analog (left). Fabless model designers (the future ARM/Qualcomm of AI) design architectures and weights without owning compute. AI foundries (TSMC/Samsung plus hyperscale cloud) manufacture dedicated inference chips and provide large-scale training. OS-layer gateways (ChatGPT, Claude, Gemini) control the user interface and…
Figure 47
Figure 47: Open-source ecosystem mapped to the six-layer ICA architecture.
… maps them to non-contiguous physical GPU memory through a block table, effectively replicating the virtual memory management of a conventional operating system [78, 149]. Its Automatic Prefix Caching feature extends this analogy by enabling cross-request reuse of shared prompt prefixes, functionally equivalent to shared read-only code pages i…
Figure 48
Figure 48: Mapping of the five implementation recommendations (R1–R5) onto ICA layers (L1–L6) and their governing design axioms. Key cells indicate the primary ICA layer at which the recommendation must be enforced; Partial cells indicate supporting roles. R1 (control/data plane separation) targets L4–L5; R2 (context tiering) targets L3; R3 (versioning) targets L4; R4 (resource quotas) and R5 (failure recovery) both…
Figure 49
Figure 49: Research roadmap organized by ICA layer and time horizon. Horizontal lanes correspond to ICA layers L1–L6; columns represent short-term (1–2 yr), mid-term (2–4 yr), and long-term (4–8 yr) research phases. Bar colors indicate which heuristic or axiom each topic targets: Heur. I (Semantic Locality, blue), Heur. II (Context Budget, orange), Heur. III (Agent Speedup, red), and Axiom (architectural principles…
Figure 50
Figure 50: Short-term research targets: current state versus goal for the four key metrics. KV cache hit rate H must rise from the current 0.50–0.70 range to above 0.85 to realize the 5–10× speedup predicted by Heuristic I (Semantic Locality). Context utilization Weff/C must improve by at least 50% through the unified context compiler. Orchestration efficiency E must exceed 0.60 to yield meaningful agent speedup und…
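The hit-rate target in the Figure 50 caption can be sanity-checked against the Heuristic I formula given with Figure 26. The worked bound below assumes α is the per-hit speedup factor (an interpretation, since α is not defined in this excerpt) and takes the limit of an arbitrarily fast cache hit to obtain the best case for each hit rate.

```latex
S(H,\alpha) = \frac{1}{(1-H) + H/\alpha}
\quad\Longrightarrow\quad
S_{\max}(H) = \lim_{\alpha\to\infty} S(H,\alpha) = \frac{1}{1-H}

S_{\max}(0.50) = 2.0, \qquad
S_{\max}(0.70) \approx 3.3, \qquad
S_{\max}(0.85) \approx 6.7, \qquad
S_{\max}(0.90) = 10

S(0.85,\alpha) \ge 5
\iff \frac{0.85}{\alpha} \le 0.05
\iff \alpha \ge 17
```

Under this reading, the current 0.50–0.70 range cannot exceed roughly 3.3× even with free cache hits, while H = 0.85 enters the quoted 5–10× band only once α reaches about 17, and reaching 10× would need H of at least 0.90.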
Figure 51
Figure 51: Research enabling dependencies across the three roadmap phases. Short-term items produce validated metrics (H, Weff, E) and interface contracts that mid-term work extends into cross-layer co-design. Mid-term results create the architectural substrate (semantic memory, shared-state consistency, security primitives) on which long-term innovations (Intelligent ISA, stateful agents, distributed fabric, physic…
Figure 52
Figure 52: Agent permission model (right) mapped onto the classical OS process permission model (left). The four governance questions (who may create agents, what resources they may access, how permissions are scoped per operation, and how misbehaving agents are terminated) have direct OS analogs in process-creation privileges, file permission bits, POSIX capabilities, and sandbox/signal-based termination. Under the …
Figure 53
Figure 53: Four sequential governance gates applied to every memory block entering the L3 context layer. Gate 1 decides whether information should be persisted or scoped to the current session. Gate 2 assigns a Time-To-Live annotation balancing EU AI Act retention requirements against GDPR’s Right to Be Forgotten. Gate 3 enforces reference-counted deletion: physical removal requires all referencing agents to consent…
Figure 54
Figure 54: System-level behavioral-trace explainability decomposed by ICA layer. L5 (Orchestration) logs privilege requests, approval decisions, and state commits. L4 (Semantic Interface) logs tool invocations, arguments, results, and permission denials. L3 (Context Management) logs context-compilation decisions, summary provenance, and eviction events. L2 (Inference Serving) logs cache hits, misses, and the evict…
Original abstract

Large language models are undergoing a transition from model technology to system technology. Engineering challenges like cache reuse, context capacity, agent scheduling, and permission control resemble classical computer systems problems. This raises a question: if we treat the LLM as a CPU, KV cache as processor cache, context window as main memory, and agent framework as an operating system, can decades of computer architecture wisdom guide next generation model native systems? This paper pursues this analogy as a visionary survey. We map computer architecture concepts onto the emerging model native stack, survey literature across LLM as OS, memory management, agent frameworks, tool protocols, multi agent coordination, cognitive architectures, and safety governance, finding that each addresses a different layer without a unifying model. We propose the Intelligent Computing Architecture (ICA): six functional layers with interface contracts and design axioms. We resolve the tension over whether the LLM resembles a CPU or OS via a dual plane architecture a probabilistic execution plane (what can be computed) and a deterministic control plane (what should be computed), with every layer passing through as a graded crossover. We propose three Amdahl style design heuristics Semantic Locality, Context Budget, and Agent Speedup as organizing back of envelope models, illustrate their parameter ranges with published data, and identify predictive validation as the principal open task. We articulate analogy boundaries, note differences between silicon and model era architectures, and propose a research roadmap. This is a conceptual and survey contribution with no new experimental results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript is a conceptual survey and proposal arguing that classical computer architecture principles can be transferred to model-native systems by treating LLMs as CPUs, KV caches as processor caches, context windows as main memory, and agent frameworks as operating systems. It surveys literature on LLM-as-OS, memory management, agent frameworks, tool protocols, multi-agent coordination, cognitive architectures, and safety, identifies the lack of a unifying model, and proposes the Intelligent Computing Architecture (ICA) as six functional layers with interface contracts and design axioms. The central contribution is a dual-plane architecture (probabilistic execution plane for what can be computed and deterministic control plane for what should be computed) with graded crossovers at every layer, plus three Amdahl-style heuristics (Semantic Locality, Context Budget, Agent Speedup) illustrated with published data ranges; predictive validation is explicitly identified as the principal open task, along with articulation of analogy boundaries.

Significance. If the proposed ICA framework and dual-plane organization prove to be a productive organizing lens, the work could help structure the emerging field of model-native systems by providing a common vocabulary and set of interface contracts that connect disparate research threads. The manuscript's explicit acknowledgment that it contains no new experimental results, its framing of the three heuristics as back-of-the-envelope models whose predictive power remains to be tested, and its discussion of analogy boundaries constitute appropriate scholarly restraint and strengthen the contribution as a survey rather than an overclaimed derivation.

minor comments (3)
  1. [Abstract] The sentence 'We resolve the tension over whether the LLM resembles a CPU or OS via a dual plane architecture a probabilistic execution plane...' is missing punctuation and a connecting phrase, which reduces the readability of the central architectural claim.
  2. [Abstract] 'back of envelope models' should be hyphenated as 'back-of-the-envelope models' for standard usage.
  3. The manuscript would benefit from an explicit table or diagram summarizing the six ICA layers, their interface contracts, and how the probabilistic/deterministic planes cross each layer, to make the proposal more immediately usable by readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed and constructive summary of our manuscript. The positive assessment of the ICA framework, dual-plane architecture, and the explicit acknowledgment of its conceptual nature and lack of new experiments aligns with our intent. The recommendation for minor revision is noted. No specific major comments were provided in the report, so we have no point-by-point revisions to address.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The manuscript is a conceptual survey and architectural proposal that maps existing literature onto a six-layer ICA model with dual probabilistic/deterministic planes and three Amdahl-style heuristics. No equations, fitted parameters, or closed-form derivations appear; the contribution consists of organizing analogies and interface contracts rather than any result that reduces to its own inputs by construction. Self-citations, if present, are not load-bearing for any central claim, and the paper explicitly flags predictive validation as future work rather than asserting transfer guarantees. The derivation chain is therefore self-contained as a framing exercise.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 3 invented entities

The paper introduces several new conceptual entities and a domain-level assumption without independent evidence or validation beyond the computer-architecture analogy.

axioms (1)
  • domain assumption: Computer-architecture principles transfer usefully to LLM-based systems via the stated component mappings.
    The entire proposal rests on this transferability holding sufficiently for design guidance.
invented entities (3)
  • Intelligent Computing Architecture (ICA) with six functional layers (no independent evidence)
    purpose: Unifying model for the model-native stack
    Newly defined in the paper.
  • Dual probabilistic execution plane and deterministic control plane (no independent evidence)
    purpose: Resolve CPU-vs-OS tension in LLM systems
    Invented construct for the proposed architecture.
  • Semantic Locality, Context Budget, and Agent Speedup heuristics (no independent evidence)
    purpose: Amdahl-style back-of-envelope design rules
    Newly proposed organizing models.
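
As a point of reference for the "Amdahl-style" label, the sketch below computes the classical Amdahl's law speedup (Amdahl, 1967), 1 / ((1 - p) + p / n), for a workload in which a fraction p benefits from an n-fold improvement. It is not the paper's Semantic Locality, Context Budget, or Agent Speedup model, whose formulas are not reproduced on this page; the function names and the example values are illustrative assumptions.

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Classical Amdahl's law: overall speedup when a fraction p of the
    work is accelerated by a factor n and the remainder is unchanged."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on the speedup as n grows without bound (needs p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n in (2, 10, 100):
        print(n, round(amdahl_speedup(0.9, n), 3))
    print("limit", round(amdahl_limit(0.9), 3))
```

With p = 0.9 the speedup stays below 10 however large n becomes, which is the kind of ceiling a back-of-envelope Amdahl-style argument is meant to expose.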


