CHIA introduces a framework for building and deploying agentic AI co-design flows as CHIA loops with tool nodes, reliability mechanisms, and five case-study demonstrations.
hub Canonical reference
Barbarians at the gate: How AI is upending systems research
Canonical reference. 75% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 18representative citing papers
Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.
SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.
CCL-Bench packages traces and metadata to compute detailed compute, memory, and communication efficiency metrics, surfacing performance insights unavailable from end-to-end benchmarks.
Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.
DWDP distributes MoE weights across GPUs for independent execution without collective synchronization, improving output TPS/GPU by 8.8 percent on GB200 NVL72 for DeepSeek-R1 under 8K input and 1K output lengths.
New Text-to-Big SQL metrics show that LLM agents must balance accuracy with cost and speed at scale, where GPT-4o trades some accuracy for up to 12x speedup and GPT-5.2 proves more cost-effective than Gemini 3 Pro on large inputs.
Symbolon learns diverse code transformations via search on small programs, distills them into agent skills, and applies them to improve KLEE symbolic execution, yielding 3.69x coverage gains and 21 new Linux kernel bugs.
An economy of agents using auctions and wealth accumulation produces emergent multi-step reasoning that outperforms monolithic baselines on five agentic tasks.
BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.
ScientistOne introduces Chain-of-Evidence and an audit system that achieves zero hallucinated references, perfect score verification, and top method-code alignment while matching or beating human experts on five frontier tasks and generalizing to six more.
A universal LLM optimizer for text artifacts achieves SOTA results on six tasks including tripling ARC-AGI accuracy and cutting cloud costs by 40% via cross-task transfer and side information.
An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.
R^3 optimizes full scientific applications on GPUs better than tuning kernel parameters or compiler flags alone while running nearly an order of magnitude faster than modern evolutionary search methods.
Co-evolving LLM-generated solutions with their evaluators enables discovery of novel database algorithms that outperform state-of-the-art baselines, including a query rewrite policy with up to 6.8x lower latency.
Automated architectural discovery engines can outperform human design teams by exploring massive design spaces and compressing development cycles from months to weeks.
DDS introduces typed contracts at intent, operator DAG, skills, and runtime layers to bound agentic search for data system compositions, achieving convergence on a trading workload where unbounded iteration fails.
PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.
citing papers explorer
-
SemaTune: Semantic-Aware Online OS Tuning with Large Language Models
SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.