KernelEvolve: Scaling agentic kernel coding for heterogeneous AI accelerators at Meta

Gang Liao, Hongsen Qin, Ying Wang, Alicia Golden, Michael Kuchnik, Yavuz Yetim, Jia Jiunn Ang, Chunli Fu, Yihan He, Samuel Hsia, Zewei Jiang, Dianshi Li, Liyuan Li, Uladzimir Pashkevich, Varna Puvvada

CIDR’27, January 19-22 · 2027 · arXiv 2512.23236

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FastKernels: Benchmarking GPU Kernel Generation in Production

cs.LG · 2026-05-22 · conditional · novelty 8.0

FastKernels is a production-aligned benchmark covering 96.2% of HuggingFace Transformers that reveals state-of-the-art kernel agents deliver at most 0.94x aggregate speedup.

PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

cs.AI · 2026-05-28 · unverdicted · novelty 7.0

PassNet provides a dataset of 18K graphs and PassBench for LLM-generated compiler passes, with fine-tuned models achieving 2.67x gains on long-tail tasks where TorchInductor underperforms.

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

cs.LG · 2026-05-27 · conditional · novelty 7.0

A two-stage agent skill system enables autonomous end-to-end deployment of eight decoder-only LLMs on AMD XDNA 2 NPU with numerical correctness in 0.5-4 hours each, generalizing from a human-guided Llama-3.2-1B reference.

Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.

Experience Graphs: The Data Foundation for Self-Improving Agents

cs.DB · 2026-06-29 · unverdicted · novelty 6.0

Trellis treats agent experience graphs as first-class database state so that search patterns become queries, enabling crash recovery, scaling, and closed-loop training as architectural byproducts.

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

cs.CL · 2026-06-27 · unverdicted · novelty 6.0

Evolution Fine-Tuning trains LLMs on 156K trajectories spanning 371 tasks to achieve 10.22% average improvement on 22 held-out optimization tasks and match SOTA on select circle-packing problems when combined with test-time RL.

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

cs.AI · 2026-05-15 · unverdicted · novelty 6.0

Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

cs.PL · 2026-05-02 · unverdicted · novelty 6.0

DITRON introduces a hierarchical multi-level tiling compiler for distributed tensor programs that matches or exceeds expert CUDA libraries with 6-30% speedups and has been deployed to improve training MFU by over 10% while saving hundreds of thousands of GPU hours monthly.

AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization

cs.LG · 2026-03-24 · unverdicted · novelty 5.0

AscendOptimizer combines kernel rewinding for reusable experience with evolutionary search on hardware feedback to optimize Ascend NPU operators, delivering 1.21x geometric-mean speedup and faster performance on 53.47% of 101 tested operators versus baseline.

citing papers explorer

PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

cs.AI · 2026-05-28 · unverdicted · none · ref 3

PassNet provides a dataset of 18K graphs and PassBench for LLM-generated compiler passes, with fine-tuned models achieving 2.67x gains on long-tail tasks where TorchInductor underperforms.

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

cs.AI · 2026-05-15 · unverdicted · none · ref 2

Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.
