KernelEvolve: Scaling agentic kernel coding for heterogeneous AI accelerators at Meta

Gang Liao, Hongsen Qin, Ying Wang, Alicia Golden, Michael Kuchnik, Yavuz Yetim, Jia Jiunn Ang, Chunli Fu, Yihan He, Samuel Hsia, et al. · 2025 · arXiv 2512.23236

6 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

FastKernels: Benchmarking GPU Kernel Generation in Production

cs.LG · 2026-05-22 · conditional · novelty 8.0

FastKernels is a production-aligned benchmark covering 96.2% of HuggingFace Transformers that reveals state-of-the-art kernel agents deliver at most 0.94x aggregate speedup.

PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

cs.AI · 2026-05-28 · unverdicted · novelty 7.0

PassNet provides a dataset of 18K graphs and PassBench for LLM-generated compiler passes, with fine-tuned models achieving 2.67x gains on long-tail tasks where TorchInductor underperforms.

Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

cs.AI · 2026-05-15 · unverdicted · novelty 6.0

Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

cs.PL · 2026-05-02 · unverdicted · novelty 6.0

DITRON introduces a hierarchical multi-level tiling compiler for distributed tensor programs that matches or exceeds expert CUDA libraries with 6-30% speedups and has been deployed to improve training MFU by over 10% while saving hundreds of thousands of GPU hours monthly.

AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization

cs.LG · 2026-03-24 · unverdicted · novelty 5.0

AscendOptimizer combines kernel rewinding for reusable experience with evolutionary search on hardware feedback to optimize Ascend NPU operators, delivering 1.21x geometric-mean speedup and faster performance on 53.47% of 101 tested operators versus baseline.

citing papers explorer


FastKernels: Benchmarking GPU Kernel Generation in Production · cs.LG · 2026-05-22 · conditional · none · ref 13
PassNet: Scaling Large Language Models for Graph Compiler Pass Generation · cs.AI · 2026-05-28 · unverdicted · none · ref 3
Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics · cs.DC · 2026-04-08 · unverdicted · none · ref 24
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design · cs.AI · 2026-05-15 · unverdicted · none · ref 2
DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs · cs.PL · 2026-05-02 · unverdicted · none · ref 14
AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization · cs.LG · 2026-03-24 · unverdicted · none · ref 18
