FastKernels is a production-aligned benchmark covering 96.2% of HuggingFace Transformers that reveals state-of-the-art kernel agents deliver at most 0.94x aggregate speedup.
Towards robust agentic CUDA kernel benchmarking, verification, and optimization
9 Pith papers cite this work. Polarity classification is still indexing.
years
2026 (9)
representative citing papers
CODA re-expresses most non-attention Transformer computations as GEMM-plus-epilogue programs using a constrained set of composable primitives to keep intermediate results on-chip and cut global memory traffic.
AgentKernelArena is a new open benchmark that measures complete AI agent workflows on 196 GPU kernel tasks with correctness, performance, and generalization checks to unseen configurations.
KernelBenchX benchmark shows task category explains nearly three times more variance in LLM kernel correctness than method choice, iterative refinement boosts correctness but reduces performance, and quantization remains unsolved.
Kernel Contracts is a specification language that formalizes correctness requirements for ML kernels to ensure consistent results across heterogeneous silicon platforms.
KLineage derives verified optimization skills from backward lineages of expert GPU kernels to guide LLM agents toward higher-quality and more efficient kernels than memory-based baselines.
Metal-Sci is a benchmark and harness for LLM evolutionary optimization of Apple Silicon Metal kernels that uses held-out sizes to detect silent regressions missed by in-distribution scores.
HTAM builds a Hierarchical Transition Graph to organize coarse global directions and detailed local strategies for guiding LLM-based CUDA kernel optimization, improving results on KernelBench.
KEET uses LLM agents to generate data-grounded natural language explanations of performance issues in GPU kernels from Nsight Compute profiles and shows these improve downstream LLM-based optimization tasks.
citing papers explorer
- Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages
KLineage derives verified optimization skills from backward lineages of expert GPU kernels to guide LLM agents toward higher-quality and more efficient kernels than memory-based baselines.