Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

2026 · cs.CL · arXiv 2603.28342

3 Pith papers cite this work. Polarity classification is still indexing.



abstract

We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable, evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured execution feedback on compilation, correctness, and speedup. To make this search reliable, we build backend-specific evaluation services for Triton on NVIDIA GPUs and MACA on MetaX GPUs. On the training side, we convert long-horizon evolution trajectories into step-centric supervision and reinforcement learning signals by retaining correctness-preserving, high-gain revisions, so that the model is optimized as a strong local improver inside the evolutionary loop rather than as a one-shot generator. Under a unified evolutionary protocol, Kernel-Smith-235B-RL achieves state-of-the-art overall performance on KernelBench with the NVIDIA Triton backend, attaining the best average speedup ratio and outperforming frontier proprietary models including Gemini-3.0-pro and Claude-4.6-opus. We further validate the framework on the MetaX MACA backend, where Kernel-Smith-MACA-30B surpasses large-scale counterparts such as DeepSeek-V3.2-think and Qwen3-235B-2507-think, highlighting the potential for seamless adaptation across heterogeneous platforms. Beyond benchmark results, the same workflow produces upstream contributions to production systems including SGLang and LMDeploy, demonstrating that LLM-driven kernel optimization can transfer from controlled evaluation to practical deployment.
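
The agent loop described in the abstract (an archive of candidates, a proposal step, and structured feedback on compilation, correctness, and speedup) can be illustrated with a toy sketch. This is not the paper's implementation. Every name here (`Feedback`, `evaluate`, `propose`, `update_archive`, `evolve`) is hypothetical, a "candidate" is reduced to a single integer block size, the evaluator is a made-up scoring rule rather than a real Triton or MACA service, the LLM is replaced by a random perturbation, and archive diversity is simplified to de-duplication.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Structured execution feedback for one candidate (hypothetical schema)."""
    compiled: bool
    correct: bool
    speedup: float  # ratio vs. a reference; 0.0 when the candidate is unusable


def evaluate(block_size: int) -> Feedback:
    """Toy stand-in for a backend evaluation service (assumption, not a real API).

    Non-power-of-two sizes "fail to compile", sizes above 1024 "compile but
    are incorrect", and the made-up speedup peaks at block_size == 256.
    """
    compiled = block_size > 0 and (block_size & (block_size - 1)) == 0
    correct = compiled and block_size <= 1024
    speedup = 2.0 / (1.0 + abs(block_size - 256) / 256) if correct else 0.0
    return Feedback(compiled, correct, speedup)


def propose(parent: int, rng: random.Random) -> int:
    """Toy stand-in for the LLM acting as a local improver: perturb the parent."""
    return max(1, rng.choice([parent // 2, parent * 2, parent + 1]))


def update_archive(archive: dict[int, Feedback], cand: int, fb: Feedback, top_k: int) -> None:
    """Admit only correct candidates and cap the archive at top_k by speedup."""
    if not fb.correct:
        return
    archive[cand] = fb  # keyed by candidate, so duplicates collapse
    if len(archive) > top_k:
        worst = min(archive, key=lambda c: archive[c].speedup)
        del archive[worst]


def evolve(seed: int, steps: int, top_k: int = 4, rng_seed: int = 0) -> tuple[int, Feedback]:
    """Run the toy loop and return the best candidate with its feedback."""
    rng = random.Random(rng_seed)
    archive: dict[int, Feedback] = {}
    update_archive(archive, seed, evaluate(seed), top_k)
    if not archive:
        raise ValueError("seed candidate must be correct")
    for _ in range(steps):
        parent = rng.choice(list(archive))  # sample a parent from the archive
        child = propose(parent, rng)
        update_archive(archive, child, evaluate(child), top_k)
    best = max(archive, key=lambda c: archive[c].speedup)
    return best, archive[best]


if __name__ == "__main__":
    best, fb = evolve(seed=32, steps=200)
    print(best, fb)
```

Because the archive only ever evicts its slowest member, the best speedup in this sketch never decreases across steps. That mirrors, in a very reduced form, the abstract's emphasis on retaining correctness-preserving, high-gain revisions.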

representative citing papers

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

cs.CL · 2026-06-27 · unverdicted · novelty 6.0

Evolution Fine-Tuning trains LLMs on 156K trajectories spanning 371 tasks to achieve 10.22% average improvement on 22 held-out optimization tasks and match SOTA on select circle-packing problems when combined with test-time RL.

SpecGen: Accelerating Agentic Kernel Optimization with Speculative Generation

cs.DC · 2026-06-16 · unverdicted · novelty 6.0

SpecGen introduces speculative generation to fork non-reasoning kernel candidates during LLM reasoning traces, enabling early termination and parallel profiling to reduce end-to-end optimization time on H200 GPUs.

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

KLineage derives verified optimization skills from backward lineages of expert GPU kernels to guide LLM agents toward higher-quality and more efficient kernels than memory-based baselines.

