AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun · 2025 · arXiv 2507.05687

8 Pith papers cite this work.

representative citing papers

FastKernels: Benchmarking GPU Kernel Generation in Production

cs.LG · 2026-05-22 · conditional · novelty 8.0

FastKernels is a production-aligned benchmark covering 96.2% of HuggingFace Transformers; it reveals that state-of-the-art kernel agents deliver at most a 0.94x aggregate speedup.

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

AgentKernelArena is a new open benchmark that measures complete AI agent workflows on 196 GPU kernel tasks with correctness, performance, and generalization checks to unseen configurations.

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

cs.LG · 2026-05-06 · conditional · novelty 7.0 · 2 refs

The KernelBenchX benchmark shows that task category explains nearly three times more variance in LLM kernel correctness than method choice, that iterative refinement boosts correctness but reduces performance, and that quantization remains unsolved.

MusaCoder: Native GPU Kernel Generation with Full-Stack Training on Moore Threads GPU

cs.CV · 2026-06-03 · unverdicted · novelty 6.0

MusaCoder combines kernel-oriented data synthesis, diversity-preserving fine-tuning, and stabilized RL with MooreEval to produce correct, fast GPU kernels, with its 27B model setting new SOTA on KernelBench and a MUSA variant.

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

KLineage derives verified optimization skills from backward lineages of expert GPU kernels to guide LLM agents toward higher-quality and more efficient kernels than memory-based baselines.

AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

cs.CL · 2026-03-30 · unverdicted · novelty 6.0

Kernel-Smith combines evolutionary search with RL post-training to generate optimized GPU kernels, achieving SOTA speedups on KernelBench that beat Gemini-3.0-pro and Claude-4.6-opus on NVIDIA Triton and generalize to MetaX MACA.

AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization

cs.LG · 2026-03-24 · unverdicted · novelty 5.0

AscendOptimizer combines kernel rewinding for reusable experience with evolutionary search on hardware feedback to optimize Ascend NPU operators, delivering a 1.21x geometric-mean speedup and running faster than the baseline on 53.47% of the 101 tested operators.

citing papers explorer

FastKernels: Benchmarking GPU Kernel Generation in Production · cs.LG · 2026-05-22 · conditional · none · ref 10
AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents · cs.CL · 2026-05-16 · unverdicted · none · ref 11
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels · cs.LG · 2026-05-06 · conditional · none · ref 5 · 2 links
MusaCoder: Native GPU Kernel Generation with Full-Stack Training on Moore Threads GPU · cs.CV · 2026-06-03 · unverdicted · none · ref 1
Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages · cs.AI · 2026-05-27 · unverdicted · none · ref 15
AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code · cs.CL · 2026-05-18 · unverdicted · none · ref 76
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization · cs.CL · 2026-03-30 · unverdicted · none · ref 11
AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization · cs.LG · 2026-03-24 · unverdicted · none · ref 15
