A self-improving LLM agent with optimization memory raises average kernel throughput from 45-49% to 59-61% of peak on Trainium accelerators and matches proprietary models at 26x lower cost.
This improves performance by reducing loop overhead and improving data locality, because each iteration processes a larger chunk of data.
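The effect described above can be illustrated with a minimal Python sketch. This is not the AccelOpt kernel or Trainium code. The function names (`scale_per_element`, `scale_chunked`, `num_iterations`) and the chunk size are hypothetical and chosen only for illustration. The sketch assumes a simple elementwise scaling workload and shows how grouping elements into larger chunks cuts the number of loop iterations while each iteration works on a contiguous block of data.

```python
def scale_per_element(data, factor):
    """Baseline: one loop iteration per element."""
    out = []
    for x in data:
        out.append(x * factor)
    return out


def scale_chunked(data, factor, chunk=4):
    """Process a contiguous block of `chunk` elements per loop iteration.

    Assumption: `chunk` stands in for a tile size; in a real accelerator
    kernel the block would be sized to fit on-chip memory.
    """
    out = []
    for start in range(0, len(data), chunk):
        block = data[start:start + chunk]       # contiguous slice: better locality
        out.extend([x * factor for x in block]) # one bulk step per block
    return out


def num_iterations(n, chunk):
    """Number of outer-loop iterations needed for n elements (ceiling division)."""
    return (n + chunk - 1) // chunk


if __name__ == "__main__":
    data = list(range(10))
    # Both variants compute the same result; only the loop structure differs.
    print(scale_chunked(data, 2, chunk=4) == scale_per_element(data, 2))
    print(num_iterations(len(data), 1), "iterations vs", num_iterations(len(data), 4))
```

Both functions return the same values; the chunked version simply runs fewer outer iterations. How much this helps in practice depends on the hardware and on whether a chunk fits in fast memory.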
Cited by 1 Pith paper (field: cs.LG; year: 2025; verdict: CONDITIONAL):
AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization