NNGPT: Rethinking AutoML with Large Language Models

Avi Goyal; Chandini Vysyaraju; Dmitry Ignatov; Furui Qin; Radu Timofte; Raghuvir Duvvuri; Roman Kochnev; Tolgay Atinc Uzun; Waleed Khalid; Xi Zhang

arxiv: 2511.20333 · v1 · pith:MNCYARR5new · submitted 2025-11-25 · cs.AI · cs.LG · cs.NE


Roman Kochnev, Waleed Khalid, Tolgay Atinc Uzun, Xi Zhang, Yashkumar Sanjaybhai Dhameliya, Furui Qin, Chandini Vysyaraju, Raghuvir Duvvuri, Avi Goyal, Dmitry Ignatov, Radu Timofte


classification: cs.AI, cs.LG, cs.NE

keywords: nngpt, automl, models, accuracy, achieves, architecture, code, code-aware



Building self-improving AI systems remains a fundamental challenge in the AI domain. We present NNGPT, an open-source framework that turns a large language model (LLM) into a self-improving AutoML engine for neural network development, primarily for computer vision. Unlike previous frameworks, NNGPT extends the dataset of neural networks by generating new models, enabling continuous fine-tuning of LLMs through a closed-loop system of generation, assessment, and self-improvement. It integrates five synergistic LLM-based pipelines within one unified workflow: zero-shot architecture synthesis, hyperparameter optimization (HPO), code-aware accuracy/early-stop prediction, retrieval-augmented synthesis of scope-closed PyTorch blocks (NN-RAG), and reinforcement learning. Built on the LEMUR dataset as an audited corpus with reproducible metrics, NNGPT emits and validates a network architecture, preprocessing code, and hyperparameters from a single prompt, executes them end-to-end, and learns from the results. The PyTorch adapter makes NNGPT framework-agnostic, enabling strong performance: NN-RAG achieves 73% executability on 1,289 targets, 3-shot prompting boosts accuracy on common datasets, and hash-based deduplication saves hundreds of runs. One-shot prediction matches search-based AutoML, reducing the need for numerous trials. HPO on LEMUR achieves an RMSE of 0.60, outperforming Optuna (0.64), while the code-aware predictor reaches an RMSE of 0.14 with Pearson r=0.78. The system has already generated over 5K validated models, demonstrating NNGPT as an autonomous AutoML engine. Upon acceptance, the code, prompts, and checkpoints will be released for public access to enable reproducibility and facilitate community usage.
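
The abstract reports predictor quality as RMSE and Pearson r. As a reading aid, the following is a minimal sketch of how those two metrics are conventionally computed from predicted and measured accuracies. The function names `rmse` and `pearson_r` and the sample values are illustrative assumptions, not taken from the NNGPT code base.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Hypothetical predicted vs. measured accuracies, for illustration only.
predicted_accuracy = [0.62, 0.71, 0.55, 0.80]
measured_accuracy = [0.60, 0.75, 0.50, 0.78]
error = rmse(predicted_accuracy, measured_accuracy)
correlation = pearson_r(predicted_accuracy, measured_accuracy)
```

A lower RMSE means predictions sit closer to measured values, while Pearson r captures how well the predictor ranks models linearly, which is why the two are reported together.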



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Enhancing LLM-Based Neural Network Generation: Few-Shot Prompting and Efficient Validation for Automated Architecture Design
cs.CV · 2025-12 · conditional · novelty 6.0

Three-example few-shot prompting optimizes LLM-generated vision architectures while a whitespace-normalized hash provides 100x faster duplicate detection than AST parsing across seven benchmarks.

Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis
cs.LG · 2025-11 · unverdicted · novelty 4.0

FractalNet automatically generates and tests over 1,200 CNN architectures based on recursive fractal templates, achieving up to 80.18% accuracy on CIFAR-10 after five training epochs.

Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis
cs.LG · 2025-11 · unverdicted · novelty 3.0

Fractal templates enable systematic creation of more than 1,200 neural network variants that show strong performance and computational efficiency when trained on CIFAR-10 for five epochs.
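
The first citing paper in this list mentions a whitespace-normalized hash for duplicate detection, and the NNGPT abstract mentions hash-based deduplication. The sketch below shows one plausible way such a check could work. The normalization rule (collapsing every whitespace run to a single space), the choice of SHA-256, and the names `normalized_hash` and `deduplicate` are assumptions made for illustration. Neither paper's actual implementation is shown here.

```python
import hashlib


def normalized_hash(source: str) -> str:
    """Hash source code after collapsing whitespace (assumed normalization)."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining with a single space removes formatting-only differences.
    normalized = " ".join(source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(candidates):
    """Keep the first occurrence of each distinct normalized hash."""
    seen = set()
    unique = []
    for code in candidates:
        digest = normalized_hash(code)
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique


# Hypothetical generated snippets: the first two differ only in whitespace.
model_a = "def forward(x):\n    return x + 1\n"
model_b = "def forward(x):\n\treturn  x + 1"
model_c = "def forward(x):\n    return x * 2\n"
unique_models = deduplicate([model_a, model_b, model_c])
```

Collapsing all whitespace is a cheap heuristic: in Python it can also merge programs whose indentation differs in a meaningful way, so a stricter check may be needed where that distinction matters.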