Iterative LLM-NAS is equivalent to a parametric cross-entropy method with proven monotonic quality improvement, geometric convergence of elite probability, and a closed-form proxy reliability rho_S = (6/pi) arcsin(rho_P(SNR)/2), partially confirmed on 3300 architectures.
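The closed form quoted in this summary is the classical Pearson-to-Spearman relation for a bivariate normal pair. The sketch below uses only the standard library and is not the citing paper's code. It assumes the proxy and true scores are jointly Gaussian with Pearson correlation rho_P. The summary does not say how rho_P depends on SNR, so rho_P is taken as a plain input. The sketch evaluates the formula and checks it against a Spearman coefficient computed on simulated data. The helper names (`spearman_from_pearson`, `empirical_spearman`) are hypothetical.

```python
import math
import random


def spearman_from_pearson(rho_p: float) -> float:
    """Spearman's rho implied by Pearson's rho for a bivariate normal pair."""
    return (6.0 / math.pi) * math.asin(rho_p / 2.0)


def ranks(xs):
    """1-based ranks; continuous Gaussian samples have no ties in practice."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r


def pearson(xs, ys):
    """Plain sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def empirical_spearman(rho_p: float, n: int = 20000, seed: int = 0) -> float:
    """Sample a bivariate normal with correlation rho_p; Spearman = Pearson of ranks."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho_p * z1 + math.sqrt(1.0 - rho_p ** 2) * z2)
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    for rho in (0.5, 0.9):
        print(rho, spearman_from_pearson(rho), empirical_spearman(rho))
```

For rho_P = 0.9 the formula gives roughly 0.89. Under this Gaussian assumption, rank agreement therefore sits slightly below the linear correlation. At rho_P = 0 and rho_P = 1 the two coefficients coincide.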
Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs
3 Pith papers cite this work.
abstract
Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch, which is computationally expensive and yields verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) that refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHash-Jaccard novelty filtering for structural diversity. We evaluate three 7B-class LLMs (DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B) across six datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA) using a 22-cycle protocol (1,100 candidates per LLM). All three substantially surpass the full-generation baseline (50.6% valid rate, 42.3% mean first-epoch accuracy): DeepSeek-Coder reaches 75.3% valid rate and 65.8% mean accuracy, Qwen2.5-Coder 72.1%/64.6%, and Mistral 66.6%/66.1%. On CIFAR-10, best first-epoch accuracies reach 85.5% (Mistral), 85.2% (DeepSeek), and 80.6% (Qwen), well above the 63.98% of full generation and the 71.5% of the concurrent approach of Gu et al. Outputs are 30-50 lines long versus 200+ for full generation (a 75-85% reduction). A 50-epoch study confirms that the 1-epoch proxy preserves rankings (Mistral: Spearman $\rho$ = 0.926). Delta-based generation is a token-efficient, multi-domain, LLM-agnostic alternative to full-model synthesis for LLM-driven NAS.
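The abstract names MinHash-Jaccard novelty filtering but gives none of its parameters. The following is a minimal sketch of the general technique, not the authors' implementation. The token 3-gram shingles, 64 hash functions, 0.8 rejection threshold, and the `NoveltyFilter` class are all hypothetical choices. Each candidate's source is reduced to a MinHash signature. The candidate is accepted only if its estimated Jaccard similarity to every previously accepted architecture stays below the threshold.

```python
import hashlib
import re
from typing import List, Set

NUM_PERM = 64       # hypothetical number of hash functions
SHINGLE_SIZE = 3    # hypothetical token n-gram size
THRESHOLD = 0.8     # hypothetical max estimated Jaccard to any accepted model


def shingles(code: str, k: int = SHINGLE_SIZE) -> Set[str]:
    """Token k-grams of the source; identifiers, numbers and symbols are tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash(sh: Set[str], num_perm: int = NUM_PERM) -> List[int]:
    """One minimum per seeded hash function; the seed is used as the BLAKE2b key."""
    sig = []
    for seed in range(num_perm):
        key = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, key=key).digest(), "little"
            )
            for s in sh
        ))
    return sig


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of agreeing signature slots, an estimate of the Jaccard index."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


class NoveltyFilter:
    """Accept a candidate only if it is not too similar to any accepted one."""

    def __init__(self, threshold: float = THRESHOLD):
        self.threshold = threshold
        self.accepted: List[List[int]] = []

    def try_add(self, code: str) -> bool:
        sig = minhash(shingles(code))
        if any(estimated_jaccard(sig, s) >= self.threshold for s in self.accepted):
            return False
        self.accepted.append(sig)
        return True
```

This sketch compares a new signature against every stored one, which is simple but linear in archive size. A larger archive would typically add locality-sensitive hashing (banding) to avoid the full scan.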
fields
cs.LG: 3
years
2026: 3
representative citing papers
Automated search of 4463 heterogeneous 4-expert MoE models found enumeration bias anchoring the space to AirNet and ranked ShuffleNet/MobileNetV3 as top performers.
Empirical grid search over 18 loss-optimizer pairs on 33 LEMUR architectures shows cross-entropy with Adam/AdamW is most robust while NGL and SGD-based pairings vary sharply by model family.
citing papers explorer
Towards Robust Training in NNGPT AutoML Pipeline: A Loss-Optimizer Pairing Selection Study