Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

2026 · cs.LG · arXiv 2605.04903

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch -- computationally expensive and yielding verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHash-Jaccard novelty filtering for structural diversity. We evaluate three 7B-class LLMs -- DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B -- across six datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA) using a 22-cycle protocol (1,100 candidates per LLM). All three substantially surpass the full-generation baseline (50.6% valid rate, 42.3% mean first-epoch accuracy): DeepSeek-Coder reaches 75.3% valid rate and 65.8% mean accuracy; Qwen2.5-Coder 72.1%/64.6%; Mistral 66.6%/66.1%. On CIFAR-10, best first-epoch accuracies reach 85.5% (Mistral), 85.2% (DeepSeek), 80.6% (Qwen) -- well above 63.98% full generation and 71.5% for the concurrent approach of Gu et al. Output lengths are 30-50 lines versus 200+ for full generation (75-85% reduction). A 50-epoch study confirms the 1-epoch proxy preserves rankings (Mistral: Spearman $\rho$ = 0.926). Delta-based generation is a token-efficient, multi-domain, LLM-agnostic alternative to full-model synthesis for LLM-driven NAS.
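The MinHash-Jaccard novelty filtering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the tokenization, shingle size, number of hash permutations, or similarity threshold, so every parameter and function name below (`shingles`, `minhash_signature`, `is_novel`, `k=3`, `num_perm=64`, `threshold=0.8`) is an assumption chosen for illustration, not the authors' implementation.

```python
import hashlib
import re


def shingles(code, k=3):
    """Split code into tokens and return the set of k-token shingles.

    The tokenizer and k=3 are illustrative assumptions.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items, num_perm=64):
    """Return a MinHash signature: the minimum salted hash per 'permutation'."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt simulates a different permutation
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def is_novel(candidate_code, accepted_signatures, threshold=0.8):
    """Accept a candidate only if it is not too similar to any accepted one.

    threshold=0.8 is a hypothetical value, not taken from the paper.
    """
    candidate_sig = minhash_signature(shingles(candidate_code))
    return all(
        estimated_jaccard(candidate_sig, sig) < threshold
        for sig in accepted_signatures
    )
```

In a pipeline of this kind, the signature of each accepted architecture would be stored, and a newly generated candidate would be checked with `is_novel` before being evaluated or added to the fine-tuning set.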

representative citing papers

Convergence Theory for Iterative LLM-Based Neural Architecture Search: A Parametric Cross-Entropy Framework with Closed-Form Proxy Reliability

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Iterative LLM-NAS is equivalent to a parametric cross-entropy method with proven monotonic quality improvement, geometric convergence of elite probability, and a closed-form proxy reliability $\rho_S = (6/\pi)\arcsin(\rho_P(\mathrm{SNR})/2)$, partially confirmed on 3,300 architectures.
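The closed-form relation quoted in this summary has the same shape as the classical link between Pearson and Spearman correlation for bivariate normal data. The sketch below evaluates only that mapping. How the citing paper obtains the Pearson correlation from the signal-to-noise ratio is not stated here, so the function takes it directly as an input; the name `spearman_from_pearson` is hypothetical.

```python
import math


def spearman_from_pearson(rho_p):
    """Map a Pearson correlation to rho_S = (6 / pi) * arcsin(rho_P / 2).

    The dependence of rho_P on SNR is not given in the summary,
    so rho_P is supplied by the caller (an assumption of this sketch).
    """
    if not -1.0 <= rho_p <= 1.0:
        raise ValueError("rho_p must lie in [-1, 1]")
    return (6.0 / math.pi) * math.asin(rho_p / 2.0)
```

The endpoints behave as expected: a Pearson correlation of 0 maps to 0, and a Pearson correlation of 1 maps to 1, because arcsin(1/2) equals pi/6.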

Systematic Exploration of 4-Expert Heterogeneous Mixture-of-Experts via Automated Pipeline Search

cs.LG · 2026-06-21 · unverdicted · novelty 5.0

Automated search of 4,463 heterogeneous 4-expert MoE models found enumeration bias anchoring the space to AirNet and ranked ShuffleNet/MobileNetV3 as top performers.

Towards Robust Training in NNGPT AutoML Pipeline: A Loss-Optimizer Pairing Selection Study

cs.LG · 2026-06-18 · conditional · novelty 4.0

Empirical grid search over 18 loss-optimizer pairs on 33 LEMUR architectures shows cross-entropy with Adam/AdamW is most robust while NGL and SGD-based pairings vary sharply by model family.
