Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models
Abstract
Channel-configuration search, the optimization of layer specifications such as channel widths in deep neural networks, presents a combinatorial challenge constrained by tensor-shape compatibility and computational budgets. We investigate whether large language models (LLMs) can support neural architecture search (NAS) by reasoning over architectural code structures in ways that complement traditional search heuristics. We apply an LLM-driven NAS framework to channel-configuration search, formulating the task as conditional code generation in which the LLM refines architectural specifications using performance feedback. To address data scarcity, we generate a corpus of valid, shape-consistent architectures through abstract syntax tree (AST) mutations. Although these mutated networks are not necessarily optimized for performance, they provide structural examples that help the LLM learn executable architectural patterns and relate channel configurations to model performance. Experimental results on CIFAR-100 show that the closed-loop LLM improves upon the initial AST-generated architecture population under the same proxy-evaluation protocol. Our analysis further shows that the generated architectures reflect domain-specific design patterns, including non-standard channel widths and late-stage expansion, highlighting the potential of language-driven design for code-level NAS. The code and prompts are publicly available at https://github.com/ABrain-One/NN-GPT, and the generated deep neural networks are published at https://github.com/ABrain-One/NN-Dataset under model names with the prefix ast-dimension-.
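The AST-mutation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation; the toy model string, the mutate_channels helper, and the candidate width set are hypothetical. The sketch rewrites Conv2d channel widths in an architecture's source code while keeping consecutive convolutions shape-consistent, which is the property the abstract requires of the generated corpus.

import ast
import random

# Hypothetical toy architecture source; the real corpus in NN-Dataset contains
# full, trainable network definitions.
MODEL_SRC = """
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
        )
"""

def mutate_channels(src, widths=(24, 48, 96, 160, 320), seed=0):
    """Randomly reassign Conv2d output widths, then overwrite the next
    convolution's input width so the mutated code stays shape-consistent
    (valid for a plain sequential stack of convolutions)."""
    rng = random.Random(seed)
    tree = ast.parse(src)
    convs = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "Conv2d"
    ]
    convs.sort(key=lambda n: (n.lineno, n.col_offset))  # restore source order
    prev_out = None
    for call in convs:
        if prev_out is not None:
            # in_channels must match the previous layer's out_channels
            call.args[0] = ast.Constant(prev_out)
        new_out = rng.choice(widths)  # possibly a non-standard width
        call.args[1] = ast.Constant(new_out)
        prev_out = new_out
    return ast.unparse(tree)  # requires Python 3.9+

if __name__ == "__main__":
    print(mutate_channels(MODEL_SRC))

The sketch only rewrites source text, so it runs without PyTorch installed. In the setting the abstract describes, such mutated programs are then proxy-evaluated and serve as performance-annotated structural examples for the closed-loop LLM, which refines channel configurations using that feedback.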
Forward citations
Cited by 1 paper:
- Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs
Fine-tuned 7B LLMs generating unified diffs for neural architecture refinement achieve 66-75% valid rates and 64-66% mean first-epoch accuracy, outperforming full-generation baselines by large margins while cutting ou...