Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

Andrea Mattia Garavagno; Antonio Frisoli; Edoardo Ragusa; Paolo Gastaldo

arXiv: 2606.14824 · v1 · pith:BGBRUPICnew · submitted 2026-06-12 · 💻 cs.AR · cs.AI · cs.LG

Pith reviewed 2026-06-27 04:47 UTC · model grok-4.3

classification 💻 cs.AR · cs.AI · cs.LG

keywords hardware-aware neural architecture search · embedded devices · tiny CNNs · microcontroller units · Visual Wake Word · TinyML · IoT · neural architecture search

The pith

A hardware-aware neural architecture search runs directly on embedded devices under 512 MB RAM to produce tiny CNNs that reach state-of-the-art accuracy on the Visual Wake Word human-recognition task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a hardware-aware neural architecture search method designed to operate within the memory constraints of embedded devices. This allows the search to run locally on an embedded device such as a gateway, without relying on external servers. The technique generates compact convolutional neural networks optimized for low-end MCUs used in IoT and wearable applications. The abstract reports state-of-the-art performance on the Visual Wake Word dataset for detecting humans in images across multiple embedded platforms. This matters because it supports privacy-preserving, on-device customization of AI models for resource-limited environments.

Core claim

The proposed hardware-aware neural architecture search considers the resources available on the computing platform, enabling its execution on various embedded devices. It produces tiny convolutional neural networks targeting low-end microcontroller units and achieves state-of-the-art results in human-recognition tasks on the Visual Wake Word dataset on several embedded devices.

What carries the argument

The hardware-aware search algorithm that factors in the target device's memory and compute limits to explore and select efficient CNN architectures.
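
To make that mechanism concrete, here is a minimal sketch of a resource-constrained search loop. It is not the authors' algorithm, which the abstract does not describe; it only illustrates the general idea of rejecting candidates whose estimated costs exceed a device budget before spending any effort on scoring them. The names `Candidate`, `Budget`, `fits`, and `constrained_search` are hypothetical, and all numbers in the demo are made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate CNN, described only by its estimated costs."""
    name: str
    ram_kib: float    # estimated peak working memory on the target MCU
    flash_kib: float  # estimated storage needed for the weights
    mmacs: float      # estimated multiply-accumulates per inference (millions)


@dataclass(frozen=True)
class Budget:
    """Hypothetical resource limits of the target MCU."""
    ram_kib: float
    flash_kib: float
    mmacs: float


def fits(c: Candidate, b: Budget) -> bool:
    """True when every estimated cost is within the corresponding limit."""
    return (c.ram_kib <= b.ram_kib
            and c.flash_kib <= b.flash_kib
            and c.mmacs <= b.mmacs)


def constrained_search(candidates: Iterable[Candidate],
                       budget: Budget,
                       score: Callable[[Candidate], float]) -> Optional[Candidate]:
    """Return the best-scoring candidate that fits the budget, or None."""
    best, best_score = None, float("-inf")
    for c in candidates:
        if not fits(c, budget):
            continue  # infeasible: skipped without being scored
        s = score(c)  # stand-in for training and validating the candidate
        if s > best_score:
            best, best_score = c, s
    return best


if __name__ == "__main__":
    # Made-up candidates and scores, for illustration only.
    pool = [Candidate("small", 32, 256, 5), Candidate("wide", 128, 256, 5)]
    made_up_scores = {"small": 0.7, "wide": 0.9}
    print(constrained_search(pool, Budget(64, 512, 10),
                             lambda c: made_up_scores[c.name]))
```

Filtering before scoring is the design choice that matters here: on a memory-limited host, the expensive step is evaluating a candidate, so infeasible architectures are dropped using cheap cost estimates alone.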

If this is right

A gateway can run the search on locally acquired data to tailor CNN architectures without external servers, preserving privacy.
The method opens use cases for IoT and wearable robotics by producing custom tiny CNNs on low-end MCUs.
The generated networks achieve state-of-the-art results on the standard TinyML Visual Wake Word human-recognition benchmark.
The generated networks are sized for the resource envelope of typical low-end microcontroller units, while the search itself is sized for the embedded device that runs it.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support fully decentralized model adaptation across networks of sensors without any central server.
Similar local searches might be tested on devices with even tighter limits such as 256 MB RAM to map the feasible range.
On-device architecture search could allow robots or wearables to adjust models in response to changing real-world conditions.

Load-bearing premise

The hardware-aware search algorithm can execute within the 512 MB RAM limit of the target embedded device while exploring a sufficiently useful space of architectures.

What would settle it

Running the NAS process on a target device and recording either peak memory usage above 512 MB or final model accuracy below the claimed state-of-the-art level on the Visual Wake Word dataset.
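
A minimal sketch of that check, assuming the search runs as a Python process on a Unix-like host (the standard `resource` module is not available on Windows). The function names are hypothetical, and the accuracy floor must be supplied by the reader because the abstract does not state the claimed figure.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process so far, in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def claim_survives(peak_mib: float, accuracy: float,
                   ram_budget_mib: float, accuracy_floor: float) -> bool:
    """True only if the run stayed within the RAM budget AND met the floor.

    accuracy_floor is a reader-supplied threshold (an assumption here);
    the abstract gives no number for the claimed state of the art.
    """
    return peak_mib <= ram_budget_mib and accuracy >= accuracy_floor


if __name__ == "__main__":
    # Call this after the search finishes, in the same process.
    print(f"peak RSS so far: {peak_rss_mib():.1f} MiB")
```

Peak RSS is a process-wide high-water mark, so it has to be read in the process that ran the search; it says nothing about which stage of the search caused the peak.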

Original abstract

This document proposes a novel approach to hardware-aware neural architecture search (HW NAS) that considers the resources available on the computing platform running it, enabling its execution on various embedded devices. The presented HW NAS produces tiny convolutional neural networks (CNNs) targeting low-end microcontroller units (MCUs), typically involved in the Internet of Things (IoT) or wearable robotics, opening new use cases. A gateway could run it to tailor CNNs' architecture on the acquired data without using external servers, ensuring privacy. The proposed technique achieves state-of-the-art results in the human-recognition tasks on the Visual Wake Word dataset, a standard TinyML benchmark, on several embedded devices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

The paper claims on-device HW-NAS under 512 MB of RAM for privacy-sensitive TinyML but supplies no measurements showing the search process itself fits that limit.

The main claim is that hardware-aware neural architecture search can run directly on an embedded device, letting a gateway tailor small CNNs to local data without external servers. This targets a concrete privacy need in IoT and wearable robotics.

The search produces tiny models aimed at low-end MCUs, and the abstract reports state-of-the-art accuracy on the Visual Wake Word human-recognition benchmark on several embedded devices. The direction is practical because many edge deployments cannot ship data off-device.

The soft spot is exactly the one flagged in the stress-test note. No peak RAM trace, resident-set breakdown, or timing for the supernet training and sampling loop appears in the description. The abstract simply states the search runs under 512 MB; without those numbers it is impossible to judge whether the central feasibility claim holds or whether the explored space was artificially narrowed to make it fit.

If the full paper contains concrete memory profiles and shows the search stays inside the envelope while still finding competitive architectures, the result becomes more credible. Right now the memory constraint reads as an assumption rather than a demonstrated property.

This is for the TinyML and embedded-ML community. Readers working on on-device adaptation or privacy-preserving edge systems would get the most from the benchmark numbers and the stated use case.

It deserves peer review so the implementation details and memory data can be checked.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a hardware-aware neural architecture search (HW-NAS) technique that executes directly on embedded devices under a 512 MB RAM limit. It generates compact CNNs for low-end MCUs used in IoT and wearable robotics and reports state-of-the-art accuracy on the Visual Wake Word dataset for human-recognition tasks, enabling privacy-preserving on-device model customization without external servers.

Significance. Successful demonstration of on-device HW-NAS within the stated memory envelope would enable new privacy-preserving workflows in TinyML. The reported SOTA results on a standard benchmark would strengthen the contribution if the memory-feasibility claim is substantiated.

major comments (1)

[Abstract] The central claim that the full HW-NAS procedure (supernet training, architecture sampling, and evaluation) executes inside the 512 MB envelope on the target MCU is asserted without any reported peak RAM measurements, resident-set-size traces, or per-component memory breakdown for the search loop itself. These measurements are required to convert the feasibility statement from an assumption into a demonstrated property.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on substantiating the memory-feasibility claim. We address the point below and will revise the manuscript to include the requested empirical data.

Referee: [Abstract] The central claim that the full HW-NAS procedure (supernet training, architecture sampling, and evaluation) executes inside the 512 MB envelope on the target MCU is asserted without any reported peak RAM measurements, resident-set-size traces, or per-component memory breakdown for the search loop itself. These measurements are required to convert the feasibility statement from an assumption into a demonstrated property.

Authors: We agree that the abstract (and results section) would benefit from explicit memory measurements to convert the feasibility claim into a demonstrated result. In the revised manuscript we will add a new subsection (or table) under Experiments reporting: (i) peak RAM usage during the full search loop on each evaluated MCU, (ii) resident-set-size traces where instrumentation permits, and (iii) a per-component breakdown (supernet training, architecture sampling, evaluation) that stays within the 512 MB envelope. These data will be obtained by instrumenting the search code with platform-specific memory profilers.

Revision: yes
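
One way such a per-component breakdown could be collected is sketched below, using Python's standard `tracemalloc` module. This is an assumption about tooling, not the profiler the simulated rebuttal has in mind. `tracemalloc` only sees allocations made through Python's memory allocators, so tensors allocated natively by an ML framework would need a platform-specific profiler instead. The class `PhaseMemoryLog` is hypothetical, and the `bytearray` workloads are stand-ins for the real search stages.

```python
import tracemalloc
from contextlib import contextmanager
from typing import Dict, Iterator


class PhaseMemoryLog:
    """Records the peak extra Python-heap usage seen inside each named phase."""

    def __init__(self) -> None:
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        self.peaks_bytes: Dict[str, int] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        tracemalloc.reset_peak()  # requires Python 3.9 or later
        base, _ = tracemalloc.get_traced_memory()  # usage when the phase starts
        try:
            yield
        finally:
            _, peak = tracemalloc.get_traced_memory()
            # Keep the largest growth over the baseline if a phase repeats.
            grown = peak - base
            self.peaks_bytes[name] = max(self.peaks_bytes.get(name, 0), grown)


if __name__ == "__main__":
    log = PhaseMemoryLog()
    # Placeholder workloads; the real stages would go inside each block.
    with log.phase("supernet training"):
        buf = bytearray(8_000_000)
        del buf
    with log.phase("architecture sampling"):
        buf = bytearray(1_000_000)
        del buf
    with log.phase("evaluation"):
        buf = bytearray(2_000_000)
        del buf
    for name, peak in log.peaks_bytes.items():
        print(f"{name}: {peak / 2**20:.2f} MiB peak over baseline")
```

Reporting growth over the baseline at phase entry, rather than the absolute peak, keeps the phases comparable even when earlier stages leave objects alive.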

Circularity Check

0 steps flagged

No circularity detected; no derivation chain present

The manuscript describes an engineering method for hardware-aware NAS runnable on embedded devices under a 512 MB RAM limit and reports empirical results on the Visual Wake Word dataset. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the supplied text. The central feasibility claim is an empirical assertion about on-device execution rather than a mathematical reduction that could collapse to its own inputs by construction. No load-bearing steps matching any of the enumerated circularity patterns exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no free parameters, axioms, or invented entities can be extracted from the provided text.
