An affordable hardware-aware neural architecture search for deploying convolutional neural networks on ultra-low-power computing platforms

Andrea Mattia Garavagno; Antonio Frisoli; Edoardo Ragusa; Paolo Gastaldo

arxiv: 2606.16290 · v1 · pith:OI273O2Gnew · submitted 2026-06-15 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-27 03:47 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords hardware-aware neural architecture search · ultra-low-power microcontrollers · tiny CNNs · embedded devices · computer vision benchmarks · neural architecture search · model deployment

The pith

A hardware-aware search generates tiny CNNs that run on ultra-low-power microcontrollers while preserving accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a hardware-aware neural architecture search (HW-NAS) tailored to ultra-low-power microcontrollers. Its search procedure is lightweight enough to execute even on the target embedded devices, and it produces convolutional neural networks that fit prearranged hardware constraints. Tests on three tiny-computer-vision benchmarks show that the generated networks preserve state-of-the-art classification accuracy.

Core claim

The proposed HW-NAS generates tiny CNNs for ultra-low-power microcontrollers by using a lightweight search procedure that enables execution on embedded devices, achieving state-of-the-art accuracy on standard benchmarks.

What carries the argument

The lightweight search procedure that incorporates hardware constraints directly into the architecture generation for CNNs.
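To make "incorporating hardware constraints" concrete, here is a minimal sketch of the kind of feasibility check an HW-NAS runs on each candidate. It is not the authors' procedure: the names `ConvSpec`, `estimate_costs` and `fits` are hypothetical, and the cost model assumes a sequential stack of plain convolutions, int8 weights and activations, int32 biases, "same" padding, and that peak RAM is the largest input-plus-output activation pair of any single layer.

```python
from dataclasses import dataclass


@dataclass
class ConvSpec:
    """One plain 2D convolution in a sequential stack (hypothetical cost model)."""
    in_ch: int
    out_ch: int
    kernel: int
    stride: int = 1


def estimate_costs(layers, input_hw, bytes_per_value=1):
    """Return (flash_bytes, peak_ram_bytes) for a sequential stack of convolutions.

    Assumptions: int8 weights and activations (1 byte each), int32 biases,
    'same' padding, and only the current layer's input and output buffers
    are live in RAM at the same time.
    """
    h, w = input_hw
    flash, peak_ram = 0, 0
    for layer in layers:
        n_weights = layer.kernel * layer.kernel * layer.in_ch * layer.out_ch
        flash += n_weights * bytes_per_value + layer.out_ch * 4  # int32 biases
        out_h, out_w = -(-h // layer.stride), -(-w // layer.stride)  # ceil division
        in_act = h * w * layer.in_ch * bytes_per_value
        out_act = out_h * out_w * layer.out_ch * bytes_per_value
        peak_ram = max(peak_ram, in_act + out_act)
        h, w = out_h, out_w
    return flash, peak_ram


def fits(layers, input_hw, flash_budget, ram_budget):
    """True if the estimated footprint is within both memory budgets."""
    flash, ram = estimate_costs(layers, input_hw)
    return flash <= flash_budget and ram <= ram_budget


net = [ConvSpec(3, 8, 3, stride=2), ConvSpec(8, 16, 3, stride=2)]
print(estimate_costs(net, (32, 32)))    # (1464, 5120)
print(fits(net, (32, 32), 2048, 4096))  # False: activations exceed the RAM budget
```

In this toy case the weights fit comfortably, but the first layer's activations do not, which is why RAM rather than flash is often the binding constraint on low-end microcontrollers.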

If this is right

Architectures satisfy hardware constraints without requiring post-search adjustments.
Search can be performed directly on the low-power devices.
Models maintain accuracy on benchmarks for tiny computer vision.
Deployment becomes feasible on sensing nodes with strict power limits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such methods could reduce reliance on cloud-based model optimization for IoT applications.
Extending the search to other tasks like object detection might broaden applicability.
Integration with existing tinyML frameworks could accelerate adoption.

Load-bearing premise

The search remains lightweight enough to run on the ultra-low-power devices without exceeding their resources or causing accuracy loss.

What would settle it

Observing that the search procedure exceeds the power budget or memory of the target microcontroller, or that generated models underperform state-of-the-art accuracy.

Figures

Figures reproduced from arXiv: 2606.16290 by Andrea Mattia Garavagno, Antonio Frisoli, Edoardo Ragusa, Paolo Gastaldo.

Original abstract

Hardware-aware neural architecture search (HW-NAS) allows the integration of Convolutional Neural Networks (CNNs) in microcontroller devices by automatically designing neural architectures that fit prearranged hardware constraints. However, state-of-the-art HW-NAS methods target high-performance microcontrollers, whose power consumption does not meet the requirements of sensing nodes. This work presents an HW-NAS that generates tiny CNNs able to run on ultra-low-power microcontrollers, featuring a lightweight search procedure that can be executed even on embedded devices. Empirical results on three well-known benchmarks for tiny computer vision showed that the proposed HW-NAS was able to generate tiny CNNs while preserving state-of-the-art classification accuracy.
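The abstract does not spell out the search algorithm, so the following is only a generic illustration of why a constrained search can stay cheap: candidates that violate the memory budgets are discarded before the expensive train-and-validate step. Everything here is hypothetical: `conv_stack_costs` assumes 3x3 stride-2 int8 convolutions on a 32x32x3 input, the greedy "widen or deepen" moves are one possible strategy rather than the paper's, and `toy_score` is a stand-in for actually training each candidate.

```python
def conv_stack_costs(filters, input_hw=(32, 32), in_ch=3, kernel=3):
    """Estimate (flash_bytes, peak_ram_bytes) of a conv stack (hypothetical model).

    Each entry of `filters` is one 3x3 stride-2 convolution; weights and
    activations are assumed to be int8 (1 byte each); biases are ignored.
    """
    h, w = input_hw
    flash, peak_ram = 0, 0
    for out_ch in filters:
        flash += kernel * kernel * in_ch * out_ch
        out_h, out_w = -(-h // 2), -(-w // 2)
        peak_ram = max(peak_ram, h * w * in_ch + out_h * out_w * out_ch)
        h, w, in_ch = out_h, out_w, out_ch
    return flash, peak_ram


def greedy_search(score_fn, flash_budget, ram_budget, start=4, max_layers=6):
    """Grow a network step by step, scoring only candidates that fit the budgets."""

    def feasible(filters):
        flash, ram = conv_stack_costs(filters)
        return flash <= flash_budget and ram <= ram_budget

    best = [start]
    if not feasible(best):
        raise ValueError("even the smallest network exceeds the budgets")
    best_score = score_fn(best)
    while True:
        # Two growth moves: widen the last layer, or append a wider layer.
        moves = [best[:-1] + [best[-1] * 2]]
        if len(best) < max_layers:
            moves.append(best + [best[-1] * 2])
        # Hardware constraints are checked *before* the expensive scoring step.
        scored = [(score_fn(m), m) for m in moves if feasible(m)]
        if not scored:
            return best, best_score
        cand_score, cand = max(scored)
        if cand_score <= best_score:
            return best, best_score
        best, best_score = cand, cand_score


def toy_score(filters):
    """Stand-in for train-and-validate: grows with capacity, saturating at 1."""
    capacity = sum(filters)
    return capacity / (capacity + 32)


# With these budgets the search settles on the architecture [4, 8, 32].
print(greedy_search(toy_score, flash_budget=4096, ram_budget=8192))
```

Because the feasibility check is pure arithmetic, its cost is negligible next to training; a search that prunes this way spends its compute only on deployable networks, which is one plausible route to running the search on an embedded device.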

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a practical HW-NAS for ultra-low-power MCUs with an on-device search that preserves accuracy on three standard tiny-vision benchmarks.

The main thing to know is that the authors extend hardware-aware NAS to ultra-low-power microcontrollers by making the search itself lightweight enough to run on the target embedded devices, and the results reported on three common benchmarks show that the generated CNNs fit the target hardware constraints without accuracy loss.

The work takes the existing HW-NAS approach and applies it to a stricter power regime than most prior targets. The abstract and stress-test note indicate the full paper supplies the search-cost numbers, hardware-in-the-loop checks, and baseline comparisons that were absent from the abstract alone. That moves the contribution from unverified claim to something that can be examined.

The empirical side looks straightforward: standard benchmarks, accuracy preservation, and explicit hardware constraint enforcement. No load-bearing fitting or circular definitions appear in the description.

The soft spots are the usual ones for this kind of engineering paper. The core idea of folding hardware metrics into the search is already in the literature the authors cite, so the advance is mainly the power level and the on-device search implementation rather than a new algorithm. Real deployment could still hit gaps between modeled and measured power or latency, though the paper apparently includes validation steps to address that.

This is useful for embedded ML groups working on battery-constrained IoT nodes who need concrete architectures that fit tight power budgets. A reader looking for reproducible methods in that niche would get value from the details.

It deserves peer review. The combination of a new target regime, on-device search, and reported results is concrete enough to be worth referee time even if revisions are needed on the hardware modeling or additional benchmarks.

Referee Report

0 major / 2 minor

Summary. The paper presents a hardware-aware neural architecture search (HW-NAS) method for generating compact CNNs deployable on ultra-low-power microcontrollers. It introduces a lightweight search procedure executable on embedded devices and reports empirical results on three tiny computer vision benchmarks demonstrating that the generated architectures meet hardware constraints while preserving state-of-the-art classification accuracy.

Significance. If the reported results and search-cost measurements hold, the work provides a practical advance for on-device NAS in resource-constrained IoT sensing nodes, where existing HW-NAS methods target higher-power platforms. The combination of hardware-in-the-loop validation and on-device search execution is a concrete strength that could support reproducible deployment pipelines.

minor comments (2)

[Abstract] The three benchmarks are described only as 'well-known'; naming them explicitly (with references) would improve immediate clarity without lengthening the text.
[Section 5] The manuscript would benefit from a brief statement in the experimental section on how accuracy was measured (e.g., top-1 on validation vs. test split) and whether hardware constraints were enforced strictly during search or only post-search.
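The distinction the second comment asks about matters in practice: if the budget is enforced only after the search, the most accurate candidate may simply not fit. The sketch below illustrates this with made-up candidates; the names, accuracies, and RAM figures are hypothetical and are not results from the paper.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    accuracy: float  # illustrative validation accuracy, not a reported result
    ram_bytes: int   # illustrative peak-RAM estimate


def select_strict(cands, ram_budget) -> Optional[Candidate]:
    """Constraint enforced during search: infeasible candidates never compete."""
    feasible = [c for c in cands if c.ram_bytes <= ram_budget]
    return max(feasible, key=lambda c: c.accuracy, default=None)


def select_post_hoc(cands, ram_budget) -> Optional[Candidate]:
    """Constraint checked only after search: the winner may not fit at all."""
    best = max(cands, key=lambda c: c.accuracy)
    return best if best.ram_bytes <= ram_budget else None


cands = [
    Candidate("A", 0.81, 48_000),
    Candidate("B", 0.78, 30_000),
    Candidate("C", 0.74, 20_000),
]
print(select_strict(cands, ram_budget=32_000))    # picks B
print(select_post_hoc(cands, ram_budget=32_000))  # None: A wins but does not fit
```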

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report, so we have no point-by-point responses to address at this time.

Circularity Check

0 steps flagged

No significant circularity

The paper presents a hardware-aware NAS method whose central claim rests on empirical validation across three standard tiny-vision benchmarks, reporting that the generated CNNs satisfy ultra-low-power constraints while retaining state-of-the-art accuracy. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the abstract or described procedure. The search cost, hardware-in-the-loop checks, and accuracy measurements are presented as independent experimental outcomes rather than reductions to prior inputs by construction, rendering the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on unstated modeling choices about hardware constraints and benchmark definitions.

pith-pipeline@v0.9.1-grok · 5649 in / 1026 out tokens · 37886 ms · 2026-06-27T03:47:45.434426+00:00 · methodology

Reference graph

Works this paper leans on

20 extracted references · 3 canonical work pages · 2 internal anchors

[1]

Affordance segmentation using tiny networks for sensing systems in wearable robotic devices

E. Ragusa, S. Dosen, R. Zunino, and P. Gastaldo, “Affordance segmentation using tiny networks for sensing systems in wearable robotic devices,” IEEE Sensors Journal, 2023

2023
[2]

Bi-directional LSTM model for accurate and real-time landslide detection: A case study in Mawiongrim, Meghalaya, India

J. S. Gidon, J. Borah, S. Sahoo, S. Majumdar, and M. Fujita, “Bi-directional LSTM model for accurate and real-time landslide detection: A case study in Mawiongrim, Meghalaya, India,” IEEE Internet of Things Journal, 2023

2023
[3]

MicroNets: Neural network architectures for deploying TinyML applications on commodity microcontrollers

C. Banbury, C. Zhou, I. Fedorov, R. Matas, U. Thakker, D. Gope, V. Janapa Reddi, M. Mattina, and P. Whatmough, “MicroNets: Neural network architectures for deploying TinyML applications on commodity microcontrollers,” Proc. of Machine Learning and Systems, vol. 3, pp. 517–532, 2021

2021
[4]

MCUNet: Tiny deep learning on IoT devices

J. Lin, W.-M. Chen, Y. Lin, J. Cohn, C. Gan, and S. Han, “MCUNet: Tiny deep learning on IoT devices,” Advances in Neural Information Processing Systems, vol. 33, pp. 11711–11722, 2020

2020
[5]

μNAS: Constrained neural architecture search for microcontrollers

E. Liberis, Ł. Dudziak, and N. D. Lane, “μNAS: Constrained neural architecture search for microcontrollers,” in Proc. of the 1st Workshop on Machine Learning and Systems, 2021, pp. 70–79

2021
[6]

Aicarebreath: IoT enabled location invariant novel unified model for predicting air pollutants to avoid related respiratory disease

J. Borah, S. Kumar, N. Kumar, M. S. M. Nadzir, M. G. Cayetano, H. Ghayvat, S. Majumdar, and N. Kumar, “Aicarebreath: IoT enabled location invariant novel unified model for predicting air pollutants to avoid related respiratory disease,” IEEE Internet of Things Journal, 2023

2023
[7]

ColabNAS: Obtaining lightweight task-specific convolutional neural networks following Occam’s razor

A. M. Garavagno, D. Leonardis, and A. Frisoli, “ColabNAS: Obtaining lightweight task-specific convolutional neural networks following Occam’s razor,” Future Generation Computer Systems, vol. 152, pp. 152–159, 2024

2024
[8]

A hardware-aware neural architecture search algorithm targeting low-end microcontrollers

A. M. Garavagno, E. Ragusa, A. Frisoli, and P. Gastaldo, “A hardware-aware neural architecture search algorithm targeting low-end microcontrollers,” in 18th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME). IEEE, 2023, pp. 281–284

2023
[9]

Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

A. M. Garavagno, E. Ragusa, A. Frisoli, and P. Gastaldo, “Running hardware-aware neural architecture search on embedded devices under 512MB of RAM,” in 2024 IEEE International Conference on Consumer Electronics (ICCE). IEEE, 2024, pp. 1–2

2024
[10]

Searching for MobileNetV3

A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for MobileNetV3,” in Proc. of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324

2019
[11]

ShuffleNet V2: Practical guidelines for efficient CNN architecture design

N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “ShuffleNet V2: Practical guidelines for efficient CNN architecture design,” in Proc. of the European Conference on Computer Vision (ECCV), 2018, pp. 116–131

2018
[12]

EfficientNet: Rethinking model scaling for convolutional neural networks

M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Int. Conference on Machine Learning. PMLR, 2019, pp. 6105–6114

2019
[13]

MnasNet: Platform-aware neural architecture search for mobile

M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, “MnasNet: Platform-aware neural architecture search for mobile,” in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2820–2828

2019
[14]

FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search

B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer, “FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search,” in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10734–10742

2019
[15]

32-bit Arm Cortex MCUs

32-bit Arm Cortex MCUs. [Online]. Available: https://www.st.com/en/microcontrollers-microprocessors/stm32-32-bit-arm-cortex-mcus.html

[16]

TensorFlow Lite Micro: Embedded machine learning for TinyML systems

R. David, J. Duke, A. Jain, V. Janapa Reddi, N. Jeffries, J. Li, N. Kreeger, I. Nappier, M. Natraj et al., “TensorFlow Lite Micro: Embedded machine learning for TinyML systems,” Proc. of Machine Learning and Systems, vol. 3, pp. 800–811, 2021

2021
[17]

Visual Wake Words Dataset

A. Chowdhery, P. Warden, J. Shlens, A. Howard, and R. Rhodes, “Visual wake words dataset,” arXiv preprint arXiv:1906.05721, 2019

2019
[18]

Learning multiple layers of features from tiny images

A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” 2009

2009
[19]

Melanoma skin cancer dataset of 10000 images

M. H. Javid, “Melanoma skin cancer dataset of 10000 images,” 2022. [Online]. Available: https://www.kaggle.com/dsv/3376422

2022
[20]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

2014
