On-Device Neural Architecture Search

Andrea Mattia Garavagno; Antonio Frisoli; Claudio Loconsole; Edoardo Ragusa; Paolo Gastaldo

arxiv: 2606.24900 · v1 · pith:HKXUMCZEnew · submitted 2026-06-12 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-27 05:02 UTC · model grok-4.3

keywords: on-device neural architecture search · embedded systems · tiny neural networks · surface electromyography · human-machine interfaces · Raspberry Pi · sign language recognition

The pith

Lightweight neural architecture search can run directly on embedded devices to tailor tiny models to specific users' sensor data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a lightweight version of neural architecture search can be run directly on resource-limited devices such as the Raspberry Pi. This allows the device to discover a custom tiny neural network suited to the specific sensor data patterns of an individual user after a brief guided collection session. On the Italian Sign Language dataset, the resulting models are reported to use 0.63 times less RAM while delivering 5.96 percentage points more accuracy than previous methods. On the CWRU fault diagnosis dataset, the reported figures are 0.44 times less RAM and 0.2 percentage points more accuracy.

Core claim

By designing a lightweight NAS that executes on embedded hardware, the authors demonstrate that optimal tiny neural architectures can be found using only data from a short guided user session, resulting in models that occupy 0.63 times less RAM with 5.96 percentage points higher accuracy on the ISL dataset and 0.44 times less RAM with 0.2 percentage points higher accuracy on the CWRU dataset when tested on a Raspberry Pi 4.

What carries the argument

A lightweight Neural Architecture Search algorithm engineered to operate under the memory and compute constraints of embedded systems such as the Raspberry Pi.

If this is right

The discovered architectures require significantly less RAM than state-of-the-art alternatives while maintaining or improving classification accuracy on both datasets.
Personalization of models for new users becomes feasible through on-device search after collecting a small amount of labeled data in a guided session.
The approach applies across domains, including sign language recognition from sEMG signals and mechanical fault diagnosis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar on-device search might enable continuous adaptation as sensor data distributions shift over time without external servers.
Keeping data and search local could reduce privacy risks in biometric human-machine interface applications.
The method might scale to other microcontroller platforms with tighter resource limits than the Raspberry Pi 4.

Load-bearing premise

The search procedure itself can be executed efficiently on the target embedded hardware using only the limited data collected during a guided user session, without requiring external compute or large validation sets.
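
One way to picture what searching under a device constraint involves is a filter that discards candidate architectures whose estimated footprint exceeds a RAM budget. The sketch below is illustrative only and is not the authors' algorithm: the search space, the footprint formula, the input shape (128 samples by 8 channels), and the 4096-byte budget are all assumptions chosen for the example.

```python
from itertools import product


def estimate_ram_bytes(input_len, in_channels, filters, kernel, n_layers,
                       bytes_per_value=1):
    """Rough footprint of a stack of same-padded 1D convolutions (no pooling).

    Assumption: one byte per value (for example int8 weights and activations).
    """
    weights, peak_activations, channels = 0, 0, in_channels
    for _ in range(n_layers):
        weights += kernel * channels * filters + filters  # kernels plus biases
        # A layer's input and output buffers are alive at the same time.
        peak_activations = max(peak_activations,
                               input_len * (channels + filters))
        channels = filters
    return (weights + peak_activations) * bytes_per_value


def feasible_candidates(budget_bytes, input_len=128, in_channels=8, kernel=3):
    """Enumerate a toy search space and keep candidates within the budget."""
    space = product([1, 2, 3], [4, 8, 16, 32])  # (n_layers, filters)
    return [(layers, filters) for layers, filters in space
            if estimate_ram_bytes(input_len, in_channels, filters,
                                  kernel, layers) <= budget_bytes]


# Hypothetical budget; a real search would then train and score the survivors.
candidates = feasible_candidates(budget_bytes=4096)
print(candidates)
```

Filtering before training keeps the expensive step (fitting each candidate on the session data) limited to architectures that could be deployed at all.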

What would settle it

Running the proposed NAS on a Raspberry Pi 4 with the ISL dataset and verifying whether the resulting architecture simultaneously achieves 0.63 times less RAM occupancy and 5.96 percentage points higher accuracy than state-of-the-art methods.
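
The phrase "0.63 times less RAM" admits two arithmetic readings, and the check described above depends on which one is meant. A minimal sketch of both readings follows; the 100 KiB baseline is a hypothetical figure for illustration and does not come from the paper.

```python
def ram_under_both_readings(baseline_kib: float, factor: float) -> dict:
    """Return the candidate RAM under two readings of 'factor times less RAM'."""
    return {
        # Reading A: the new model uses `factor` times the baseline footprint.
        "scaled_to": baseline_kib * factor,
        # Reading B: the baseline footprint is reduced by the fraction `factor`.
        "reduced_by": baseline_kib * (1.0 - factor),
    }


# Hypothetical baseline of 100 KiB, not a value reported in the paper.
readings = ram_under_both_readings(baseline_kib=100.0, factor=0.63)
print(readings)
```

A replication would need the absolute RAM figures for both the baseline and the discovered architecture to tell the two readings apart.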

Original abstract

This paper proposes a new approach to near-sensor computing, in which a lightweight Neural Architecture Search (NAS) is performed directly on the deployment device to find the best tiny neural architecture for analyzing the real-time data acquired through sensors. This new adaptation capability can be particularly useful in the case of human-machine interfaces for which the neural network analyzing the biometrical data can be re-designed each time the user changes, after a guided data collection procedure, fighting the typical data variations between individuals on a new level. To implement the proposed approach a new NAS has been designed and then validated on the Italian Sign Language dataset (ISL), a collection of surface electromyography (sEMG) signals of the signs of the Italian alphabet, using several embedded systems. Moreover, further validation on the Case Western Reserve University dataset (CWRU), a benchmark for intelligent fault diagnosis, is presented to suggest another possible application of the proposed approach. When run on a Raspberry Pi 4, the proposed NAS performs beyond the state of the art proposing a tiny neural architecture having 0.63 times less RAM occupancy and 5.96 percentage points of more accuracy in the case of the ISL dataset; and 0.44 times less RAM occupancy and 0.2 percentage points of more accuracy in the case of the CWRU dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The abstract claims a full NAS search runs on Raspberry Pi 4 with only small per-user data and beats prior results on RAM and accuracy, but supplies none of the numbers needed to check that precondition.

The paper's main point is that a lightweight NAS can be executed directly on embedded hardware such as the Raspberry Pi 4 to produce tiny models tailored to sEMG sign language data or bearing fault signals after a short user-specific collection session. On the ISL dataset the resulting architecture uses 0.63 times less RAM and gains 5.96 percentage points of accuracy; on CWRU it uses 0.44 times less RAM and gains 0.2 percentage points. The framing emphasizes adaptation to individual variation in biometric interfaces.

What is actually new is the concrete application of on-device NAS to these two sensor problems together with reported measurements on real embedded platforms. The datasets are standard for their domains, and the hardware results give a practical sense of what is achievable for near-sensor computing.

The soft spot is exactly the one flagged in the stress test. The abstract states that the NAS itself runs on the Pi 4 but reports no search-time memory footprint, no search duration, no count of evaluated architectures, and no description of how the search space or validation set is restricted to the small guided-session data. Without those numbers the headline claim that the search fits the device constraints cannot be checked, so it is impossible to know whether external compute was used. The accuracy and RAM figures are presented without baseline details or error analysis either.

The paper is aimed at engineers working on embedded ML for wearables or industrial sensors who want an example of on-device adaptation in those settings. A reader seeking new search algorithms or tightly controlled experiments will find little.

I would not bring this to a reading group. It does not deserve peer review until the search procedure is documented with the missing metrics on cost and data usage.
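
The cost figures the letter finds missing (search duration, peak memory during search, number of architectures evaluated) could be captured with a small wrapper around the search loop. The sketch below is a generic harness, not the paper's code: `measure_search`, the candidate list, and the scoring function are hypothetical stand-ins. Note that `tracemalloc` only tracks memory allocated through Python, so memory used natively by a training framework would need an OS-level measurement instead.

```python
import time
import tracemalloc
from typing import Callable, Iterable


def measure_search(candidates: Iterable,
                   evaluate: Callable[[object], float]) -> dict:
    """Run a search loop and record duration, peak traced memory, and count."""
    tracemalloc.start()
    start = time.perf_counter()
    best, best_score, evaluated = None, float("-inf"), 0
    for candidate in candidates:
        score = evaluate(candidate)
        evaluated += 1
        if score > best_score:
            best, best_score = candidate, score
    duration_s = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "best": best,
        "best_score": best_score,
        "evaluated": evaluated,
        "duration_s": duration_s,
        "peak_python_bytes": peak_bytes,
    }


# Hypothetical stand-in: candidates are filter counts, the score is a toy
# function that peaks at 5 filters.
report = measure_search(range(1, 9), lambda filters: -(filters - 5) ** 2)
print(report)
```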

Referee Report

2 major / 1 minor

Summary. The paper proposes a lightweight Neural Architecture Search (NAS) executed directly on embedded deployment hardware such as the Raspberry Pi 4 to discover optimal tiny neural networks for real-time sensor data analysis. The approach is motivated by personalization needs in human-machine interfaces and is validated on the Italian Sign Language (ISL) sEMG dataset and the CWRU bearing fault diagnosis dataset, with claims of architectures that occupy less RAM and achieve higher accuracy than prior methods when the full NAS runs on-device after a guided data collection session.

Significance. If the on-device search can be shown to complete within the memory, time, and data constraints of the target hardware using only the small per-user dataset, the work would demonstrate a practical route to adaptive edge models that do not require external compute or large validation sets. The two-dataset, real-hardware evaluation provides a concrete starting point for assessing utility in embedded signal-processing applications.

major comments (2)

[Abstract] The central claim that the NAS search itself executes on the Raspberry Pi 4 using only limited guided-session data is unsupported by any reported figures for search-time memory footprint, number of architectures evaluated, or search duration; without these quantities the headline RAM and accuracy improvements cannot be verified as resulting from an on-device procedure.
[Abstract] Quantitative gains (0.63 times less RAM and +5.96 pp accuracy on ISL; 0.44 times less RAM and +0.2 pp on CWRU) are presented without baseline descriptions, number of runs, or error analysis, preventing assessment of whether the improvements are attributable to the proposed NAS rather than implementation choices or dataset specifics.

minor comments (1)

A methods subsection describing the search algorithm, search-space constraints, and validation-set construction from the guided user session would clarify how on-device execution is achieved.
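
As an illustration of what "validation-set construction from the guided user session" could look like, the sketch below splits a small labeled session into training and validation indices while keeping classes balanced. It is an assumption about one reasonable procedure, not a description of what the paper does; the function name, the 25% fraction, and the toy session are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.25, seed=0):
    """Split sample indices into train/validation sets, class by class.

    Each class with at least two samples is represented on both sides.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Aim for at least one validation sample, but never take all of them.
        n_val = max(1, round(len(indices) * val_fraction))
        n_val = min(n_val, len(indices) - 1)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Hypothetical guided session: 3 signs, 8 repetitions each.
session_labels = ["A"] * 8 + ["B"] * 8 + ["C"] * 8
train_idx, val_idx = stratified_split(session_labels)
print(len(train_idx), len(val_idx))
```

With very few repetitions per class, the validation estimate is noisy, which is one reason the report asks for the construction to be documented.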

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and indicate planned revisions to strengthen the presentation of our on-device NAS results.

Referee: [Abstract] The central claim that the NAS search itself executes on the Raspberry Pi 4 using only limited guided-session data is unsupported by any reported figures for search-time memory footprint, number of architectures evaluated, or search duration; without these quantities the headline RAM and accuracy improvements cannot be verified as resulting from an on-device procedure.

Authors: The full manuscript details the on-device execution of the NAS on Raspberry Pi 4 in the experimental setup and results sections, confirming use of guided-session data only. However, we agree the abstract would benefit from explicit quantitative support for this claim. We will revise the abstract to include search duration, peak memory footprint during search, and the number of architectures evaluated, with corresponding details added to the main text for verification.

revision: yes

Referee: [Abstract] Quantitative gains (0.63 times less RAM and +5.96 pp accuracy on ISL; 0.44 times less RAM and +0.2 pp on CWRU) are presented without baseline descriptions, number of runs, or error analysis, preventing assessment of whether the improvements are attributable to the proposed NAS rather than implementation choices or dataset specifics.

Authors: The gains are reported relative to prior methods detailed in the related work and experimental sections. We acknowledge that the abstract would be clearer with explicit baseline references, run counts, and error measures. We will update the abstract to briefly note the baselines and include multi-run statistics with standard deviations, ensuring consistency with the detailed analysis already present in the results.

revision: yes
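
The multi-run statistics promised in the rebuttal amount to reporting a mean and a spread across repeated runs. A minimal sketch using the Python standard library is shown below; the per-run accuracies are hypothetical and are not values from the paper.

```python
import statistics


def summarize_runs(accuracies):
    """Return the mean and sample standard deviation of per-run accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical per-run accuracies in percent; not values from the paper.
mean_acc, std_acc = summarize_runs([90.0, 92.0, 91.0, 93.0, 89.0])
print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
```

A gain as small as the 0.2 percentage points reported for CWRU is only meaningful if it exceeds this run-to-run spread.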

Circularity Check

0 steps flagged

No circularity: experimental validation with no derivations or self-referential fits

The paper proposes an on-device NAS method and reports empirical results on ISL and CWRU datasets using embedded hardware. No equations, derivations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the abstract or described content. Performance claims rest on direct experimental comparisons rather than any chain that reduces to its own inputs by construction. The central precondition (search executing on-device with limited data) is an empirical assumption tested in the work, not a definitional or fitted tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no mathematical derivations, so no free parameters, axioms, or invented entities can be identified.
