On-Device Neural Architecture Search
Pith reviewed 2026-06-27 05:02 UTC · model grok-4.3
The pith
Lightweight neural architecture search can run directly on embedded devices to tailor tiny models to specific users' sensor data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By designing a lightweight NAS that executes on embedded hardware, the authors demonstrate that optimal tiny neural architectures can be found using only data from a short guided user session, resulting in models that occupy 0.63 times less RAM with 5.96 percentage points higher accuracy on the ISL dataset and 0.44 times less RAM with 0.2 percentage points higher accuracy on the CWRU dataset when tested on a Raspberry Pi 4.
What carries the argument
A lightweight Neural Architecture Search (NAS) algorithm, engineered to run within the memory and compute constraints of embedded systems such as the Raspberry Pi 4.
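The sketch below shows the general shape of a budget-constrained architecture search. It is not the authors' algorithm, whose details are not given here. It assumes a hypothetical search space of small 1D CNNs (`Candidate`: depth, filters per layer, kernel size) and a crude RAM estimate (`estimate_ram_bytes`: int8 parameters plus the largest pair of live activation buffers, with 'same' padding and no pooling). It also uses a hypothetical `toy_score` that stands in for the validation accuracy a real search would obtain by training each candidate on the user's data. Candidates over the RAM budget are pruned before scoring, which is where an on-device search would save most of its time.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical tiny 1D CNN: `num_layers` conv layers of `filters` channels each."""
    num_layers: int
    filters: int
    kernel: int


def make_search_space() -> List[Candidate]:
    # Hypothetical grid: depth 1-3, 4/8/16 filters, kernel size 3 or 5.
    return [Candidate(*cfg) for cfg in product(range(1, 4), (4, 8, 16), (3, 5))]


def estimate_ram_bytes(c: Candidate, input_len: int, channels: int,
                       bytes_per_value: int = 1) -> int:
    """Crude RAM estimate (assumption): all parameters plus the largest
    input+output activation pair of any layer, assuming 'same' padding,
    no pooling, and one byte per value (int8)."""
    params = 0
    in_ch = channels
    for _ in range(c.num_layers):
        params += c.kernel * in_ch * c.filters + c.filters  # weights + biases
        in_ch = c.filters
    # Layer 1 reads `channels` and writes `filters`; deeper layers read `filters`.
    widest_input = max(channels, c.filters) if c.num_layers > 1 else channels
    peak_activations = input_len * (widest_input + c.filters)
    return (params + peak_activations) * bytes_per_value


def toy_score(c: Candidate) -> float:
    # Hypothetical stand-in for validation accuracy after training on user data.
    return c.num_layers * c.filters + 0.1 * c.kernel


def search(candidates: Iterable[Candidate], score: Callable[[Candidate], float],
           ram_budget: int, input_len: int,
           channels: int) -> Optional[Tuple[Candidate, float, int]]:
    """Return (candidate, score, ram) for the best candidate within budget."""
    best: Optional[Tuple[Candidate, float, int]] = None
    for c in candidates:
        ram = estimate_ram_bytes(c, input_len, channels)
        if ram > ram_budget:
            continue  # prune before paying the cost of training/scoring
        s = score(c)
        # Prefer the higher score; on ties, prefer the smaller model.
        if best is None or (s, -ram) > (best[1], -best[2]):
            best = (c, s, ram)
    return best


if __name__ == "__main__":
    print(search(make_search_space(), toy_score, ram_budget=3000,
                 input_len=100, channels=8))
```

With `input_len=100`, `channels=8` and a 3000-byte budget, this toy search keeps `Candidate(num_layers=3, filters=8, kernel=5)` at an estimated 2584 bytes. On real hardware the RAM figure should come from the deployment runtime rather than from a formula like this one.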
If this is right
- The discovered architectures require significantly less RAM than state-of-the-art alternatives while maintaining or improving classification accuracy on both datasets.
- Personalization of models for new users becomes feasible through on-device search after collecting a small amount of labeled data in a guided session.
- The approach applies across domains, including sign language recognition from sEMG signals and mechanical fault diagnosis.
Where Pith is reading between the lines
- Similar on-device search might enable continuous adaptation as sensor data distributions shift over time without external servers.
- Keeping data and search local could reduce privacy risks in biometric human-machine interface applications.
- The method might scale to microcontroller platforms, whose resource limits are tighter than those of the Raspberry Pi 4.
Load-bearing premise
The search procedure itself can be executed efficiently on the target embedded hardware using only the limited data collected during a guided user session, without requiring external compute or large validation sets.
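The premise leaves open how validation data are obtained. One plausible arrangement, which is an assumption the summary does not confirm, is to carve a small stratified validation split out of the guided session itself, so every class the user performed appears in both splits. The helper `stratified_split` below is hypothetical. It works on labels only and uses a fixed seed so that a search can be reproduced on the device.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], val_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, val), class by class.

    Classes with at least two samples contribute at least one validation
    sample and keep at least one training sample; single-sample classes
    go to training only.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)  # fixed seed: reproducible on-device search
    train: List[int] = []
    val: List[int] = []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        if len(idx) > 1:
            n_val = min(len(idx) - 1, max(1, round(len(idx) * val_fraction)))
        else:
            n_val = 0
        val.extend(idx[:n_val])
        train.extend(idx[n_val:])
    return sorted(train), sorted(val)
```

For a hypothetical session of three classes with five repetitions each, the default 20% split yields 12 training and 3 validation indices, one validation sample per class.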
What would settle it
Running the proposed NAS on a Raspberry Pi 4 with the ISL dataset and verifying whether the resulting architecture simultaneously achieves 0.63 times less RAM occupancy and 5.96 percentage points higher accuracy than state-of-the-art methods.
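The two quantities in this criterion are easy to conflate, so a small worked check may help. Percentage points are an absolute difference between accuracies, not a relative gain. "X times less RAM" is also ambiguous, because it can mean a candidate-to-baseline ratio of either X or 1 − X. The sketch below assumes the ratio reading (candidate RAM = 0.63 × baseline RAM). The function names and all numbers are hypothetical, not results from the paper.

```python
def ram_ratio(candidate_bytes: float, baseline_bytes: float) -> float:
    """Candidate RAM as a fraction of baseline RAM (lower is better)."""
    return candidate_bytes / baseline_bytes


def gain_pp(candidate_acc: float, baseline_acc: float) -> float:
    """Absolute accuracy gain in percentage points; accuracies in [0, 1]."""
    return 100.0 * (candidate_acc - baseline_acc)


def relative_gain_pct(candidate_acc: float, baseline_acc: float) -> float:
    """Relative accuracy gain in percent, for contrast with gain_pp."""
    return 100.0 * (candidate_acc - baseline_acc) / baseline_acc


def meets_claim(candidate_bytes: float, baseline_bytes: float,
                candidate_acc: float, baseline_acc: float,
                max_ratio: float = 0.63, min_gain_pp: float = 5.96) -> bool:
    # Assumes the ratio reading of "0.63 times less RAM" (see lead-in).
    return (ram_ratio(candidate_bytes, baseline_bytes) <= max_ratio
            and gain_pp(candidate_acc, baseline_acc) >= min_gain_pp)


if __name__ == "__main__":
    # Hypothetical numbers: a 100 kB baseline at 80 % vs a 60 kB candidate at 87 %.
    print(ram_ratio(60_000, 100_000))               # 0.6
    print(round(gain_pp(0.87, 0.80), 2))            # 7.0
    print(round(relative_gain_pct(0.87, 0.80), 2))  # 8.75
    print(meets_claim(60_000, 100_000, 0.87, 0.80)) # True
```

The example also shows why the distinction matters: a 7 pp gain over an 80% baseline is an 8.75% relative gain.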
Original abstract
This paper proposes a new approach to near-sensor computing, in which a lightweight Neural Architecture Search (NAS) is performed directly on the deployment device to find the best tiny neural architecture for analyzing the real-time data acquired through sensors. This new adaptation capability can be particularly useful in the case of human-machine interfaces for which the neural network analyzing the biometrical data can be re-designed each time the user changes, after a guided data collection procedure, fighting the typical data variations between individuals on a new level. To implement the proposed approach a new NAS has been designed and then validated on the Italian Sign Language dataset (ISL), a collection of surface electromyography (sEMG) signals of the signs of the Italian alphabet, using several embedded systems. Moreover, further validation on the Case Western Reserve University dataset (CWRU), a benchmark for intelligent fault diagnosis, is presented to suggest another possible application of the proposed approach. When run on a Raspberry Pi 4, the proposed NAS performs beyond the state of the art proposing a tiny neural architecture having 0.63 times less RAM occupancy and 5.96 percentage points of more accuracy in the case of the ISL dataset; and 0.44 times less RAM occupancy and 0.2 percentage points of more accuracy in the case of the CWRU dataset.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a lightweight Neural Architecture Search (NAS) executed directly on embedded deployment hardware such as the Raspberry Pi 4 to discover optimal tiny neural networks for real-time sensor data analysis. The approach is motivated by personalization needs in human-machine interfaces and is validated on the Italian Sign Language (ISL) sEMG dataset and the CWRU bearing fault diagnosis dataset, with claims of architectures that occupy less RAM and achieve higher accuracy than prior methods when the full NAS runs on-device after a guided data collection session.
Significance. If the on-device search can be shown to complete within the memory, time, and data constraints of the target hardware using only the small per-user dataset, the work would demonstrate a practical route to adaptive edge models that do not require external compute or large validation sets. The two-dataset, real-hardware evaluation provides a concrete starting point for assessing utility in embedded signal-processing applications.
major comments (2)
- [Abstract] Abstract: the central claim that the NAS search itself executes on the Raspberry Pi 4 using only limited guided-session data is unsupported by any reported figures for search-time memory footprint, number of architectures evaluated, or search duration; without these quantities the headline RAM and accuracy improvements cannot be verified as resulting from an on-device procedure.
- [Abstract] Abstract: quantitative gains (0.63 times less RAM and +5.96 pp accuracy on ISL; 0.44 times less RAM and +0.2 pp on CWRU) are presented without baseline descriptions, number of runs, or error analysis, preventing assessment of whether the improvements are attributable to the proposed NAS rather than implementation choices or dataset specifics.
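The figures requested in the first comment (search duration, peak memory during search, number of architectures evaluated) can be collected with the Python standard library alone. The wrapper below is a hypothetical sketch. `instrumented_search` and `SearchReport` are illustrative names, and `evaluate` stands for whatever trains and scores one candidate. Note that `tracemalloc` only sees memory allocated through Python's allocator, so buffers allocated natively by an ML runtime are not counted. On a Linux board such as the Raspberry Pi 4, the process peak RSS (`resource.getrusage(resource.RUSAGE_SELF).ru_maxrss`, reported in KiB on Linux) is a useful complement.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class SearchReport:
    best: Optional[Any]
    best_score: float
    evaluated: int          # number of architectures actually scored
    seconds: float          # wall-clock search duration
    peak_python_bytes: int  # peak traced by tracemalloc (Python allocations only)


def instrumented_search(candidates: Iterable[Any],
                        evaluate: Callable[[Any], float]) -> SearchReport:
    """Exhaustively score candidates while recording time, memory, and count."""
    tracemalloc.start()
    start = time.perf_counter()
    best, best_score, evaluated = None, float("-inf"), 0
    try:
        for c in candidates:
            s = evaluate(c)
            evaluated += 1
            if s > best_score:
                best, best_score = c, s
    finally:
        seconds = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
        tracemalloc.stop()
    return SearchReport(best, best_score, evaluated, seconds, peak)
```

For example, `instrumented_search(range(10), lambda c: -abs(c - 7))` reports `best=7` and `evaluated=10`, plus timing and memory figures that depend on the machine.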
minor comments (1)
- A methods subsection describing the search algorithm, search-space constraints, and validation-set construction from the guided user session would clarify how on-device execution is achieved.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below and indicate planned revisions to strengthen the presentation of our on-device NAS results.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that the NAS search itself executes on the Raspberry Pi 4 using only limited guided-session data is unsupported by any reported figures for search-time memory footprint, number of architectures evaluated, or search duration; without these quantities the headline RAM and accuracy improvements cannot be verified as resulting from an on-device procedure.
  Authors: The full manuscript details the on-device execution of the NAS on Raspberry Pi 4 in the experimental setup and results sections, confirming use of guided-session data only. However, we agree the abstract would benefit from explicit quantitative support for this claim. We will revise the abstract to include search duration, peak memory footprint during search, and the number of architectures evaluated, with corresponding details added to the main text for verification. Revision: yes.
- Referee: [Abstract] Abstract: quantitative gains (0.63 times less RAM and +5.96 pp accuracy on ISL; 0.44 times less RAM and +0.2 pp on CWRU) are presented without baseline descriptions, number of runs, or error analysis, preventing assessment of whether the improvements are attributable to the proposed NAS rather than implementation choices or dataset specifics.
  Authors: The gains are reported relative to prior methods detailed in the related work and experimental sections. We acknowledge that the abstract would be clearer with explicit baseline references, run counts, and error measures. We will update the abstract to briefly note the baselines and include multi-run statistics with standard deviations, ensuring consistency with the detailed analysis already present in the results. Revision: yes.
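Reporting multi-run statistics with standard deviations, as the response promises, could look like the sketch below. It assumes one accuracy per independent run (for example, per search seed) and uses the sample standard deviation from Python's `statistics` module. The helper names and example accuracies are hypothetical. When candidate and baseline runs share a data split, pairing them gives the spread of the gain itself rather than of each model separately.

```python
import statistics
from typing import Sequence, Tuple


def summarize_runs(accuracies: Sequence[float]) -> str:
    """Format per-run accuracies (fractions in [0, 1]) as mean ± sample SD in %."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)
    return f"{100 * mean:.2f} ± {100 * sd:.2f} % (n={len(accuracies)})"


def paired_gain_pp(candidate: Sequence[float],
                   baseline: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample SD of per-run gains, in percentage points.

    Assumes run i of the candidate and run i of the baseline share a split.
    """
    if len(candidate) != len(baseline) or len(candidate) < 2:
        raise ValueError("need two equal-length sequences of at least two runs")
    gains = [100 * (c - b) for c, b in zip(candidate, baseline)]
    return statistics.mean(gains), statistics.stdev(gains)


if __name__ == "__main__":
    print(summarize_runs([0.90, 0.92, 0.94]))  # 92.00 ± 2.00 % (n=3)
```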
Circularity Check
No circularity: experimental validation with no derivations or self-referential fits
Full rationale
The paper proposes an on-device NAS method and reports empirical results on ISL and CWRU datasets using embedded hardware. No equations, derivations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the abstract or described content. Performance claims rest on direct experimental comparisons rather than any chain that reduces to its own inputs by construction. The central precondition (search executing on-device with limited data) is an empirical assumption tested in the work, not a definitional or fitted tautology.