VECODI introduces SHANGRI-LA, an intermediate-privilege runtime on TrustZone-M, to enable verifiable confidential DNN inference on constrained edge devices with a small trusted computing base (TCB) and low overhead.
CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs
15 Pith papers cite this work.
abstract
Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication. This paper presents CMSIS-NN, efficient kernels developed to maximize the performance and minimize the memory footprint of neural network (NN) applications on Arm Cortex-M processors targeted for intelligent IoT edge devices. Neural network inference based on CMSIS-NN kernels achieves 4.6X improvement in runtime/throughput and 4.9X improvement in energy efficiency.
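To make concrete the kind of 8-bit arithmetic such kernels optimize, here is a plain-Python reference sketch of a q7 (int8) fully connected layer. It assumes the shift-based fixed-point scaling of the legacy CMSIS-NN q7 functions: the bias is left-shifted by `bias_shift`, the 32-bit accumulator is right-shifted by `out_shift` with round-half-up, and the result is saturated to int8. The function names are hypothetical, and this is not the library's SIMD-optimized implementation.

```python
def saturate_q7(x: int) -> int:
    """Clamp an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def fully_connected_q7(x, weights, bias, bias_shift, out_shift):
    """Reference int8 fully connected layer (hypothetical helper).

    Computes y = sat(((b << bias_shift) + round_offset + W.x) >> out_shift).
    x: list of int8 inputs (length n)
    weights: list of m rows, each a list of n int8 weights
    bias: list of m int8 biases
    """
    # Adding 2^(out_shift - 1) before the arithmetic right shift rounds half up.
    round_offset = (1 << (out_shift - 1)) if out_shift > 0 else 0
    out = []
    for row, b in zip(weights, bias):
        # Wide accumulator seeded with the scaled bias and the rounding offset.
        acc = (b << bias_shift) + round_offset
        for xi, wi in zip(x, row):
            acc += xi * wi  # int8 x int8 products accumulate without overflow here
        # Python's >> on ints is an arithmetic (flooring) shift, like on Cortex-M.
        out.append(saturate_q7(acc >> out_shift))
    return out


print(fully_connected_q7([10, -20], [[1, 2], [3, 4]], [5, -5], bias_shift=0, out_shift=0))  # → [-25, -55]
```

Keeping weights and activations in 8 bits is what cuts the memory footprint; the optimized kernels additionally pack pairs of int8 values into 16-bit SIMD multiply-accumulate instructions, which this sketch does not model.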
representative citing papers
SEABAD is a publicly released, balanced dataset of 50,000 curated 16 kHz audio clips spanning 1,677 tropical bird species, with a dual-branch curation pipeline and MobileNetV3-Small baseline reaching 99.57% accuracy.
NDR-SHKF replaces the static forgetting factor in Sage-Husa Kalman Filters with a learned vector-valued memory attenuation policy from a bifurcated recurrent network trained end-to-end on whitened innovations to minimize estimation error.
CATS enables collaborative transformer inference on up to 16 ultra-low-power wireless devices and, using SomeGather pruning and robustness to message dropout, supports models up to 14 times larger than a single device can run.
MP-IB uses an 8x information asymmetry via FP16 trait heads and INT4 state heads to disentangle speaker identity from agitation in voice biomarkers, outperforming larger models on edge devices with low latency and suppressed identity leakage.
AEG bare-metal framework achieves 9.2x higher compute efficiency, 3-7x less data movement, and near-zero latency variance for ResNet-18 on 28 AIE tiles versus Linux Vitis AI on 304 tiles while maintaining 68.78% ImageNet accuracy.
Non-IID data causes up to 55% accuracy loss in federated learning due to weight divergence, measured by earth mover's distance; sharing a global 5% subset of the data recovers 30% accuracy on CIFAR-10.
A fine-grained split inference system enables CNN models infeasible on single MCUs to run across networked devices by partitioning at sub-layer granularity, reducing per-device peak RAM while keeping practical latency.
EdgeSpike delivers 91.4% mean accuracy on five sensing tasks with 31x lower energy on neuromorphic hardware and 6.3x longer battery life in a seven-month field deployment compared to conventional CNNs.
A co-design framework using approximate matrix decomposition and genetic algorithms delivers 33% average latency reduction in TinyML CNN FPGA accelerators with 1.3% average accuracy loss versus standard systolic arrays.
A three-layer leaky integrate-and-fire spiking neural network estimates passive component parameters in power converters, cutting resistance error from 25.8% to 10.2% versus feedforward baselines at projected 270x lower energy on neuromorphic chips.
Compressed FastGRNN model with 566-byte weights runs real-time 50 Hz inference on Arduino and MSP430 MCUs at a macro F1 of 0.918, matching the PyTorch reference while cutting energy by 96.7% via a look-up table (LUT).
MicroBi-ConvLSTM is a convolutional-recurrent model with 11.4K parameters that delivers competitive accuracy on eight HAR benchmarks and full INT8 deployment coverage on Raspberry Pi Pico 2 and ESP32.
Measurement-based characterization of quantized AI inference latency and data movement on Cortex-M platforms, positioned as a lower-bound reference for small-satellite embedded vision workloads.
This paper outlines a systems-oriented workflow for embedded machine learning on microcontrollers, using accelerometer-based motion recognition and audio keyword spotting as running examples to illustrate data, feature, evaluation, and deployment steps.
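For context on the NDR-SHKF summary above: a conventional Sage-Husa adaptive Kalman filter uses one static forgetting factor b (close to 1) to discount old innovations when re-estimating noise statistics, through the weight d_k = (1 - b) / (1 - b^(k+1)). The scalar sketch below assumes that textbook weighting and a scalar measurement-noise update; function names are hypothetical, and it shows what the learned policy replaces, not NDR-SHKF itself.

```python
def sage_husa_weight(b: float, k: int) -> float:
    """Fading-memory weight d_k for a static forgetting factor 0 < b < 1."""
    return (1.0 - b) / (1.0 - b ** (k + 1))


def update_measurement_noise(r_prev: float, innovation: float, hph: float,
                             b: float, k: int) -> float:
    """Scalar Sage-Husa re-estimate of the measurement-noise variance R.

    innovation: measurement residual e_k = z_k - H x_pred
    hph: predicted measurement variance H P_pred H^T (a scalar here)
    """
    d = sage_husa_weight(b, k)
    # Blend the previous estimate with the innovation-based sample;
    # practical filters also guard against this going negative (omitted).
    return (1.0 - d) * r_prev + d * (innovation ** 2 - hph)
```

With b = 0.95 the weight starts at 1 for k = 0 and decays toward 1 - b = 0.05, so every noise component forgets at the same fixed rate; a vector-valued learned attenuation policy can instead vary that rate per component and over time.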
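The non-IID federated learning summary above quantifies divergence with the earth mover's distance. A minimal sketch, assuming the simplified categorical form that sums the absolute per-class probability differences between one client's labels and the population's labels (with a unit ground cost between distinct classes, the true Wasserstein-1 distance is half this value). Function names are hypothetical.

```python
from collections import Counter


def class_distribution(labels, num_classes):
    """Empirical probability of each class index 0..num_classes-1."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(c, 0) / n for c in range(num_classes)]


def emd_to_population(client_labels, population_labels, num_classes):
    """Sum of absolute per-class probability differences (assumed EMD form)."""
    p = class_distribution(client_labels, num_classes)
    q = class_distribution(population_labels, num_classes)
    return sum(abs(pi - qi) for pi, qi in zip(p, q))
```

A client whose labels match the population mix scores 0, while a single-class client against a uniform 10-class population scores 1.8, the kind of skew the summary links to large accuracy loss.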
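Both the EdgeSpike and power-converter summaries above rely on spiking neurons. The sketch below shows a single discrete-time leaky integrate-and-fire neuron with a hard reset to zero; the decay factor `beta`, the threshold, and the function name are hypothetical choices, not parameters from either paper.

```python
def lif_run(input_currents, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron; return its spike train."""
    v = 0.0  # membrane potential
    spikes = []
    for current in input_currents:
        v = beta * v + current  # leak a fraction of the old potential, integrate input
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


print(lif_run([0.5, 0.5, 0.5, 0.0]))  # → [0, 0, 1, 0]
```

Because the neuron only emits sparse binary events, downstream work happens on spikes rather than on every dense activation, which is where neuromorphic hardware draws its energy advantage.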
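The FastGRNN summary above credits a look-up table (LUT) for most of the energy saving but does not say what the table stores. The sketch below assumes it replaces the sigmoid and tanh evaluations in the FastGRNN update h_t = (ζ(1 − z_t) + ν) ⊙ h̃_t + z_t ⊙ h_{t−1}, where the gate z_t and candidate h̃_t share the same W and U matrices. Table range, size, and function names are hypothetical, and a real MCU port would use fixed-point rather than Python floats.

```python
import math

# Hypothetical table: 257 sigmoid samples on [-8, 8], step 0.0625.
LUT_MIN, LUT_MAX, LUT_SIZE = -8.0, 8.0, 257
_STEP = (LUT_MAX - LUT_MIN) / (LUT_SIZE - 1)
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(LUT_MIN + i * _STEP))) for i in range(LUT_SIZE)]


def lut_sigmoid(x):
    """Nearest-entry table sigmoid; inputs outside the range are clamped."""
    x = min(max(x, LUT_MIN), LUT_MAX)
    return SIGMOID_LUT[round((x - LUT_MIN) / _STEP)]


def lut_tanh(x):
    """tanh(x) = 2 * sigmoid(2x) - 1, so one table serves both activations."""
    return 2.0 * lut_sigmoid(2.0 * x) - 1.0


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def fastgrnn_step(x, h, W, U, b_z, b_h, zeta, nu):
    """One FastGRNN cell update with LUT-based activations."""
    pre = [a + b for a, b in zip(matvec(W, x), matvec(U, h))]  # shared W x + U h
    z = [lut_sigmoid(p + bz) for p, bz in zip(pre, b_z)]       # gate
    h_tilde = [lut_tanh(p + bh) for p, bh in zip(pre, b_h)]    # candidate state
    return [(zeta * (1.0 - zi) + nu) * ht + zi * hi
            for zi, ht, hi in zip(z, h_tilde, h)]


print(fastgrnn_step([1.0], [0.5], [[0.0]], [[0.0]], [0.0], [0.0], zeta=1.0, nu=0.0))  # → [0.25]
```

With a 0.0625 step the nearest-entry sigmoid stays within about 0.008 of the exact value, trading a few hundred table entries for the cost of evaluating `exp` on a device without a floating-point unit.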
citing papers explorer
- SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring
- From Compression to Deployment: Real-Time and Energy-Efficient FastGRNN on Ultra-Constrained Microcontrollers