
CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs

2018 · cs.NE · arXiv 1801.06601

15 Pith papers cite this work. Polarity classification is still indexing.


abstract

Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication. This paper presents CMSIS-NN, efficient kernels developed to maximize the performance and minimize the memory footprint of neural network (NN) applications on Arm Cortex-M processors targeted for intelligent IoT edge devices. Neural network inference based on CMSIS-NN kernels achieves 4.6X improvement in runtime/throughput and 4.9X improvement in energy efficiency.
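As a rough illustration of the kind of integer arithmetic these kernels perform, the sketch below is a plain-C reference for an 8-bit (q7) fixed-point fully connected layer, assuming power-of-two scaling via a bias shift and an output shift, with 32-bit accumulation, rounding and saturation. The function name `fc_q7_ref` and its parameter list are hypothetical and are not the CMSIS-NN API; an optimized Cortex-M kernel would additionally exploit SIMD instructions and data-layout tricks that this scalar loop omits.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit accumulator to the signed 8-bit (q7) range. */
static int8_t saturate_q7(int32_t x) {
    if (x > 127) return 127;
    if (x < -128) return -128;
    return (int8_t)x;
}

/* Hypothetical scalar reference (NOT the CMSIS-NN API): q7 fully connected
 * layer with power-of-two scaling.
 *   in      : dim_in  input activations (q7)
 *   weights : dim_out x dim_in matrix, row-major (q7)
 *   bias    : dim_out biases (q7), aligned to the accumulator by bias_shift
 *   out     : dim_out outputs (q7), obtained by shifting right by out_shift
 * Assumes >> on a negative int32_t is an arithmetic shift, as with GCC/Clang
 * for Arm targets (C leaves this implementation-defined). */
void fc_q7_ref(const int8_t *in, const int8_t *weights, const int8_t *bias,
               int dim_in, int dim_out, int bias_shift, int out_shift,
               int8_t *out) {
    /* Shift amounts must be non-negative and small enough to avoid overflow. */
    assert(bias_shift >= 0 && bias_shift < 24);
    assert(out_shift >= 0 && out_shift < 31);

    for (int r = 0; r < dim_out; ++r) {
        /* Start from the scaled bias plus half an output LSB for rounding.
           Multiplication avoids left-shifting a negative value. */
        int32_t acc = (int32_t)bias[r] * ((int32_t)1 << bias_shift);
        if (out_shift > 0) acc += (int32_t)1 << (out_shift - 1);

        /* 8-bit x 8-bit products accumulated in 32 bits. */
        for (int c = 0; c < dim_in; ++c) {
            acc += (int32_t)in[c] * (int32_t)weights[r * dim_in + c];
        }

        /* Requantize back to q7 and saturate. */
        out[r] = saturate_q7(acc >> out_shift);
    }
}
```

For example, with inputs {10, -20, 30}, weight rows {1, 2, 3} and {-4, 5, -6}, biases {4, -2}, bias_shift = 2 and out_shift = 3, the sketch yields {10, -41}; the second value relies on the arithmetic right shift rounding toward negative infinity. Keeping weights and activations in 8 bits is what shrinks the memory footprint, while the 32-bit accumulator keeps intermediate sums from overflowing.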

citation-role summary

background: 2 · baseline: 1

representative citing papers

Verifiable and Confidential DNN Inference on Low-End Edge Devices

cs.CR · 2026-06-05 · unverdicted · novelty 7.0

VECODI introduces SHANGRI-LA, an intermediate-privilege runtime on TrustZone-M, to enable verifiable and confidential DNN inference on constrained edge devices with a small trusted computing base (TCB) and low overhead.

SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring

cs.SD · 2026-05-20 · accept · novelty 7.0

SEABAD is a publicly released, balanced dataset of 50,000 curated 16 kHz audio clips spanning 1,677 tropical bird species, with a dual-branch curation pipeline and MobileNetV3-Small baseline reaching 99.57% accuracy.

Learned Memory Attenuation in Sage-Husa Kalman Filters for Robust UAV State Estimation

eess.SP · 2026-05-18 · unverdicted · novelty 7.0

NDR-SHKF replaces the static forgetting factor in Sage-Husa Kalman Filters with a learned vector-valued memory attenuation policy from a bifurcated recurrent network trained end-to-end on whitened innovations to minimize estimation error.

Going Beyond the Edge: Distributed Inference of Transformer Models on Ultra-Low-Power Wireless Devices

cs.LG · 2026-05-15 · conditional · novelty 7.0

CATS enables collaborative transformer inference on up to 16 ultra-low-power wireless devices, supporting models up to 14 times larger than a single device can run, via SomeGather pruning and message-dropout robustness.

Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

MP-IB uses an 8x information asymmetry via FP16 trait heads and INT4 state heads to disentangle speaker identity from agitation in voice biomarkers, outperforming larger models on edge devices with low latency and suppressed identity leakage.

AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators

cs.DC · 2026-02-15 · unverdicted · novelty 6.0

AEG baremetal framework achieves 9.2x higher compute efficiency, 3-7x less data movement, and near-zero latency variance for ResNet-18 on 28 AIE tiles versus Linux Vitis AI on 304 tiles while maintaining 68.78% ImageNet accuracy.

Federated Learning with Non-IID Data

cs.LG · 2018-06-02 · conditional · novelty 6.0

Non-IID data causes up to 55% accuracy loss in federated learning due to weight divergence measured by earth mover's distance; 5% globally shared data recovers 30% accuracy on CIFAR-10.

Split CNN Inference on Networked Microcontrollers

cs.DC · 2026-05-10 · unverdicted · novelty 6.0

A fine-grained split inference system enables CNN models infeasible on single MCUs to run across networked devices by partitioning at sub-layer granularity, reducing per-device peak RAM while keeping practical latency.

EdgeSpike: Spiking Neural Networks for Low-Power Autonomous Sensing in Edge IoT Architectures

cs.NE · 2026-04-29 · unverdicted · novelty 6.0

EdgeSpike delivers 91.4% mean accuracy on five sensing tasks with 31x lower energy on neuromorphic hardware and 6.3x longer battery life in a seven-month field deployment compared to conventional CNNs.

Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

cs.AR · 2026-04-17 · unverdicted · novelty 6.0

A co-design framework using approximate matrix decomposition and genetic algorithms delivers 33% average latency reduction in TinyML CNN FPGA accelerators with 1.3% average accuracy loss versus standard systolic arrays.

Neuromorphic Parameter Estimation for Power Converter Health Monitoring Using Spiking Neural Networks

cs.NE · 2026-04-17 · unverdicted · novelty 6.0

A three-layer leaky integrate-and-fire spiking neural network estimates passive component parameters in power converters, cutting resistance error from 25.8% to 10.2% versus feedforward baselines at a projected 270x lower energy on neuromorphic chips.

From Compression to Deployment: Real-Time and Energy-Efficient FastGRNN on Ultra-Constrained Microcontrollers

cs.AR · 2026-06-15 · accept · novelty 5.0

A compressed FastGRNN model with 566 bytes of weights runs real-time 50 Hz inference on Arduino and MSP430 MCUs at a macro F1 of 0.918, matching the PyTorch reference and cutting energy by 96.7% via a lookup table (LUT).

MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices

cs.CV · 2026-02-06 · conditional · novelty 4.0 · 2 refs

MicroBi-ConvLSTM is a convolutional-recurrent model with 11.4K parameters that delivers competitive accuracy on eight HAR benchmarks and full INT8 deployment coverage on Raspberry Pi Pico 2 and ESP32.

Quantized AI Inference on Constrained Embedded Platforms for Small-Satellite Settings

cs.AR · 2026-06-03 · unverdicted · novelty 3.0

Measurement-based characterization of quantized AI inference latency and data movement on Cortex-M platforms, positioned as a lower-bound reference for small-satellite embedded vision workloads.

Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines

cs.LG · 2026-06-16 · unverdicted · novelty 2.0

This paper outlines a systems-oriented workflow for embedded machine learning on microcontrollers, using accelerometer-based motion recognition and audio keyword spotting as running examples to illustrate data, feature, evaluation, and deployment steps.

citing papers explorer


Split CNN Inference on Networked Microcontrollers

cs.DC · 2026-05-10 · unverdicted · none · ref 27

Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

cs.AR · 2026-04-17 · unverdicted · none · ref 7