An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks
Edge TPUs are a family of accelerators for low-power edge devices and are widely used in Google products such as Coral and Pixel devices. In this paper, we first discuss the major microarchitectural details of Edge TPUs. Then, we extensively evaluate three classes of Edge TPUs, covering different computing ecosystems, that are either currently deployed in Google products or are in the product pipeline, across 423K unique convolutional neural networks. Building upon this extensive study, we discuss critical and interpretable microarchitectural insights about the studied classes of Edge TPUs. Mainly, we discuss how Edge TPU accelerators perform across convolutional neural networks with different structures. Finally, we present our ongoing efforts in developing high-accuracy learned models to estimate the major performance metrics of accelerators, such as latency and energy consumption. These learned models enable significantly faster evaluations of accelerators (on the order of milliseconds) as an alternative to time-consuming cycle-accurate simulators, and establish an exciting opportunity for rapid hardware/software co-design.
Forward citations
Cited by 3 Pith papers
- Verifiable and Confidential DNN Inference on Low-End Edge Devices
  VECODI introduces SHANGRI-LA, an intermediate-privilege runtime on TrustZone-M, to enable verifiable confidential DNN inference on constrained edge devices with small TCB and overhead.
- Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
  Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
- Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning
  Demonstrates single-shot 3D matrix-matrix multiplication in an optical tensor processor that accelerates CNNs and DNNs at 20 aJ per MAC with 96.4% accuracy on image recognition using 292616 parameters.