An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks

Amir Yazdanbakhsh; Berkin Akin; James Laudon; Kiran Seshadri; Ravi Narayanaswami

arxiv: 2102.10423 · v2 · pith:O7G6CCAHnew · submitted 2021-02-20 · 💻 cs.LG · cs.AR

keywords: edge, accelerators, tpus, convolutional, discuss, networks, neural, across

Edge TPUs are a family of accelerators for low-power edge devices and are widely used in Google products such as Coral and Pixel devices. In this paper, we first discuss the major microarchitectural details of Edge TPUs. Then, we extensively evaluate three classes of Edge TPUs, covering different computing ecosystems, that are either currently deployed in Google products or are in the product pipeline, across 423K unique convolutional neural networks. Building upon this extensive study, we discuss critical and interpretable microarchitectural insights about the studied classes of Edge TPUs. Mainly, we discuss how Edge TPU accelerators perform across convolutional neural networks with different structures. Finally, we present our ongoing efforts in developing high-accuracy learned machine learning models to estimate the major performance metrics of accelerators, such as latency and energy consumption. These learned models enable significantly faster evaluations of accelerators (on the order of milliseconds) as an alternative to time-consuming cycle-accurate simulators, and they establish an exciting opportunity for rapid hardware/software co-design.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Verifiable and Confidential DNN Inference on Low-End Edge Devices
cs.CR 2026-06 unverdicted novelty 7.0

VECODI introduces SHANGRI-LA, an intermediate-privilege runtime on TrustZone-M, to enable verifiable confidential DNN inference on constrained edge devices with small TCB and overhead.

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
cs.CR 2026-04 unverdicted novelty 7.0

Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning
physics.optics 2025-03 unverdicted novelty 6.0

Demonstrates single-shot 3D matrix-matrix multiplication in an optical tensor processor that accelerates CNNs and DNNs at 20 aJ per MAC with 96.4% accuracy on image recognition using 292616 parameters.