arxiv: 2102.10423 · v2 · pith:O7G6CCAHnew · submitted 2021-02-20 · 💻 cs.LG · cs.AR

An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks

classification 💻 cs.LG cs.AR
keywords edge, accelerators, TPUs, convolutional, discuss, networks, neural, across

Edge TPUs are a domain of accelerators for low-power edge devices and are widely used in various Google products such as Coral and Pixel devices. In this paper, we first discuss the major microarchitectural details of Edge TPUs. Then, we extensively evaluate three classes of Edge TPUs, covering different computing ecosystems, that are either currently deployed in Google products or are in the product pipeline, across 423K unique convolutional neural networks. Building upon this extensive study, we discuss critical and interpretable microarchitectural insights about the studied classes of Edge TPUs. Mainly, we discuss how Edge TPU accelerators perform across convolutional neural networks with different structures. Finally, we present our ongoing efforts in developing high-accuracy learned machine learning models to estimate the major performance metrics of accelerators, such as latency and energy consumption. These learned models enable significantly faster (on the order of milliseconds) evaluations of accelerators as an alternative to time-consuming cycle-accurate simulators and establish an exciting opportunity for rapid hardware/software co-design.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Verifiable and Confidential DNN Inference on Low-End Edge Devices

    cs.CR 2026-06 unverdicted novelty 7.0

    VECODI introduces SHANGRI-LA, an intermediate-privilege runtime on TrustZone-M, to enable verifiable confidential DNN inference on constrained edge devices with small TCB and overhead.

  2. Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

    cs.CR 2026-04 unverdicted novelty 7.0

    Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

  3. Single-Shot Matrix-Matrix Multiplication Optical Tensor Processor for Deep Learning

    physics.optics 2025-03 unverdicted novelty 6.0

    Demonstrates single-shot 3D matrix-matrix multiplication in an optical tensor processor that accelerates CNNs and DNNs at 20 aJ per MAC with 96.4% accuracy on image recognition using 292616 parameters.