pith. sign in

arxiv: 1907.05047 · v2 · pith:3Y4QGHODnew · submitted 2019-07-11 · 💻 cs.CV

BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

Pith reviewed 2026-05-24 23:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords face detectionmobile GPUreal-time inferenceaugmented realitylightweight neural networkSSD anchor schemetie resolution
0
0 comments X

The pith

BlazeFace detects faces at 200-1000+ FPS on mobile GPUs using a custom lightweight network.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BlazeFace as a face detector built for mobile GPU hardware that delivers super-realtime performance. This speed makes it practical as the initial stage in augmented reality systems that need a facial region of interest to feed into later models for keypoints, expression analysis, or segmentation. The authors reach these rates through three changes: a feature extraction network lighter than MobileNet variants, an anchor scheme adjusted from SSD for GPU execution, and a tie-breaking method that replaces non-maximum suppression. A reader would care because such performance on phones could allow continuous facial processing without noticeable delay or high power draw.

Core claim

BlazeFace is a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. The contributions include a lightweight feature extraction network inspired by but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector, and an improved tie resolution strategy.

What carries the argument

Lightweight feature extraction network combined with GPU-friendly SSD-style anchors and non-NMS tie resolution.

If this is right

  • The detector can supply accurate facial regions of interest to downstream AR models for keypoint estimation.
  • It supports real-time facial expression classification and feature analysis on phones.
  • Face region segmentation becomes feasible within live AR pipelines.
  • The approach works across flagship mobile devices without custom hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same anchor and tie-resolution adjustments might apply to other single-shot detectors on mobile GPUs.
  • Sustained high frame rates could lower average power use in always-on camera applications.
  • Design patterns here could guide speed optimizations for related tasks like hand or body detection.

Load-bearing premise

The described changes to the feature extractor, anchor scheme, and tie resolution produce the stated speed and accuracy on mobile GPUs.

What would settle it

Benchmark measurements on a flagship mobile device showing inference slower than 200 FPS or detection accuracy substantially below standard mobile face detectors.

read the original abstract

We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents BlazeFace, a lightweight neural face detector optimized for mobile GPU inference. It claims to run at 200-1000+ FPS on flagship devices through three contributions: a custom feature extraction network distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from SSD, and an improved tie resolution strategy as an alternative to NMS. The detector is intended to supply accurate facial regions of interest as input to downstream AR models for tasks such as 2D/3D keypoint estimation, expression classification, and segmentation.

Significance. If the reported sub-millisecond performance holds and the three architectural modifications can be shown to be responsible for the gains, the work would provide a useful engineering contribution for real-time face detection in mobile augmented reality pipelines. The emphasis on GPU-friendly design choices directly addresses deployment constraints on mobile hardware. The paper is presented as an applied artifact rather than a parameter-free derivation or theoretical result.

major comments (1)
  1. [Results] Results section: the paper reports aggregate FPS on flagship devices and qualitative AR use-cases, but contains no ablation tables, no per-component timing breakdowns, and no direct comparison against an unmodified SSD-MobileNet baseline on the same hardware. This leaves open the possibility that model size alone, rather than the three cited modifications, accounts for the performance, undermining attribution of the central speed claim.
minor comments (1)
  1. [Abstract] Abstract: the claim of 'accurate facial regions of interest' is stated without accompanying quantitative accuracy metrics (e.g., mAP or precision-recall) to accompany the FPS figures.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We appreciate the referee's constructive feedback on our manuscript. We address the major comment below.

read point-by-point responses
  1. Referee: [Results] Results section: the paper reports aggregate FPS on flagship devices and qualitative AR use-cases, but contains no ablation tables, no per-component timing breakdowns, and no direct comparison against an unmodified SSD-MobileNet baseline on the same hardware. This leaves open the possibility that model size alone, rather than the three cited modifications, accounts for the performance, undermining attribution of the central speed claim.

    Authors: We agree that the manuscript would be strengthened by explicit ablation studies, per-component timing breakdowns, and a direct comparison to an unmodified SSD-MobileNet baseline on the same hardware. The current version focuses on the end-to-end performance of the integrated BlazeFace system on mobile GPUs. In the revised manuscript we will add ablation tables isolating the contributions of the custom backbone, modified anchor scheme, and tie-resolution method, along with the requested baseline comparison and timing details where available on the target devices. revision: yes

Circularity Check

0 steps flagged

No significant circularity; engineering artifact without load-bearing derivations or self-referential reductions

full rationale

The paper presents an empirical engineering result: a lightweight face detector with three listed modifications (feature extractor distinct from MobileNet, modified SSD anchors, alternative tie resolution). No equations, fitted parameters, predictions, or uniqueness theorems appear. Claims rest on reported FPS measurements and qualitative AR use-cases rather than any derivation chain that reduces to its own inputs by construction. Self-citations (if present) are not load-bearing for a central premise, and the work is self-contained against external benchmarks without tautological reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the paper is an applied neural architecture description rather than a theoretical derivation.

pith-pipeline@v0.9.0 · 5659 in / 931 out tokens · 17962 ms · 2026-05-24T23:19:49.874342+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BIDO: A Biometric Identity Online Authentication Framework

    cs.ET 2026-05 unverdicted novelty 5.0

    BIDO derives transient ECDSA keys from live facial biometrics salted with a memorized secret to produce non-resident WebAuthn credentials, achieving 99.51% verification accuracy on LFW without storing templates or PII.

  2. UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks

    cs.CR 2026-04 unverdicted novelty 5.0

    UNSEEN combines AR access control, LLM unlearning to suppress profiles, and agent guardrails to defend against AR-LLM social engineering attacks, tested in a 60-person user study with 360 conversations.