pith. machine review for the scientific record.

arxiv: 2604.06729 · v1 · submitted 2026-04-08 · 💻 cs.CR

Recognition: 1 theorem link

· Lean Theorem

Turn Your Face Into An Attack Surface: Screen Attack Using Facial Reflections in Video Conferencing

En Zhang, Jiazi Li, Mingyang Chen, Wanqing Tu, Yanzhao Lu, Yong Huang

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:47 UTC · model grok-4.3

classification 💻 cs.CR
keywords video conferencing · side-channel attack · facial reflection · screen leakage · eavesdropping · optical side channel · privacy

The pith

Facial reflections in video calls leak detailed on-screen application activity to observers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a face in front of a screen reflects light variations from displayed content, and these reflections are captured in the video feed of conferencing software. An attacker can process the feed to recover which applications are in use and what they show. The authors built a system called FaceTell that performs this extraction and tested it across real laptops, platforms, people, and rooms. It reaches 99.32 percent accuracy on 28 common applications while remaining effective under normal variations in lighting and movement. The result shows that video conferencing creates an unintended visual side channel for screen surveillance.

Core claim

FaceTell recovers fine-grained application activity by analyzing optical variations reflected from the user's face during video conferencing. The face receives light from both the display and the room, then redirects content-dependent changes back to the camera. Experiments with 24 subjects, 13 indoor settings, three laptop models, and four conferencing platforms produced 99.32 percent accuracy across more than 12 hours of video for 28 applications, with resilience to common practical disturbances.
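The claimed channel can be sketched end to end as a toy model (not the paper's pipeline, which trains a neural classifier on real video): assume each application drives a distinctive screen-brightness waveform, the face returns an attenuated, noisy copy of it, and an observer matches the recovered trace against known templates. The waveforms, attenuation, and noise level below are invented for illustration.

```python
# Toy facial-reflection channel: attenuate a screen-brightness waveform,
# add camera noise, then identify the source application by correlation.
import math
import random

def reflect(screen_trace, attenuation=0.05, noise=0.001, seed=0):
    """Simulate the facial reflection: attenuate and add camera noise."""
    rng = random.Random(seed)
    return [attenuation * s + rng.gauss(0.0, noise) for s in screen_trace]

def correlation(a, b):
    """Pearson correlation between two equal-length brightness traces."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def classify(observed, templates):
    """Name of the template waveform that best matches the observation."""
    return max(templates, key=lambda name: correlation(observed, templates[name]))

# Hypothetical per-application brightness waveforms (120 frames ~= 4 s at 30 fps).
frames = range(120)
templates = {
    "video_player": [0.5 + 0.4 * math.sin(2 * math.pi * k / 15) for k in frames],
    "text_editor":  [0.9 if (k // 30) % 2 == 0 else 0.85 for k in frames],
    "terminal":     [0.1 + 0.02 * (k % 5) for k in frames],
}

print(classify(reflect(templates["video_player"]), templates))  # video_player
```

The real system works on segmented face pixels and a learned feature extractor; the point here is only that a content-dependent signal can survive attenuation and noise.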

What carries the argument

The optical reflection of screen-emitted and ambient light from the human face, which encodes and carries on-screen content variations into the video conferencing camera feed for classification.
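The paper's theoretical model separates diffuse and specular reflection terms (its G_d and G_s); a minimal stand-in is the classic Phong model, shown below with illustrative coefficients that are assumptions, not the paper's values. The content-dependent component scales roughly linearly with screen luminance, which is why content changes survive into the camera feed.

```python
# Phong-style radiance leaving a facial patch toward the camera (a sketch
# of why the reflection is content-dependent; coefficients are invented).
import math

def face_radiance(screen_lum, ambient_lum, cos_in, cos_spec,
                  k_a=0.2, k_d=0.6, k_s=0.3, shininess=8):
    """Radiance from one facial patch.

    screen_lum / ambient_lum: luminance of the display and the room light;
    cos_in: cosine between the surface normal and the direction to the screen;
    cos_spec: cosine between the mirror direction and the direction to the camera.
    """
    diffuse = k_d * max(cos_in, 0.0) * screen_lum              # Lambertian term
    specular = k_s * max(cos_spec, 0.0) ** shininess * screen_lum
    ambient = k_a * ambient_lum                                # content-independent
    return ambient + diffuse + specular

# A dark terminal vs. a bright document under identical room lighting:
dark = face_radiance(screen_lum=20.0, ambient_lum=300.0, cos_in=0.9, cos_spec=0.7)
bright = face_radiance(screen_lum=250.0, ambient_lum=300.0, cos_in=0.9, cos_spec=0.7)
print(bright > dark)  # True: the content-dependent component is nonzero
```

The ambient term is the floor the attacker must see past; only the diffuse and specular terms carry on-screen information.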

If this is right

  • Observers of a video call can determine the exact applications open on a participant's screen without any direct access.
  • The leakage occurs across multiple laptop brands and major conferencing platforms under everyday indoor conditions.
  • User movement and ambient light changes do not stop reliable extraction of application activity.
  • Countermeasures must be added to video systems to limit how much screen light reaches and reflects from the face.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Video call participants in sensitive environments may need to control room lighting or screen brightness to reduce the reflection channel.
  • The same reflection principle could allow inference of other screen details such as document text or browser tabs if the classifier is extended.
  • Conferencing software could incorporate real-time filters that detect and suppress facial reflection patterns before transmission.
  • Similar optical side channels may appear in other camera-based remote collaboration tools where faces are filmed near displays.
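The filtering idea in the third bullet can be sketched as temporal smoothing: average the face region's brightness over a short window before transmission, so fast, content-driven flicker is attenuated while slow, natural lighting changes pass through. A deployed filter would work per-pixel inside the video pipeline; the window length and traces here are illustrative assumptions.

```python
# Sketch of a temporal low-pass countermeasure for face-region brightness.
import math

def moving_average(trace, window=15):
    """Causal moving average over up to the last `window` frames."""
    out = []
    for i in range(len(trace)):
        lo = max(0, i - window + 1)
        out.append(sum(trace[lo:i + 1]) / (i + 1 - lo))
    return out

def peak_to_peak(trace):
    return max(trace) - min(trace)

# Fast screen flicker (period 4 frames) riding on a steady baseline:
flicker = [0.5 + 0.05 * math.sin(2 * math.pi * k / 4) for k in range(90)]
filtered = moving_average(flicker)

print(peak_to_peak(filtered) < peak_to_peak(flicker))  # True: flicker attenuated
```

The trade-off is visible lag in genuine lighting changes, which is why a practical filter would need to detect reflection-like patterns rather than smooth everything.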

Load-bearing premise

Subtle light variations reflected from the face stay detectable and distinguishable in ordinary video call conditions even when lighting, head position, camera quality, and background change.

What would settle it

A controlled test in a typical indoor video call with standard room lighting and normal user movement that drops FaceTell's application identification accuracy to near-random levels.
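That settling experiment can be mocked up in miniature: sweep the noise level in a toy reflection channel and watch template matching fall from near-perfect to near-chance (1/3 with three applications here). The waveforms, attenuation, and noise levels are illustrative assumptions, not the paper's measured conditions.

```python
# Sweep channel noise and measure how often a correlation matcher still
# identifies the right application; chance is 1/len(TEMPLATES).
import math
import random

T = list(range(60))
TEMPLATES = {
    "video": [math.sin(2 * math.pi * k / 12) for k in T],
    "docs":  [1.0 if (k // 15) % 2 == 0 else -1.0 for k in T],
    "term":  [(k % 6) / 6.0 - 0.5 for k in T],
}

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a)) or 1.0
    vb = math.sqrt(sum((y - mb) ** 2 for y in b)) or 1.0
    return cov / (va * vb)

def accuracy_at_noise(noise, trials=200, seed=1):
    """Fraction of noisy reflections matched to the right application."""
    rng = random.Random(seed)
    names = list(TEMPLATES)
    correct = 0
    for i in range(trials):
        truth = names[i % len(names)]
        observed = [0.05 * s + rng.gauss(0.0, noise) for s in TEMPLATES[truth]]
        guess = max(names, key=lambda nm: corr(observed, TEMPLATES[nm]))
        correct += guess == truth
    return correct / trials

print(accuracy_at_noise(0.001), accuracy_at_noise(50.0))
# high accuracy at low noise; near-chance once noise swamps the reflection
```

A real falsification test would vary room lighting and movement rather than injected Gaussian noise, but the pass/fail criterion is the same: accuracy collapsing toward chance.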

Figures

Figures reproduced from arXiv: 2604.06729 by En Zhang, Jiazi Li, Mingyang Chen, Wanqing Tu, Yanzhao Lu, Yong Huang.

Figure 1. An example of screen-based attacks. view at source ↗
Figure 2. An example to demonstrate the theoretical model. view at source ↗
Figure 5. Investigation of minimally differentiable content. view at source ↗
Figure 4. Experimental setup and results. view at source ↗
Figure 7. Workflow of face segmentation and reconstruction. view at source ↗
Figure 8. Characteristics of online video streams on Zoom. view at source ↗
Figure 11. Device setups during data collection. view at source ↗
Figure 12. Overall performance of FaceTell. view at source ↗
Figure 13. Confusion matrices for office software and programming software. view at source ↗
Figure 14. Feature visualization of the category discriminator. view at source ↗
Figure 15. Performance on different subjects. view at source ↗
Figure 17. Setups of different distances and angles. view at source ↗
Figure 19. Performance under different intensities of ambient light. view at source ↗
Figure 18. Testbed for investigation of ambient light. view at source ↗
Figure 20. Screenshots of 28 selected applications. view at source ↗
Figure 21. The multimedia predictor's performance under … view at source ↗
Figure 22. Performance under different video conferencing platforms. view at source ↗
read the original abstract

In video conferencing, human faces serve as the primary visual focal points, playing multifaceted roles that enhance visual communication and emotional connection. However, we argue that a human face is also a side channel, which can unwittingly leak on-screen information through online video feeds. To demonstrate this, we conduct feasibility studies, which reveal that, illuminated by both ambient light and light emitted from displays, the human face can reflect optical variations of different on-screen content. The paper then proposes FaceTell, a novel side-channel attack system that eavesdrops on fine-grained application activities from pervasive yet subtle facial reflections during video conferencing. We implement FaceTell in a real-world testbed with three different brands of laptops and four mainstream video conferencing platforms. FaceTell is then evaluated with 24 human subjects across 13 unique indoor environments. With more than 12 hours of video data, FaceTell achieves a high accuracy of 99.32% for eavesdropping on 28 popular applications and is resilient to many practical impact factors. Finally, potential countermeasures are proposed to mitigate this new attack.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces FaceTell, a side-channel attack exploiting optical reflections from a user's face (illuminated by ambient light and screen emissions) during video conferencing to eavesdrop on on-screen application activity. It reports feasibility studies confirming detectable reflections, followed by a real-world implementation and evaluation across three laptop brands, four video platforms, 24 subjects, 13 indoor environments, and over 12 hours of video data, claiming 99.32% accuracy in classifying 28 popular applications while asserting resilience to lighting, movement, camera quality, and background variations.

Significance. If the empirical results hold under the described conditions, this work identifies a novel and practical side-channel in ubiquitous video conferencing tools, extending the literature on optical and visual side channels beyond traditional screen emanations or acoustic leaks. The scale of the multi-device, multi-subject, multi-environment testbed provides concrete evidence of feasibility and strengthens the case for considering facial reflections as an attack surface, which could motivate new privacy-preserving features in conferencing software.

minor comments (2)
  1. The abstract states concrete accuracy figures and resilience claims but omits any reference to the feature extraction pipeline, classifier architecture, or statistical controls (e.g., error bars, cross-validation scheme, or explicit handling of head-pose/lighting confounders); adding one sentence summarizing these would improve clarity without altering the central narrative.
  2. The section describing the testbed (multi-laptop, multi-platform, 24 subjects, 13 environments) would benefit from an explicit statement of how ground-truth application labels were obtained and synchronized with the video streams, to allow independent reproduction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary and significance assessment of our work on FaceTell. We appreciate the recognition that the multi-device, multi-subject, multi-environment evaluation provides concrete evidence of a novel optical side-channel in video conferencing. As the report contains no specific major comments, we have no points requiring rebuttal and will address any minor revisions in the updated manuscript.

Circularity Check

0 steps flagged

No significant circularity: purely empirical evaluation

full rationale

The manuscript describes a side-channel attack implemented and evaluated via real-world testbed experiments (three laptops, four platforms, 24 subjects, 13 environments, >12 hours of video). No equations, parameter fitting, or derivation chain appear in the provided text. The 99.32% accuracy is reported as a measured outcome from collected data, not derived from or reduced to any internal definition or self-citation. Feasibility studies on illumination and resilience claims are addressed by explicit experimental variation rather than by construction. This is a standard empirical demonstration with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an experimental security demonstration; the abstract contains no mathematical axioms, free parameters, or newly postulated entities.

pith-pipeline@v0.9.0 · 5497 in / 1003 out tokens · 38494 ms · 2026-05-10T17:47:33.718366+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
