StreamPPG: Low-Latency rPPG Estimation via Consistent Privileged Learning

Yiming Li, Yihan Yang, Yuguang Chu, Yuanhui Hu, Si-Yuan Cao, Xiaohan Zhang, Xiaokai Bai, Zhe Wu, Hui-Liang Shen

arxiv: 2606.23186 · v1 · pith:MHQNJCBOnew · submitted 2026-06-22 · 💻 cs.CV

Pith reviewed 2026-06-26 09:12 UTC · model grok-4.3

classification 💻 cs.CV

keywords remote photoplethysmography · rPPG · privileged learning · low-latency estimation · frame-wise processing · blood volume pulse · edge computing · real-time monitoring

The pith

StreamPPG enables frame-by-frame rPPG estimation from video with accuracy matching clip-wise methods by using ground-truth signals only during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the latency-accuracy trade-off in remote photoplethysmography, where clip-wise methods require over one hundred frames and introduce multi-second delays while frame-wise methods lose periodic signal features. StreamPPG introduces a unified architecture trained via consistent privileged learning that incorporates ground-truth rPPG signals exclusively at training time. This produces representations capable of accurate single-frame inference at test time. The result is state-of-the-art accuracy on standard datasets together with real-time throughput on edge hardware, supporting contact-free vital-sign monitoring without waiting for long video buffers.
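To make the latency gap concrete, the buffering delay of a clip-wise method can be approximated as the clip length divided by the camera frame rate. The sketch below is illustrative only: the 30 fps frame rate and the 128-frame clip are assumptions (the paper says only that clip-wise methods need "over one hundred frames"), and the function name is hypothetical.

```python
def buffering_delay_seconds(num_frames: int, fps: float) -> float:
    """Approximate time spent waiting for `num_frames` frames at `fps`.

    Ignores model compute time; it only counts how long the camera needs
    to deliver the frames that must be buffered before inference starts.
    """
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# Assumed values, not taken from the paper.
ASSUMED_FPS = 30.0
ASSUMED_CLIP_FRAMES = 128

clip_delay = buffering_delay_seconds(ASSUMED_CLIP_FRAMES, ASSUMED_FPS)
frame_delay = buffering_delay_seconds(1, ASSUMED_FPS)
print(f"clip-wise buffer:  {clip_delay:.2f} s")
print(f"frame-wise buffer: {frame_delay * 1000:.1f} ms")
```

Under these assumed values the clip-wise buffer takes a little over four seconds to fill, against roughly 33 ms for a single frame, which is the order of magnitude the "multi-second delay" wording refers to.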

Core claim

StreamPPG is a unified architecture that enables low-latency frame-wise physiological signal estimation while achieving competitive accuracy compared with clip-wise approaches. It is trained under a consistent privileged learning strategy that leverages ground-truth rPPG signals as privileged information to enhance the model's representation capability.

What carries the argument

Consistent privileged learning (CPL) strategy that supplies ground-truth rPPG signals exclusively during training to strengthen single-frame inference representations.
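The general pattern of training with privileged information can be sketched as a loss that combines a supervision term on a branch that sees the ground truth with a consistency term that pulls the video-only branch toward it. The code below is a hypothetical illustration, not the paper's objective: the Figure 9 caption on this page mentions a weight λ balancing privileged supervision and consistency regularization, but how λ enters the loss is not shown here, so the convex combination, the use of mean squared error, and all function names are assumptions.

```python
import numpy as np


def mse(a, b) -> float:
    """Mean squared error between two equal-shape signals."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("signals must have the same shape")
    return float(np.mean((a - b) ** 2))


def privileged_training_loss(video_pred, privileged_pred, gt_signal, lam):
    """Hypothetical training objective, NOT the paper's formula.

    video_pred      : output of the branch that sees video only
    privileged_pred : output of the branch that also sees the ground truth
    gt_signal       : ground-truth rPPG signal (available at training time only)
    lam             : weight in [0, 1] trading supervision against consistency
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    supervision = mse(privileged_pred, gt_signal)   # privileged branch vs. truth
    consistency = mse(video_pred, privileged_pred)  # video branch vs. privileged branch
    return lam * supervision + (1.0 - lam) * consistency
```

At test time only the video-only branch would be evaluated, so nothing in the inference path depends on `gt_signal`.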

If this is right

StreamPPG achieves state-of-the-art accuracy across multiple datasets.
It maintains real-time throughput on edge devices.
It removes the multi-second delay inherent in clip-wise rPPG while avoiding the accuracy drop of conventional frame-wise methods.
The same training approach supports continuous contact-free health monitoring on resource-constrained hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The CPL pattern could be tested on other periodic video signals such as respiration rate estimation.
Mobile health applications could incorporate the model to deliver continuous vital-sign feedback without specialized sensors.
Further experiments on skin-tone diversity would clarify whether the learned representations remain stable across demographic groups.
The approach might reduce buffer requirements in other real-time video analysis pipelines that currently rely on batch processing.

Load-bearing premise

Ground-truth rPPG signals used only at training time will produce features that generalize to new videos without those signals and without overfitting to the training distribution.

What would settle it

A cross-dataset evaluation in which StreamPPG error exceeds that of a standard clip-wise baseline under changed lighting, camera, or subject demographics.
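A check of this kind could be organised as follows. This is a minimal sketch under stated assumptions: the dataset names come from the Figure 7 caption on this page, the helper names are hypothetical, and the error maps are expected to be filled with measured values (for example heart-rate MAE) rather than anything supplied here.

```python
from itertools import permutations


def cross_dataset_pairs(datasets):
    """All ordered (train, test) pairs with train != test."""
    return list(permutations(datasets, 2))


def pairs_where_baseline_wins(stream_error, baseline_error):
    """Return the (train, test) pairs on which the frame-wise model is worse.

    Both arguments map (train, test) -> an error metric where lower is better.
    Only pairs present in both maps are compared.
    """
    shared = sorted(set(stream_error) & set(baseline_error))
    return [pair for pair in shared if stream_error[pair] > baseline_error[pair]]


# Dataset names as listed in the Figure 7 caption.
DATASETS = ["PURE", "UBFC", "COHFACE", "MMPD"]
PAIRS = cross_dataset_pairs(DATASETS)
```

A non-empty result from `pairs_where_baseline_wins` on shifted lighting, camera, or demographic conditions would be the kind of evidence described above.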

Figures

Figures reproduced from arXiv: 2606.23186 by Hui-Liang Shen, Si-Yuan Cao, Xiaohan Zhang, Xiaokai Bai, Yihan Yang, Yiming Li, Yuanhui Hu, Yuguang Chu, Zhe Wu.

**Figure 2.** Overview of the StreamPPG inference pipeline. The model receives the …

**Figure 3.** Overview of the StreamPPG training with consistent privileged learning (CPL) strategy. During training, privileged information is introduced through …

**Figure 4.** Visualization of facial attention maps. (a) The model trained with the …

**Figure 6.** Bland-Altman plots and scatter plots. (a) Training with PURE and …

**Figure 7.** Representative rPPG signal visualizations from intra-dataset evaluation on the PURE, UBFC, COHFACE and MMPD datasets.

**Figure 8.** HR estimation errors on the UBFC dataset under different video …

**Figure 9.** Effect of λ on the MMPD dataset. Performance peaks at λ = 0.8, which provides the best balance between privileged supervision and consistency regularization.

**Table VI.** Ablation study on the encoder of StreamPPG on the MMPD dataset. "CA" denotes channel attention.

| Encoder | MAE | RMSE | MAPE | ρ |
|---|---|---|---|---|
| w/o CA | 3.77 | 9.35 | 3.92 | 0.74 |
| w/ CA | 3.39 | 8.35 | 3.50 | 0.78 |

… and poor generalization in signal-free inference, confirming the exis…
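The four columns of the encoder ablation (MAE, RMSE, MAPE, ρ) are standard agreement metrics between predicted and reference heart rates. The sketch below shows how they are commonly computed; it assumes per-sample heart rates in beats per minute and MAPE expressed in percent, neither of which is stated on this page, and the function name is hypothetical.

```python
import numpy as np


def hr_metrics(hr_pred, hr_true):
    """Return MAE, RMSE (input units), MAPE (percent) and Pearson rho."""
    pred = np.asarray(hr_pred, dtype=float)
    true = np.asarray(hr_true, dtype=float)
    if pred.shape != true.shape or pred.ndim != 1 or pred.size < 2:
        raise ValueError("need two equal-length 1-D arrays with at least 2 values")
    err = pred - true
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err) / np.abs(true)) * 100.0),
        "rho": float(np.corrcoef(pred, true)[0, 1]),
    }
```

Lower is better for the three error metrics, while ρ closer to 1 indicates stronger linear agreement, which matches the direction of improvement in the "w/ CA" row.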

Original abstract

Remote photoplethysmography (rPPG) estimates the blood volume pulse (BVP) signal from facial videos, enabling contact-free health monitoring. Conventional clip-wise approaches, which use video clips as input, require capturing over one hundred frames before inference, thus introducing several seconds of delay and hindering real-time use. Meanwhile, frame-wise approaches struggle to capture long-range temporal and periodic features of physiological rhythms, and therefore lead to reduced estimation accuracy. To overcome these issues, we propose StreamPPG, a unified architecture that enables low-latency frame-wise physiological signal estimation while achieving competitive accuracy compared with clip-wise approaches. StreamPPG is trained under a consistent privileged learning (CPL) strategy, which leverages ground-truth rPPG signals as privileged information to enhance the model's representation capability. Extensive experiments demonstrate that StreamPPG achieves state-of-the-art accuracy across multiple datasets while maintaining real-time throughput on edge devices.
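Once a BVP signal has been estimated, heart rate is conventionally read off as the dominant frequency of that signal. The sketch below shows this common post-processing step, which is separate from the BVP estimation the paper addresses; the 0.7 to 3.0 Hz search band (42 to 180 bpm), the 30 Hz sampling rate, and the function name are assumptions rather than details taken from the paper.

```python
import numpy as np


def heart_rate_bpm(bvp, fs, low_hz=0.7, high_hz=3.0):
    """Estimate heart rate as the dominant spectral peak of a BVP signal."""
    x = np.asarray(bvp, dtype=float)
    if x.ndim != 1 or x.size < 2:
        raise ValueError("bvp must be a 1-D signal with at least 2 samples")
    x = x - x.mean()                             # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # bin frequencies in Hz
    band = (freqs >= low_hz) & (freqs <= high_hz)
    if not band.any():
        raise ValueError("signal too short to resolve the chosen band")
    peak_hz = freqs[band][np.argmax(power[band])]
    return float(peak_hz * 60.0)


# Synthetic check: a clean 1.2 Hz sinusoid sampled at 30 Hz for 10 s.
fs = 30.0
t = np.arange(300) / fs
demo_hr = heart_rate_bpm(np.sin(2 * np.pi * 1.2 * t), fs)
```

The synthetic sinusoid at 1.2 Hz corresponds to 72 beats per minute. Note that the frequency resolution of this step is `fs / len(bvp)`, so a spectral heart-rate readout still needs a window of samples even when each BVP sample is produced frame by frame.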

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

StreamPPG tries to fix the latency-accuracy tradeoff in rPPG with a CPL training trick, but the abstract supplies no numbers or ablations to show the video-only model actually generalizes.

The paper introduces StreamPPG, a frame-wise architecture trained with consistent privileged learning so that ground-truth BVP signals help at training time only. The goal is real-time inference on edge devices without the multi-second delay of clip-wise methods or the accuracy drop of plain frame-wise ones.
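The difference between frame-wise and clip-wise inference is largely one of interface: a frame-wise estimator emits one output sample per incoming frame and carries whatever context it needs as internal state. The sketch below illustrates that interface only. The per-frame "model" is a placeholder (mean green intensity minus a running average) and is not StreamPPG; the class name, method names, and smoothing constant are all hypothetical.

```python
import numpy as np


class StreamingEstimator:
    """Toy frame-wise estimator: one output sample per incoming frame.

    The computation here stands in for a learned network and is NOT the
    paper's architecture; only the step-by-step interface is the point.
    """

    def __init__(self, smoothing: float = 0.9):
        if not 0.0 <= smoothing < 1.0:
            raise ValueError("smoothing must lie in [0, 1)")
        self.smoothing = smoothing
        self.baseline = None  # state carried from frame to frame

    def step(self, frame) -> float:
        """Consume one H x W x 3 RGB frame and return one signal sample."""
        green = float(np.asarray(frame, dtype=float)[..., 1].mean())
        if self.baseline is None:
            self.baseline = green
        else:
            self.baseline = (self.smoothing * self.baseline
                             + (1.0 - self.smoothing) * green)
        return green - self.baseline


def run_stream(frames):
    """Emit a sample as soon as each frame arrives; no clip buffer is kept."""
    estimator = StreamingEstimator()
    return [estimator.step(f) for f in frames]
```

Because `step` returns immediately after each frame, the only buffering delay is the inter-frame interval, in contrast to a clip-wise model that must wait for a full clip before producing any output.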

The new piece is the CPL strategy itself, which treats the privileged rPPG as a consistent signal to shape the video encoder. That framing is not just a restatement of earlier privileged-learning work in rPPG; it is aimed squarely at the streaming constraint.

The paper states the problem cleanly and sketches a unified model that could matter for mobile health or HCI. The claim that privileged information is dropped at inference is also stated without circularity.

The soft spot is exactly the one in the stress-test note. The abstract asserts SOTA accuracy and real-time throughput across datasets, yet gives zero quantitative results, baselines, error bars, or ablation numbers. Without those, or without cross-dataset transfer tests, it is impossible to tell whether the learned features actually survive removal of the privileged branch or whether they exploit dataset-specific video-BVP correlations. The full manuscript is said to contain extensive experiments, but the provided text does not show them, so the central generalization claim stays unverified.

This work is aimed at people building contact-free physiological monitors who need low latency. A reader already working on rPPG might pick up the architecture idea, but anyone wanting to cite the accuracy numbers would need the full results first.

It is worth sending to peer review because the latency problem is real and the proposed training approach is distinct enough to merit referee scrutiny, even if the current evidence is thin.

Referee Report

2 major / 0 minor

Summary. The paper proposes StreamPPG, a unified frame-wise architecture for remote photoplethysmography (rPPG) that uses consistent privileged learning (CPL) to incorporate ground-truth BVP signals only at training time. This enables low-latency inference from individual frames while claiming to match or exceed the accuracy of conventional clip-wise methods. The abstract states that extensive experiments show state-of-the-art accuracy across multiple datasets together with real-time throughput on edge devices.

Significance. If the central claim holds, StreamPPG would address a practical bottleneck in contact-free physiological monitoring by delivering both low latency and competitive accuracy, potentially enabling real-time applications on resource-constrained devices. The CPL strategy is a standard privileged-information technique; its value here would lie in empirical demonstration that video-only features learned under this regime generalize without overfitting to training-domain video-BVP correlations.

major comments (2)

[Experiments] Experiments section (and any associated tables/figures): the manuscript claims SOTA accuracy and real-time performance but the provided abstract supplies no quantitative numbers, baseline comparisons, error bars, dataset details, or ablation studies. Without these, the central claim that CPL produces video-only features whose accuracy matches clip-wise baselines cannot be evaluated.
[Method] § on method / CPL strategy: the assumption that ground-truth rPPG signals used only at training produce representations that generalize to unseen test videos is load-bearing, yet no cross-dataset transfer results or ablation removing the privileged branch are described. If dataset-specific correlations between video and BVP are learned, the reported performance would not transfer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying the content of the Experiments and Method sections while outlining targeted revisions for improved clarity.

Referee: [Experiments] Experiments section (and any associated tables/figures): the manuscript claims SOTA accuracy and real-time performance but the provided abstract supplies no quantitative numbers, baseline comparisons, error bars, dataset details, or ablation studies. Without these, the central claim that CPL produces video-only features whose accuracy matches clip-wise baselines cannot be evaluated.

Authors: We agree that the abstract, as a concise summary, does not include specific quantitative numbers. However, the Experiments section of the manuscript contains detailed tables and figures reporting SOTA comparisons against multiple baselines, error bars across runs, full dataset specifications, and ablation studies on the CPL components. These results directly support that the video-only inference matches clip-wise accuracy. In the revision we will add a brief summary of key metrics (e.g., MAE and throughput) to the abstract for immediate accessibility. revision: partial

Referee: [Method] § on method / CPL strategy: the assumption that ground-truth rPPG signals used only at training produce representations that generalize to unseen test videos is load-bearing, yet no cross-dataset transfer results or ablation removing the privileged branch are described. If dataset-specific correlations between video and BVP are learned, the reported performance would not transfer.

Authors: The manuscript already reports cross-dataset transfer experiments (training on one dataset and testing on others) as well as ablations that disable the privileged BVP branch at training time. These appear in the Experiments section and confirm that the learned video representations generalize without overfitting to training-domain video-BVP correlations. We will add explicit forward references from the Method section to these results to make the generalization evidence more prominent. revision: no

Circularity Check

0 steps flagged

No circularity; derivation is self-contained empirical method

The paper describes a standard neural architecture for rPPG estimation trained with a privileged branch that receives ground-truth BVP signals exclusively at training time. The central claim is an empirical performance result (SOTA accuracy + real-time inference) obtained via supervised optimization on labeled datasets; no equation or derivation is shown to reduce by construction to its own fitted parameters or to a self-citation chain. The inference procedure is explicitly defined to operate without the privileged signal, making the generalization claim falsifiable by cross-dataset or ablation experiments rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone; the method description does not introduce new physical quantities or unstated mathematical assumptions beyond standard supervised learning.

pith-pipeline@v0.9.1-grok · 5718 in / 1065 out tokens · 18983 ms · 2026-06-26T09:12:47.970372+00:00 · methodology

[1] [1]

Rademacher and gaussian complexities: Risk bounds and structural results.Journal of Machine Learning Research, 3(Nov):463–482, 2002

Peter L Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results.Journal of Machine Learning Research, 3(Nov):463–482, 2002

2002

[2] [2]

Unsupervised skin tissue segmentation for remote photoplethysmography.Pattern Recognition Letters, 124:82–90, 2019

Serge Bobbia, Richard Macwan, Yannick Benezeth, Alamin Mansouri, and Julien Dubois. Unsupervised skin tissue segmentation for remote photoplethysmography.Pattern Recognition Letters, 124:82–90, 2019

2019

[3] [3]

Bj¨orn Braun, Daniel McDuff, and Christian Holz. How suboptimal is training rPPG models with videos and targets from different body sites? InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 410–418, 2024

2024

[4] Constantino Alvarez Casado and Miguel Bordallo López. Face2PPG: An unsupervised pipeline for blood volume pulse extraction from faces. IEEE Journal of Biomedical and Health Informatics, 27(11):5530–5541, 2023

[5] Weixuan Chen and Daniel McDuff. DeepPhys: Video-based physiological measurement using convolutional attention networks. In Proceedings of the European Conference on Computer Vision, pages 349–365, 2018

[6] Juan Cheng, Ping Wang, Rencheng Song, Yu Liu, Chang Li, Yong Liu, and Xun Chen. Remote heart rate measurement from near-infrared videos based on joint blind source separation with delay-coordinate transformation. IEEE Transactions on Instrumentation and Measurement, 70:1–13, 2020

[7] Tri Dao and Albert Gu. Transformers are SSMs: generalized models and efficient algorithms through structured state space duality. Proceedings of Machine Learning Research, 235:10041–10071, 2024

[8] Gerard De Haan and Vincent Jeanne. Robust pulse rate from chrominance-based rPPG. IEEE Transactions on Biomedical Engineering, 60(10):2878–2886, 2013

[9] Gerard De Haan and Arno Van Leest. Improved motion robustness of remote-PPG by using the blood volume pulse signature. Physiological Measurement, 35(9):1913, 2014

[10] Guillaume Heusch, André Anjos, and Sébastien Marcel. A reproducible study on remote heart rate measurement. arXiv preprint arXiv:1709.00962, 2017

[11] Min Hu, Fei Qian, Dong Guo, Xiaohua Wang, Lei He, and Fuji Ren. ETA-rPPGNet: Effective time-domain attention network for remote heart rate measurement. IEEE Transactions on Instrumentation and Measurement, 70:1–12, 2021

[12] Titong Jiang, Yuan Ma, Jiaqi Li, Qing Dong, Xuewu Ji, and Yahui Liu. LSTS: Periodicity learning via long short-term temporal shift for remote physiological measurement. IEEE Transactions on Circuits and Systems for Video Technology, 35(7):6452–6465, 2025

[13] Jitesh Joshi, Sos S Agaian, and Youngjun Cho. Factorizephys: Matrix factorization for multidimensional attention in remote physiological sensing. arXiv preprint arXiv:2411.01542, 2024

[14] Jianwei Li, Zitong Yu, and Jingang Shi. Learning motion-robust remote photoplethysmography through arbitrary resolution videos. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 1334–1342, 2023

[15] Zhihua Li and Lijun Yin. Contactless pulse estimation leveraging pseudo labels and self-supervision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20588–20597, 2023

[16] Mingxuan Liu, Jiankai Tang, Yongli Chen, Haoxiang Li, Jiahao Qi, Siwei Li, Kegang Wang, Jie Gan, Yuntao Wang, and Hong Chen. Spiking-physformer: Camera-based remote photoplethysmography with parallel spike-driven transformer. Neural Networks, 185:107128, 2025

[17] Si-Qi Liu and Pong C Yuen. A general remote photoplethysmography estimator with spatiotemporal convolutional network. In Proceedings of the International Conference on Automatic Face and Gesture Recognition, pages 481–488. IEEE, 2020

[18] Xin Liu, Josh Fromm, Shwetak Patel, and Daniel McDuff. Multi-task temporal shift attention networks for on-device contactless vitals measurement. Advances in Neural Information Processing Systems, 33:19400–19411, 2020

[19] Xin Liu, Brian Hill, Ziheng Jiang, Shwetak Patel, and Daniel McDuff. Efficientphys: Enabling simple, fast and accurate camera-based cardiac measurement. In Proceedings of the Winter Conference on Applications of Computer Vision, pages 5008–5017, 2023

[20] Xin Liu, Girish Narayanswamy, Akshay Paruchuri, Xiaoyu Zhang, Jiankai Tang, Yuzhe Zhang, Roni Sengupta, Shwetak Patel, Yuntao Wang, and Daniel McDuff. rppg-toolbox: Deep remote ppg toolbox. Advances in Neural Information Processing Systems, 36:68485–68510, 2023

[21] Hao Lu, Hu Han, and S Kevin Zhou. Dual-gan: Joint BVP and noise modeling for remote physiological measurement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12404–12413, 2021

[22] Andreas Maurer. A vector-contraction inequality for rademacher complexities. In Proceedings of the International Conference on Algorithmic Learning Theory, pages 3–17. Springer, 2016

[23] Xuesong Niu, Zitong Yu, Hu Han, Xiaobai Li, Shiguang Shan, and Guoying Zhao. Video-based remote physiological measurement via cross-verified feature disentangling. In Proceedings of the European Conference on Computer Vision, pages 295–310. Springer, 2020

[24] Dmitry Pechyony and Vladimir Vapnik. On the theory of learning with privileged information. Advances in Neural Information Processing Systems, 23, 2010

[25] Christian S Pilz, Sebastian Zaunseder, Jarek Krajewski, and Vladimir Blazek. Local group invariance for heart rate estimation from face videos in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 1254–1262, 2018

[26] Ming-Zher Poh, Daniel J McDuff, and Rosalind W Picard. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762–10774, 2010

[27] Marko Savic and Guoying Zhao. Rs+rppg: Robust strongly self-supervised learning for rppg. IEEE Transactions on Circuits and Systems for Video Technology, 2025

[28] Hang Shao, Lei Luo, Jianjun Qian, Shuo Chen, Chuanfei Hu, and Jian Yang. Tranphys: Spatiotemporal masked transformer steered remote photoplethysmography estimation. IEEE Transactions on Circuits and Systems for Video Technology, 34(4):3030–3042, 2023

[29] Rencheng Song, Huan Chen, Juan Cheng, Chang Li, Yu Liu, and Xun Chen. PulseGAN: Learning to generate realistic pulse waveforms in remote photoplethysmography. IEEE Journal of Biomedical and Health Informatics, 25(5):1373–1384, 2021

[30] Jeremy Speth, Nathan Vance, Patrick Flynn, and Adam Czajka. Non-contrastive unsupervised learning of physiological signals from video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14464–14474, 2023

[31] Radim Špetlík, Vojtech Franc, and Jirí Matas. Visual heart rate estimation with convolutional neural network. In Proceedings of the British Machine Vision Conference, pages 3–6, 2018

[32] Ronny Stricker, Steffen Müller, and Horst-Michael Gross. Non-contact video-based pulse rate measurement on a mobile service robot. In The IEEE International Symposium on Robot and Human Interactive Communication, pages 1056–1062. IEEE, 2014

[33] Yu Sun and Nitish Thakor. Photoplethysmography revisited: from contact to noncontact, from point to imaging. IEEE Transactions on Biomedical Engineering, 63(3):463–477, 2015

[34] Z. Sun and X. Li. Contrast-Phys+: Unsupervised and weakly-supervised video-based remote physiological measurement via spatiotemporal contrast. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–18, 2024

[35] Jiankai Tang, Kequan Chen, Yuntao Wang, Yuanchun Shi, Shwetak Patel, Daniel McDuff, and Xin Liu. MMPD: Multi-domain mobile video physiology dataset. In Proceedings of the International Conference of the IEEE Engineering in Medicine & Biology Society, pages 1–5, 2023

[36] Vladimir Vapnik, Rauf Izmailov, et al. Learning using privileged information: Similarity control and knowledge transfer. Journal of Machine Learning Research, 16(1):2023–2049, 2015

[37] Vladimir Vapnik and Akshay Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 22(5-6):544–557, 2009

[38] Wim Verkruysse, Lars O Svaasand, and J Stuart Nelson. Remote plethysmographic imaging using ambient light. Optics Express, 16(26):21434–21445, 2008

[39] Wenjin Wang, Albertus C Den Brinker, Sander Stuijk, and Gerard De Haan. Algorithmic principles of remote PPG. IEEE Transactions on Biomedical Engineering, 64(7):1479–1491, 2016

[40] Bryan P Yan, William HS Lai, Christy KY Chan, Stephen Chun-Hin Chan, Lok-Hei Chan, Ka-Ming Lam, Ho-Wang Lau, Chak-Ming Ng, Lok-Yin Tai, Kin-Wai Yip, et al. Contact-free screening of atrial fibrillation by a smartphone using facial pulsatile photoplethysmographic signals. Journal of the American Heart Association, 7(8):e008585, 2018

[41] Zitong Yu, Xiaobai Li, and Guoying Zhao. Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks. In Proceedings of the British Machine Vision Conference, 2019

[42] Zitong Yu, Xiaobai Li, and Guoying Zhao. Facial-video-based physiological signal measurement: Recent advances and affective applications. IEEE Signal Processing Magazine, 38(6):50–58, 2021

[43] Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, and Guoying Zhao. Remote heart rate measurement from highly compressed facial videos: an end-to-end deep learning solution with video enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 151–160, 2019

[44] Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Yawen Cui, Jiehua Zhang, Philip Torr, and Guoying Zhao. Physformer++: Facial video-based physiological measurement with slowfast temporal difference transformer. International Journal of Computer Vision, 131(6):1307–1330, 2023

[45] Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Philip HS Torr, and Guoying Zhao. Physformer: Facial video-based physiological measurement with temporal difference transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4186–4196, 2022

[46] Dezhao Zhai, Wei Chen, Yinghao Ding, Ming Yu, Qinwei Li, and Hang Wu. Research on robust measurement method of heart rate using remote photoplethysmography based on adversarial learning network with high and low frequency features. IEEE Transactions on Circuits and Systems for Video Technology, 35(6):5208–5222, 2025

[47] Yizhu Zhang, Jingang Shi, Jiayin Wang, Yuan Zong, Wenming Zheng, and Guoying Zhao. MaskFusionNet: A dual-stream fusion model with masked pre-training mechanism for rPPG measurement. IEEE Transactions on Circuits and Systems for Video Technology, 34(11):11521–11534, 2024

[48] Changchen Zhao, Hongsheng Wang, Huiling Chen, Weiwei Shi, and Yuanjing Feng. JAMSNet: A remote pulse extraction network based on joint attention and multi-scale fusion. IEEE Transactions on Circuits and Systems for Video Technology, 33(6):2783–2797, 2022

[49] Bochao Zou, Zizheng Guo, Jiansheng Chen, Junbao Zhuo, Weiran Huang, and Huimin Ma. RhythmFormer: Extracting patterned rPPG signals based on periodic sparse attention. Pattern Recognition, 164:111511, 2025

[50] Bochao Zou, Zizheng Guo, Xiaocheng Hu, and Huimin Ma. RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 11077–11085, 2025

APPENDIX A
CONSISTENT PRIVILEGED LEARNING STRATEGY

A. Notation and Definitions

V, z and sgt denote the input video, privile...