Adaptive Physical-Facial Representation Fusion via Subject-Invariant Cross-Modal Prompt Tuning for Video-Based Emotion Recognition
Pith reviewed 2026-05-08 15:04 UTC · model grok-4.3
The pith
Subject-invariant cross-modal prompt tuning fuses rPPG time-frequency representations with facial tokens in a frozen ViT to improve video emotion recognition and generalization across people.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that rPPG waveforms can be converted into noise-robust time-frequency representations, from which modality-complementary prompts are generated to modulate facial tokens inside a frozen Vision Transformer, while a decoupled shared-specific adapter placed in each ViT layer explicitly separates subject-shared from subject-specific components. Together, this combination is claimed to enable effective cross-modal interaction, preserve generalizable facial representations, and deliver higher recognition accuracy and better generalization on the MAHNOB-HCI and DEAP benchmarks.
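To make the pipeline concrete, here is a minimal PyTorch sketch of the prompt path: pooled TFR features from the rPPG branch are projected into a few prompt tokens and prepended to the facial tokens entering a frozen ViT block. The module names, the pooling, and the prepend-style injection are assumptions; the excerpt does not specify how the prompts actually modulate the tokens.

```python
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    """Hypothetical sketch: map a pooled rPPG TFR embedding to K prompt tokens."""
    def __init__(self, tfr_dim: int, vit_dim: int, num_prompts: int = 4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(tfr_dim, vit_dim),
            nn.GELU(),
            nn.Linear(vit_dim, vit_dim * num_prompts),
        )
        self.num_prompts, self.vit_dim = num_prompts, vit_dim

    def forward(self, tfr_feat: torch.Tensor) -> torch.Tensor:
        # tfr_feat: (B, tfr_dim) pooled TFR embedding -> (B, K, vit_dim) prompts
        return self.proj(tfr_feat).view(-1, self.num_prompts, self.vit_dim)

def inject_prompts(facial_tokens: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
    # Prepend modality-complementary prompts to the frozen ViT token sequence.
    # facial_tokens: (B, N, D); prompts: (B, K, D) -> (B, K+N, D)
    return torch.cat([prompts, facial_tokens], dim=1)

# Usage: the ViT block stays frozen; only the prompt generator trains.
vit_block = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in vit_block.parameters():
    p.requires_grad = False
gen = PromptGenerator(tfr_dim=256, vit_dim=768)
tokens = torch.randn(2, 197, 768)   # facial patch tokens (+ CLS)
tfr = torch.randn(2, 256)           # pooled rPPG TFR embedding
out = vit_block(inject_prompts(tokens, gen(tfr)))
```

Because only the prompt generator receives gradients, the pretrained facial representations are left intact, which is the mechanism the core claim leans on.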
What carries the argument
The subject-invariant cross-modal prompt tuning mechanism that creates modality-complementary prompts from rPPG time-frequency representations to adapt frozen ViT facial tokens, together with the decoupled shared-specific adapter (DSSA) that separates shared and specific components inside each transformer layer.
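A hedged sketch of the DSSA follows, read as two parallel bottleneck adapters per layer whose outputs are pushed apart by an orthogonality penalty, with only the shared branch re-entering the residual stream. The bottleneck shape, the cosine-style penalty, and the routing are guesses at what the paper's Eq. (7)–(9) formalize.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSSA(nn.Module):
    """Hedged sketch of a decoupled shared-specific adapter (one per ViT layer).

    Two bottleneck adapters run in parallel; an orthogonality penalty pushes
    their outputs apart so one branch can absorb subject-specific variation.
    The paper's exact formulation (Eq. 7-9) may differ.
    """
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        def adapter():
            return nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                                 nn.Linear(bottleneck, dim))
        self.shared, self.specific = adapter(), adapter()

    def forward(self, x: torch.Tensor):
        s, p = self.shared(x), self.specific(x)
        # Cosine-style orthogonality penalty between branch outputs; one
        # plausible choice, weighted by lambda in the overall training loss.
        ortho = F.cosine_similarity(s, p, dim=-1).pow(2).mean()
        return x + s, ortho   # only the shared branch re-enters the stream

# Training loss (sketch): loss = task_loss + lam * ortho, where lam is the
# weighting hyper-parameter whose sensitivity the referee asks about below.
```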
Load-bearing premise
Time-frequency representations of rPPG stay noise-robust enough that the generated prompts and the adapter can reliably isolate subject-shared from subject-specific components without labels at inference time or any change to the frozen backbone.
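This premise can be probed directly at the signal level. Below is a minimal sketch of one standard TFR choice, the STFT magnitude of the rPPG waveform cropped to a plausible heart-rate band; the paper's actual transform, window, and band limits are not given in the excerpt, so the values here are assumptions.

```python
import numpy as np
from scipy.signal import stft

def rppg_to_tfr(wave: np.ndarray, fs: float = 30.0) -> np.ndarray:
    """Sketch: STFT magnitude of an rPPG waveform, cropped to 0.7-4 Hz
    (~42-240 bpm). Window, overlap, and band are assumed, not the paper's."""
    f, t, Z = stft(wave, fs=fs, nperseg=256, noverlap=192)
    band = (f >= 0.7) & (f <= 4.0)
    tfr = np.abs(Z[band])               # (freq_bins, time_frames)
    return tfr / (tfr.max() + 1e-8)     # normalize before feeding an encoder

# Usage: 10 s of a noisy synthetic pulse at ~1.2 Hz (72 bpm)
ts = np.arange(0, 10, 1 / 30.0)
wave = np.sin(2 * np.pi * 1.2 * ts) + 0.5 * np.random.randn(ts.size)
print(rppg_to_tfr(wave).shape)
```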
What would settle it
Run the full method against the same baselines on MAHNOB-HCI and DEAP with held-out subjects; the central claim is refuted if the accuracy and generalization gains persist when the prompts or DSSA are removed, or if added rPPG noise collapses performance.
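A skeleton of that experiment: leave-one-subject-out evaluation over ablated variants plus a noise-injection arm. Here train_eval is a dummy placeholder for the full training run, and the variant set is illustrative, not the paper's protocol.

```python
import numpy as np

def train_eval(train_subj, test_subj, use_prompts=True, use_dssa=True,
               rppg_noise_std=0.0) -> float:
    # Placeholder for the real training/evaluation pipeline; returns a dummy
    # accuracy so the harness runs end to end.
    seed = abs(hash((test_subj, use_prompts, use_dssa, rppg_noise_std))) % 2**32
    return float(np.random.default_rng(seed).uniform(0.5, 0.8))

def settle_it(subjects):
    variants = {
        "full": {},
        "no_prompts": {"use_prompts": False},
        "no_dssa": {"use_dssa": False},
        "noisy_rppg": {"rppg_noise_std": 0.5},
    }
    results = {name: [] for name in variants}
    for held_out in subjects:                     # leave-one-subject-out
        train = [s for s in subjects if s != held_out]
        for name, kwargs in variants.items():
            results[name].append(train_eval(train, held_out, **kwargs))
    # Refutation criteria: "no_prompts"/"no_dssa" matching "full" would mean
    # those mechanisms are not load-bearing; "noisy_rppg" collapsing would
    # break the noise-robustness premise.
    return {k: (float(np.mean(v)), float(np.std(v))) for k, v in results.items()}

print(settle_it(subjects=list(range(1, 9))))
```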
Original abstract
Emotion recognition from facial videos enables non-contact inference of human emotional states. Although facial expressions are widely used cues, they cannot fully reflect intrinsic affective states. Remote photoplethysmography (rPPG) provides complementary physiological information, but it is highly susceptible to noise and inter-subject variability, limiting generalization to unseen individuals. Existing multimodal methods combine facial and rPPG features, yet their fusion strategies often disrupt pretrained facial representations and lack explicit mechanisms to suppress subject-specific variations. To address these issues, we propose a subject-invariant cross-modal prompt-tuning framework for video-based emotion recognition. Specifically, rPPG waveforms are transformed into noise-robust time-frequency representations (TFRs), from which modality-complementary prompts are generated to modulate facial tokens within a frozen Vision Transformer (ViT). This design enables effective cross-modal interaction while preserving the generalizable facial representations learned by the pretrained backbone. In addition, we introduce a decoupled shared-specific adapter (DSSA) into each ViT layer to explicitly separate subject-shared and subject-specific components, thereby improving cross-subject generalization. Experiments on the MAHNOB-HCI and DEAP benchmarks demonstrate that the proposed method consistently outperforms strong baselines in both recognition accuracy and generalization ability, highlighting its effectiveness for video-based emotion recognition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a subject-invariant cross-modal prompt-tuning framework for video-based emotion recognition. rPPG waveforms are converted to noise-robust time-frequency representations (TFRs) from which modality-complementary prompts are generated to modulate facial tokens inside a frozen ViT backbone. A decoupled shared-specific adapter (DSSA) is inserted into each ViT layer to explicitly separate subject-shared from subject-specific components. Experiments on MAHNOB-HCI and DEAP are reported to show consistent gains in recognition accuracy and cross-subject generalization over strong baselines.
Significance. If the reported gains are reproducible and the ablations confirm that prompt modulation and DSSA preserve the frozen ViT representations while improving invariance, the work would offer a practical route to multimodal fusion that avoids catastrophic forgetting of pretrained features and does not require subject labels at test time. This could be relevant for non-contact affective monitoring where subject generalization remains a bottleneck.
Major comments (2)
- [§5.2, Table 3] The cross-subject generalization results on DEAP report accuracy improvements of 3–5% over the strongest baseline, yet no per-subject variance, confidence intervals, or paired statistical tests are provided; without these the claim that DSSA reliably improves generalization cannot be evaluated as load-bearing.
- [§4.3, Eq. (7)–(9)] The DSSA formulation separates shared and specific adapters via an explicit orthogonality term, but the paper does not report the sensitivity of final accuracy to the weighting hyper-parameter λ; if performance collapses for λ outside a narrow range the separation mechanism is not robust.
Minor comments (2)
- [Abstract] The abstract asserts outperformance without any numerical values; adding the key accuracy figures and dataset splits would make the contribution summary self-contained.
- [Figure 2] The DSSA diagram uses inconsistent arrow styles for the shared versus specific paths; a single consistent notation would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee [§5.2, Table 3]: The cross-subject generalization results on DEAP report accuracy improvements of 3–5% over the strongest baseline, yet no per-subject variance, confidence intervals, or paired statistical tests are provided; without these the claim that DSSA reliably improves generalization cannot be evaluated as load-bearing.
  Authors: We agree that reporting per-subject variance, confidence intervals, and statistical tests is necessary to substantiate the generalization claims. In the revised manuscript we will expand Table 3 to include per-subject standard deviations, 95% confidence intervals (via subject-wise bootstrapping), and paired t-test p-values comparing our method to the strongest baseline on DEAP. These additions will allow direct evaluation of whether the observed 3–5% gains are reliable across subjects (a sketch of these statistics appears after the responses). Revision: yes.
- Referee [§4.3, Eq. (7)–(9)]: The DSSA formulation separates shared and specific adapters via an explicit orthogonality term, but the paper does not report the sensitivity of final accuracy to the weighting hyper-parameter λ; if performance collapses for λ outside a narrow range the separation mechanism is not robust.
  Authors: We appreciate the referee highlighting the need to verify robustness with respect to λ. In the revised version we will add an ablation subsection that sweeps λ across [0.01, 10] on DEAP and reports the resulting accuracy curves. This will demonstrate that the orthogonality constraint yields stable performance over a practical range of λ without requiring per-dataset retuning. Revision: yes.
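To make the promised statistics concrete, here is a minimal sketch of subject-wise bootstrap confidence intervals and a paired t-test over per-subject accuracies. The accuracy arrays below are illustrative placeholders, not figures from the paper; scipy.stats.ttest_rel is the standard paired test across subjects.

```python
import numpy as np
from scipy.stats import ttest_rel

def subjectwise_bootstrap_ci(acc, n_boot=10_000, alpha=0.05, seed=0):
    """95% CI for the mean by resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    acc = np.asarray(acc)
    idx = rng.integers(0, acc.size, size=(n_boot, acc.size))
    means = acc[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# Illustrative per-subject accuracies (NOT the paper's numbers).
ours = np.array([0.71, 0.68, 0.74, 0.66, 0.72, 0.69, 0.73, 0.70])
base = np.array([0.67, 0.66, 0.69, 0.63, 0.68, 0.66, 0.70, 0.65])
lo, hi = subjectwise_bootstrap_ci(ours - base)
t, p = ttest_rel(ours, base)   # paired test across subjects
print(f"gain CI: [{lo:.3f}, {hi:.3f}], paired t-test p = {p:.4f}")
```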
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper describes a new architectural framework (TFR conversion of rPPG, modality-complementary prompts modulating a frozen ViT, and DSSA for shared/specific separation) without any equations, derivations, or parameter-fitting steps that reduce by construction to the inputs. Performance claims rest on benchmark experiments rather than tautological predictions or self-citation chains. No load-bearing mathematical reduction or ansatz smuggling is present in the provided text.