Computer Vision Estimation of Emotion Reaction Intensity in the Wild

Ali Kargarandehkordi; Dennis Paul Wall; Mohammadmahdi Honarmand; Onur Cezmi Mutlu; Peter Washington; Saimourya Surabhi; Yang Qian

arXiv: 2303.10741 · v2 · pith:PH3S7HJ6new · submitted 2023-03-19 · cs.CV · cs.LG

Yang Qian, Ali Kargarandehkordi, Onur Cezmi Mutlu, Saimourya Surabhi, Mohammadmahdi Honarmand, Dennis Paul Wall, Peter Washington

keywords: emotion, reaction, intensity, model, affective, computer, emotional, emotions

Emotions play an essential role in human communication. Developing computer vision models for automatic recognition of emotion expression can aid in a variety of domains, including robotics, digital behavioral healthcare, and media analytics. There are three types of emotional representations which are traditionally modeled in affective computing research: Action Units, Valence Arousal (VA), and Categorical Emotions. As part of an effort to move beyond these representations towards more fine-grained labels, we describe our submission to the newly introduced Emotional Reaction Intensity (ERI) Estimation challenge in the 5th competition for Affective Behavior Analysis in-the-Wild (ABAW). We developed four deep neural networks trained in the visual domain and a multimodal model trained with both visual and audio features to predict emotion reaction intensity. Our best performing model on the Hume-Reaction dataset achieved an average Pearson correlation coefficient of 0.4080 on the test set using a pre-trained ResNet50 model. This work provides a first step towards the development of production-grade models which predict emotion reaction intensities rather than discrete emotion categories.
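The abstract reports its headline result as an average Pearson correlation coefficient. As a rough illustration of how such a score can be computed, the sketch below takes the Pearson correlation separately for each intensity dimension and then averages the results. The function name `mean_pearson` and the array layout (one row per sample, one column per emotion) are assumptions made for this example; the challenge's official scoring code may differ in detail.

```python
import numpy as np


def mean_pearson(y_true, y_pred):
    """Average the per-column Pearson correlation of two (n_samples, n_dims) arrays.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim != 2 or y_true.shape != y_pred.shape:
        raise ValueError("expected two 2-D arrays of identical shape")

    # Center each column so the sums below are covariance and variance terms.
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)

    # Pearson r per column: covariance over the product of standard deviations.
    # A constant column makes the denominator zero and yields NaN.
    denom = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    per_dim = (t * p).sum(axis=0) / denom

    return float(per_dim.mean())
```

Because each dimension is scored independently before averaging, a model that ranks intensities well for one emotion but poorly for another lands in between, rather than being dominated by whichever dimension has the largest numeric range.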

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fine-tuning a multimodal large language model for clinician-grade autism behavioral scoring from short home videos
cs.CV 2026-06 unverdicted novelty 6.0

Fine-tuning Gemini 2.5 Pro with LoRA on 400 home videos improves per-feature agreement with clinicians by 40% and zero-shot ASD diagnosis F1 by 53% on held-out data, with classifier pipelines reaching 77% accuracy.

Facial Expression Recognition in the Deep Learning Era: A Systematic Multi-Criteria Review of Methods, Models, Datasets, Performance, Challenges, and Future Research Directions
cs.CV 2026-06 unverdicted novelty 4.0

This survey organizes deep learning FER literature into five evolutionary phases and a seven-criteria taxonomy, compares datasets and performance, and outlines challenges.