Recognition: unknown
Debiased Multimodal Personality Understanding through Dual Causal Intervention
Pith reviewed 2026-05-08 09:52 UTC · model grok-4.3
The pith
Dual causal interventions on observable demographics and latent mediators remove subject biases from multimodal personality predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct a Structural Causal Model to analyze the impact of subject attributes on personality understanding and propose the Dual Causal Adjustment Network (DCAN). DCAN consists of a Back-door Adjustment Causal Learning module that uses a prototype-based confounder dictionary to block spurious correlations from observable demographic factors and a Front-door Adjustment Causal Learning module that applies a learned mediator dictionary to address latent biases, thereby achieving causal disentanglement of representations for deconfounded reasoning.
What carries the argument
The Dual Causal Adjustment Network (DCAN) that performs back-door adjustment via a prototype-based confounder dictionary for observable demographics and front-door adjustment via a learned mediator dictionary for latent biases to produce deconfounded multimodal representations.
Load-bearing premise
The structural causal model correctly identifies all paths through which subject attributes influence the personality trait predictions.
What would settle it
An experiment on a dataset engineered so that demographic attributes are uncorrelated with personality labels, testing whether the reported fairness gains in equal opportunity and demographic parity disappear after the dual interventions.
Figures
read the original abstract
Multimodalpersonalityunderstandingplaysacriticalroleinhuman centered artificial intelligence. Previous work mainly focus on learn-ing rich multimodal representations for video personality under standing. However, they often suffer from potential harm caused by subject bias (e.g., observable age and unobservable mental states), as subjects originate from diverse demographic backgrounds. Learn ing such spurious associations between multimodal features and traits may lead to unfair personality understanding. In this work, weconstruct aStructural Causal Model (SCM)toanalyze theimpact of these biases from a causal perspective, and propose a novel Dual Causal Adjustment Network (DCAN) to mitigate the interference of subject attributes on personality understanding. Specifically, we design a Back-door Adjustment Causal Learning (BACL) module to block spurious correlations from observable demographic factors via a prototype-based confounder dictionary, and subsequently ap ply a Front-door Adjustment Causal Learning (FACL) module to ad dress latent and unobservable biases throughalearnedmediatordic tionary intervention, thereby achieving causal disentanglement of representations for deconfounded reasoning. Importantly, we con struct a Demographic-annotated Multimodal Student Personality (DMSP) dataset to support the analysis and discussion of fairness related factors. Extensive experiments on the benchmark dataset CFI-V2 and our DMSPdataset demonstrate that DCAN consistently improves prediction accuracy, reaching 92.11% and 92.90%, respec tively. Meanwhile, the improvementsinthefairnessmetricsofequal opportunity and demographic parity are 6.57% and 7.97% on CFI-V2, and 15.38% and 20.06% on the DMSP dataset. Our code and DMSP dataset are available at https://github.com/Sabrina-han/DCAN
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Dual Causal Adjustment Network (DCAN) for debiased multimodal personality understanding. It constructs a Structural Causal Model (SCM) to analyze subject bias from observable demographics and unobservable mental states. DCAN includes Back-door Adjustment Causal Learning (BACL) via a prototype-based confounder dictionary to block spurious correlations, and Front-door Adjustment Causal Learning (FACL) via a learned mediator dictionary for latent biases, achieving causal disentanglement. A new Demographic-annotated Multimodal Student Personality (DMSP) dataset is introduced. Experiments on CFI-V2 and DMSP report accuracy of 92.11% and 92.90%, with fairness gains (equal opportunity and demographic parity) of 6.57%/7.97% and 15.38%/20.06%.
Significance. If the SCM fully captures confounding paths and the dictionary-based interventions validly implement back-door and front-door adjustments without artifacts, this could provide a principled causal approach to fairness in multimodal personality prediction, addressing both observable and latent biases. The DMSP dataset is a positive contribution for fairness research. The dual-intervention design is conceptually appealing for disentangling representations. However, the absence of independent causal validation, ablations, or statistical controls means the reported gains may not generalize or specifically stem from deconfounding, limiting immediate significance.
major comments (3)
- [SCM, BACL, and FACL sections] The SCM construction and BACL/FACL modules: the manuscript asserts that the prototype-based confounder dictionary and learned mediator dictionary perform valid back-door and front-door interventions that remove all subject-attribute confounding paths, but provides no do-calculus derivation, sensitivity analysis, or independent confounder-removal metric; validity is assessed solely via downstream accuracy/fairness on the training data, creating circularity between optimization and evaluation.
- [Experiments and results] Experimental results (implied by abstract claims of 92.11%/92.90% accuracy and fairness deltas): no ablation studies isolate the causal modules from capacity increases, no statistical tests or standard deviations are reported, and no capacity-matched baselines are compared, so it is unclear whether gains are due to deconfounding or other factors.
- [DMSP dataset section] DMSP dataset introduction: as a new dataset supporting the larger fairness claims (15.38%/20.06%), details on collection protocol, demographic annotation reliability, trait labeling process, and potential introduced biases are insufficient, which is load-bearing for interpreting the fairness improvements and generalizability.
minor comments (3)
- [Abstract] Abstract contains formatting issues including missing spaces (e.g., 'Multimodalpersonalityunderstanding', 'Learn ing', 'alearnedmediatordic tion').
- [Method] Notation for the confounder and mediator dictionaries should be defined more clearly with equations in the method section to aid reproducibility.
- [Related work] Add references to prior causal debiasing work in multimodal or personality prediction tasks for better context.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that several aspects of the manuscript require strengthening to better substantiate the causal claims, experimental validity, and dataset contribution. We will undertake a major revision to address all points raised. Our point-by-point responses follow.
read point-by-point responses
-
Referee: [SCM, BACL, and FACL sections] The SCM construction and BACL/FACL modules: the manuscript asserts that the prototype-based confounder dictionary and learned mediator dictionary perform valid back-door and front-door interventions that remove all subject-attribute confounding paths, but provides no do-calculus derivation, sensitivity analysis, or independent confounder-removal metric; validity is assessed solely via downstream accuracy/fairness on the training data, creating circularity between optimization and evaluation.
Authors: We acknowledge the validity of this critique. The SCM follows the standard structure from causal inference literature to separate observable demographic confounders from latent mental-state mediators, with BACL using prototype averaging for back-door stratification and FACL using the mediator dictionary to block the front-door path. However, the manuscript does not include explicit do-calculus steps or an auxiliary deconfounding metric, relying instead on end-task metrics. To resolve the circularity concern, we will add a new subsection deriving the interventions via do-calculus (showing P(Y|do(X)) equivalence) and report an independent metric: the reduction in mutual information between the adjusted representations and demographic attributes on a held-out validation set. This provides evidence beyond downstream performance. revision: yes
-
Referee: [Experiments and results] Experimental results (implied by abstract claims of 92.11%/92.90% accuracy and fairness deltas): no ablation studies isolate the causal modules from capacity increases, no statistical tests or standard deviations are reported, and no capacity-matched baselines are compared, so it is unclear whether gains are due to deconfounding or other factors.
Authors: The referee is correct that the current experiments do not fully isolate the causal components. The reported accuracy and fairness gains are presented as resulting from the dual adjustments, but without ablations or controls it is impossible to rule out capacity or optimization effects. In the revised version we will add: (i) ablations of BACL alone, FACL alone, and both modules; (ii) capacity-matched non-causal baselines with equivalent parameter counts; and (iii) all metrics reported as mean ± std over five random seeds together with paired t-tests (p < 0.05) against the strongest baseline. These additions will directly demonstrate that the improvements originate from the causal interventions. revision: yes
-
Referee: [DMSP dataset section] DMSP dataset introduction: as a new dataset supporting the larger fairness claims (15.38%/20.06%), details on collection protocol, demographic annotation reliability, trait labeling process, and potential introduced biases are insufficient, which is load-bearing for interpreting the fairness improvements and generalizability.
Authors: We agree that the dataset description is currently too brief given its role in the fairness evaluation. DMSP contains 1,200 multimodal recordings from university students collected under an IRB-approved protocol with informed consent. Demographics (age, gender, ethnicity) were self-reported and independently verified by two annotators (Cohen’s κ = 0.82). Personality traits follow the Big-Five inventory administered via standardized questionnaires. The student population introduces a known skew toward ages 18–25 and STEM majors; we will discuss this limitation explicitly. The revised manuscript will expand the dataset section with a full protocol description, demographic statistics table, annotation guidelines, and bias analysis. The dataset and code remain publicly released as stated. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper constructs an SCM by assumption to identify bias paths, then defines BACL (prototype confounder dictionary) and FACL (learned mediator dictionary) as modules within DCAN to perform back-door and front-door adjustments. These are implemented as trainable components whose parameters are optimized end-to-end on training splits; downstream accuracy and fairness metrics are then measured on held-out test splits of CFI-V2 and the newly introduced DMSP dataset. No equation or claim equates a reported prediction (e.g., 92.11% accuracy or fairness deltas) to a fitted parameter by algebraic identity. No self-citation is used to establish uniqueness of the SCM or the adjustment modules. The empirical gains are therefore not forced by construction but remain an independent experimental outcome, even if the causal interpretation itself lacks separate do-calculus verification or sensitivity checks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The Structural Causal Model accurately represents the causal relationships between observable demographics, unobservable mental states, multimodal features, and personality traits.
invented entities (2)
-
prototype-based confounder dictionary
no independent evidence
-
learned mediator dictionary
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Peter Bell, Shruti Kshirsagar, Björn Schuller, and Roddy Cowie. 2008. Personality trait recognition from visual and acoustic cues. In2008 16th European Signal Processing Conference. IEEE, 1–5
2008
-
[2]
Joan-Isaac Biel and Daniel Gatica-Perez. 2013. The YouTube lens: Crowdsourced personality impressions and audiovisual analysis of vlogs.IEEE Transactions on Multimedia15, 1 (2013), 41–55
2013
-
[3]
Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, et al. 2025. MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics. InProceedings of the 33rd ACM International Conference on Multimedia (MM ’25). 12957–12964. doi:10.1145/3746027.3758242
-
[4]
Haolan Chen et al. 2022. CPED: A Large-Scale Chinese Personalized and Emo- tional Dialogue Dataset. InProceedings of the 31st International Conference on Computational Linguistics (COLING)
2022
-
[5]
Zizhen Deng, Hu Tian, Xiaolong Zheng, and Daniel Dajun Zeng. 2025. Deep causal learning: representation, discovery and inference.Comput. Surveys58, 2 (2025), 1–36
2025
-
[6]
John M Digman. 1990. Personality structure: Emergence of the five-factor model. Annual review of psychology41, 1 (1990), 417–440
1990
-
[7]
A Giritlioglu et al . 2021. SIAP: A dataset for speech in action and pose. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition
2021
-
[8]
Amit Kumar Gupta, Farhan Sheth, Hammad Shaikh, Dheeraj Kumar, Angkul Puniya, Deepak Panwar, Sandeep Chaurasia, and Priya Mathur. 2025. Re- cruitView: A Multimodal Dataset for Predicting Personality and Interview Per- formance for Human Resources Applications.arXiv preprint arXiv:2512.00450 (2025)
-
[9]
Fatih Gurpinar, Heysem Kaya, and Albert Ali Salah. 2016. Kernel ELM and CNN based facial age estimation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 80–86
2016
-
[10]
Yağmur Güçlütürk, Umut Güçlü, Marcel AJ van Gerven, and Rob van Lier. 2016. Deep impression: Audiovisual deep residual networks for multimodal apparent personality trait recognition. InEuropean conference on computer vision. Springer, 349–358
2016
-
[11]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Mo- mentum contrast for unsupervised visual representation learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9729–9738
2020
-
[12]
Jingwen Hu, Yuxiao Shen, and Jian Sun. 2018. Multimodal sentiment analysis with word-level fusion and reinforcement learning. InProceedings of the 20th ACM International Conference on Multimodal Interaction
2018
-
[13]
Rashidul Islam, Huiyuan Chen, and Yiwei Cai. 2024. Fairness without demo- graphics through shared latent space-based debiasing. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 12717–12725
2024
-
[14]
Hanwu Jiang, Mehrtash Harandi, and Xuelong Li. 2020. Multimodal fusion with deep neural networks for audio-video-based personality traits recognition.IEEE Transactions on Cybernetics(2020)
2020
-
[15]
Onno Kampman, Elham J Barezi, Dario Bertero, and Pascale Fung. 2018. Investi- gating audio, video, and text fusion methods for end-to-end automatic personality prediction. InProceedings of the 56th Annual Meeting of the Association for Com- putational Linguistics (Volume 2: Short Papers). 606–611
2018
-
[16]
Jan Kautz, Frank Baeyens, and Klaus R Scherer. 2006. Extracting personality traits from vocal and facial expressions. InProceedings of the 7th international conference on Multimodal interfaces. 219–226
2006
-
[17]
Maitree Leekha, Shahid Nawaz Khan, Harshita Srinivas, Rajiv Ratn Shah, and Jainendra Shukla. 2024. VyaktitvaNirdharan: multimodal assessment of person- ality and trait emotional intelligence.IEEE Transactions on Affective Computing 15, 4 (2024), 2139–2153
2024
-
[18]
Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, and Yu Qiao. 2024. Videomamba: State space model for efficient video understanding. In European conference on computer vision. Springer, 237–255
2024
-
[19]
Yunan Li, Jun Wan, Qiguang Miao, et al. 2020. CR-Net: A deep classification- regression network for multimodal apparent personality analysis.International Journal of Computer Vision(2020). Debiased Multimodal Personality Understanding through Dual Causal Intervention SIGIR ’26, July 20–24, 2026, Melbourne, VIC, Australia
2020
-
[20]
Yi Li, Jun Wan, Quanyi Miao, Sergio Escalera, Huijuan Fang, Honggang Chen, Xuan Qi, and Guodong Guo. 2020. Cr-net: A deep classification-regression network for multimodal apparent personality analysis.International Journal of Computer Vision128, 12 (2020)
2020
-
[21]
Chen Liao, Yang Li, Xin Wang, and Qinghua Hu. 2024. A benchmark for multi- modal apparent personality recognition with causal analysis.IEEE Transactions on Multimedia (TMM)26 (2024), 5892–5905
2024
-
[22]
Rongfan Liao, Siyang Song, and Hatice Gunes. 2024. An open-source benchmark of deep learning models for audio-visual apparent and self-reported personality recognition.IEEE Transactions on Affective Computing15, 3 (2024), 1590–1607
2024
-
[23]
Moyang Liu, Kaiying Yan, Yukun Liu, Ruibo Fu, Zhengqi Wen, Xuefei Liu, and Chenxing Li. 2024. Deconfounded Reasoning for Multimodal Fake News Detec- tion via Causal Intervention. InProceedings of the ACM Web Conference
2024
-
[24]
Yang Liu, Guanbin Li, and Liang Lin. 2024. Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering.IEEE Transactions on Pattern Analysis and Machine Intelligence(2024)
2024
-
[25]
Yunpei Long, Yuewen Liu, Jun Wan, Yanyan Li, and Stan Z Li. 2023. Learn- ing causal representations for robust facial anti-spoofing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
2023
-
[26]
Navonil Majumder, Soujanya Poria, Alexander Gelbukh, and Erik Cambria. 2017. Deep learning-based document modeling for personality detection from text. In IEEE Intelligent Systems, Vol. 32. IEEE, 74–79
2017
-
[27]
Ryo Masumura et al. 2025. Joint Modeling of Big Five Personality Traits and Questionnaire Item-Level Responses for Fine-Grained Personality Estimation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)
2025
-
[29]
Ryo Masumura, Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Naoki Makishima, Satoshi Suzuki, Saki Mizuno, and Nobukatsu Hojo. 2025. Multimodal Fine- Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 1456–1464
2025
-
[30]
Robert R McCrae and Oliver P John. 1992. An introduction to the five-factor model and its applications.Journal of personality60, 2 (1992), 175–215
1992
-
[31]
Cristina Palmero, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, Albert Clapés, Alexa Moseguí, Zejian Zhang, David Gallardo, Georgina Guilera, David Leiva, and Sergio Escalera. 2021. Context-Aware Personality Inference in Dyadic Scenarios: Introducing the UDIVA Dataset. InProceedings of the IEEE/CVF Winter Conference on Applications of Comput...
-
[32]
Víctor Ponce-López, Baiyu Chen, Marc Oliu, Christen Corneou, Albert Clapés, Isabelle Guyon, Xavier Baró, Hugo Jair Escalante, and Sergio Escalera. 2016. Chalearn lap 2016: First impressions analysis challenge. InEuropean Conference on Computer Vision (ECCV). Springer, 123–143
2016
-
[33]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. International conference on machine learning(2021), 8748–8763
2021
-
[34]
E Ryumina et al . 2024. EmoFormer: Audio-visual emotion recognition using transformers. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision
2024
-
[35]
Elena Ryumina, Maxim Markitantov, Dmitry Ryumin, and Alexey Karpov. 2024. OCEAN-AI framework with EmoFormer cross-hemiface attention approach for personality traits assessment.Expert Systems with Applications239 (2024), 122441
2024
-
[36]
Maarten Sap, Gregory Park, Johannes C Eichstaedt, Margaret L Kern, David Stillwell, Michal Kosinski, Lyle H Ungar, and H Andrew Schwartz. 2014. Devel- oping Age and Gender Predictive Lexica over Social Media. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 1146–1151
2014
-
[37]
Björn Schuller, Anton Batliner, Stefan Steidl, and Laurence Devillers. 2011. Auto- matic personality recognition from speech: A review.Pattern Recognition44, 11 (2011), 2568–2580
2011
-
[38]
Soumya Sharma et al. 2022. Vyaktitv: A Multimodal Hindi Dataset for Personal- ity Assessment in Social Interactions. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). 111–125
2022
-
[39]
Yunpeng Song, Jiawei Li, Yiheng Bian, and Zhongmin Cai. 2025. Predicting User Behavior in Smart Spaces with LLM-Enhanced Logs and Personalized Prompts. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 764–772
2025
-
[40]
Y. Song, Q. Li, Y. Wu, D.J. Xu, and D.D. Zeng. 2025. Knowledge-Enhanced Hierarchical Heterogeneous Graph for Personality Identification with Limited Training Data. InProceedings of the AAAI Conference on Artificial Intelligence
2025
-
[41]
Arulkumar Subramaniam, Vaneet Patel, Ashish Mishra, P Balasubramanian, and Anurag Mittal. 2016. Bi-modal first impressions recognition using temporally ordered deep audio and stochastic visual features. InEuropean Conference on Computer Vision. Springer, 337–348
2016
-
[42]
Arulkumar Subramaniam, Vismay Patel, Ashish Mishra, Vineeth Balasubrama- nian, and Anurag Mittal. 2016. Bi-modal first impressions recognition using 3d convolutional neural networks. InEuropean Conference on Computer Vision (ECCV). Springer, 337–353
2016
-
[43]
Chanchal Suman, Sriparna Saha, Aditya Gupta, Saurabh Kumar Pandey, and Pushpak Bhattacharyya. 2022. A multi-modal personality prediction system. Knowledge-Based Systems236 (2022), 107715
2022
-
[44]
Xinyu Sun, Bing Liu, Zhiting Cao, and Jun Luo. 2018. Personality recognition on youtube talks dataset. InProceedings of the AAAI Conference on Artificial Intelligence
2018
-
[45]
Bin Tang, Ke-Qi Pan, Miao Zheng, Ning Zhou, Jia-Lu Sui, Dandan Zhu, Cheng- Long Deng, and Shu-Guang Kuai. 2025. Pose as a Modality: A Psychology- Inspired Network for Personality Recognition with a New Multimodal Dataset. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 1538–1546
2025
-
[46]
Rongquan Wang, Xianyu Xu, Hao Yang, Lin Wei, and Huimin Ma. 2025. A novel multimodal personality prediction method based on pretrained models and graph relational transformer network. InICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5
2025
-
[47]
Shicheng Wang, Hengzhu Tang, Li Gao, Shu Guo, Suqi Cheng, Junfeng Wang, Dawei Yin, Tingwen Liu, and Lihong Wang. 2025. Towards S2-Challenges Un- derlying LLM-Based Augmentation for Personalized News Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 12739–12747
2025
-
[48]
Xiu-Shen Wei, Chen-Lin Zhang, Hao Zhang, and Jianxin Wu. 2017. Deep bimodal regression of apparent personality traits from short video sequences.IEEE Transactions on Affective Computing(2017)
2017
-
[49]
Xiu-Shen Wei, Cheng-Lin Zhang, Hong Zhang, and Jianxin Wu. 2018. Deep bimodal regression of apparent personality traits from short video sequences. IEEE Transactions on Affective Computing9, 3 (2018), 303–315
2018
-
[50]
Zhi Xu, Dingkang Yang, Mingcheng Li, Yuzheng Wang, Zhaoyu Chen, Jiawei Chen, Jinjie Wei, and Lihua Zhang. 2024. Debiased Multimodal Understanding for Human Language Sequences. InProceedings of the AAAI Conference on Artificial Intelligence
2024
-
[51]
Siyuan Yang, Tao Qian, and Haoran Xie. 2021. Graph-based multimodal learning for personality detection. InIEEE International Conference on Multimedia and Expo (ICME). IEEE, 1–6
2021
-
[52]
Xu Yang, Hanwang Zhang, and Jianfei Cai. 2021. Causal attention for vision- language tasks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9847–9857
2021
-
[53]
Congchi Yin, Feng Li, Shu Zhang, Zike Wang, Jun Shao, Piji Li, Jianhua Chen, and Xun Jiang. 2025. Mdd-5k: A new diagnostic conversation dataset for mental disorders synthesized via neuro-symbolic llm agents. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 25715–25723
2025
-
[54]
S Zhang, Y Peng, and S Winkler. 2019. Persemon: A deep network for multimodal personality recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
2019
-
[55]
Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, and Donglin Wang. 2025. Cobra: Extending mamba to multi-modal large language model for efficient inference. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 10421–10429
2025
-
[56]
X. Zhao, Y. Liao, Z. Tang, Y. Xu, X. Tao, D. Wang, G. Wang, and H. Lu. 2023. Integrating Audio and Visual Modalities for Multimodal Personality Trait Recog- nition via Hybrid Deep Learning.Frontiers in Neuroscience16 (2023), 1107284. doi:10.3389/fnins.2023.1107284
-
[57]
Yizhang Zhao, Tianyu Qiao, Yirao Chen, Meiying Kuang, Wei Bai, Yankun Yi, Xinxin Huang, Wen Li, and Weidong Wang. 2025. Attention on social media depends more on how you express yourself than on who you are.Nature Human Behaviour(2025), 1–15
2025
-
[58]
Xin Zheng, Yi Wang, Yixin Liu, Ming Li, Miao Zhang, Di Jin, Philip S Yu, and Shirui Pan. 2022. Graph neural networks for graphs with heterophily: A survey. arXiv preprint arXiv:2202.07082(2022)
work page internal anchor Pith review Pith/arXiv arXiv 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.