Recognition: 2 theorem links
Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration
Pith reviewed 2026-05-12 03:06 UTC · model grok-4.3
The pith
Group Cognition Learning uses two-stage agent collaboration to reduce modality dominance and spurious coupling in multimodal fusion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Centralized multimodal learning compresses language, acoustic, and visual signals into a single fused representation but suffers from modality dominance, where optimization ignores weaker yet informative modalities, and from spurious modality coupling, where models overfit to incidental cross-modal correlations. Group Cognition Learning addresses this with a two-stage protocol after modality-specific encoding: in Selective Interaction, a Routing Agent proposes directed routes while an Auditing Agent assigns sample-wise gates to emphasize exchanges that yield positive marginal predictive gain; in Consensus Formation, a Public-Factor Agent maintains an explicit shared factor and an Aggregation Agent produces the final prediction through contribution-aware weighting while keeping each modality representation as a specialization channel.
What carries the argument
The two-stage governed collaboration protocol: Selective Interaction (Routing Agent plus Auditing Agent) followed by Consensus Formation (Public-Factor Agent plus Aggregation Agent), which together enforce gain-focused exchanges and contribution-aware weighting while preserving modality specializations.
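To make the protocol's shape concrete, here is a minimal, purely illustrative sketch of the two stages as the abstract describes them. Everything here is a placeholder: the gate values, the averaged public factor, and the channel-sum contribution proxy are invented stand-ins for the learned Routing, Auditing, Public-Factor, and Aggregation agents, not the paper's actual model.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Each modality is a toy 2-d feature vector; a "route" is an ordered
# (source, target) pair of modalities.
modalities = {"language": [0.9, 0.1], "acoustic": [0.2, 0.4], "visual": [0.3, 0.3]}

# Stage 1 (Selective Interaction): the Routing Agent proposes directed routes;
# the Auditing Agent assigns each route a sample-wise gate in [0, 1).
routes = [(s, t) for s in modalities for t in modalities if s != t]
gates = {r: random.random() for r in routes}  # placeholder for learned gates

# Apply gated exchanges: each target channel absorbs a gated fraction of the
# source channel, so low-gain routes are suppressed rather than fully fused.
channels = {m: list(v) for m, v in modalities.items()}
for (src, tgt), g in gates.items():
    channels[tgt] = [x + g * y for x, y in zip(channels[tgt], modalities[src])]

# Stage 2 (Consensus Formation): an explicit shared "public factor" plus
# contribution-aware weights over the preserved specialization channels.
public_factor = [sum(v[i] for v in channels.values()) / len(channels)
                 for i in range(2)]
weights = softmax([sum(v) for v in channels.values()])  # contribution proxy
prediction = sum(w * sum(v) for w, v in zip(weights, channels.values()))
```

Note that the per-modality channels survive into the final weighted sum instead of being collapsed into one fused vector, which is the structural point the review highlights.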
If this is right
- Optimization no longer gravitates toward the path of least resistance and therefore incorporates information from weaker but still informative modalities.
- Models stop overfitting to incidental cross-modal correlations because only exchanges with positive marginal predictive gain are retained.
- Each modality representation is preserved as a distinct specialization channel rather than being fully compressed into one vector.
- State-of-the-art results are obtained on both regression and classification benchmarks across the three evaluated multimodal datasets.
- Analysis experiments confirm that the Routing, Auditing, Public-Factor, and Aggregation agents each contribute to the observed mitigation of dominance and coupling.
Where Pith is reading between the lines
- The same governed-agent pattern could be tested in non-multimodal settings where multiple feature groups risk one group dominating training.
- An explicit public factor that is maintained separately from modality channels offers a natural hook for inspecting what information the model treats as shared.
- The marginal-gain auditing step might be adapted as a general regularizer in any multi-component model to suppress low-value interactions.
- Scaling the four-agent design to datasets that contain more than three modalities would reveal whether additional specialized agents become necessary.
Load-bearing premise
The four specialized agents can be trained to reliably identify positive marginal predictive gain and contribution-aware weights without introducing new overfitting or requiring dataset-specific tuning that undermines generality.
What would settle it
If the full GCL system is compared on CMU-MOSI against a version where the Auditing Agent's gates are removed or fixed to allow all interactions, and the ablated version shows equal or higher performance, the necessity of selective gating for the claimed mitigation would be refuted.
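The ablation above can be mimicked on synthetic data. The sketch below is a hypothetical toy, not the paper's experiment: two informative channels plus one spuriously correlated channel, compared under all-pass gates (the ablated variant) versus gates that suppress the spurious route. The function names (`make_split`, `predict`, `mse`) and the hand-set gate vectors are invented for illustration.

```python
import random

random.seed(1)

def make_split(n, spurious_corr):
    """Toy data: target = lang + acou; the third channel is spurious,
    tracking the target only when spurious_corr is True (training regime)."""
    data = []
    for _ in range(n):
        lang, acou = random.gauss(0, 1), random.gauss(0, 1)
        target = lang + acou
        spur = (target + random.gauss(0, 0.1)) if spurious_corr \
               else random.gauss(0, 1)
        data.append(((lang, acou, spur), target))
    return data

def predict(x, gates):
    # Fixed unit weights; the gates decide which channels participate.
    return sum(g * xi for g, xi in zip(gates, x))

def mse(data, gates):
    return sum((predict(x, gates) - y) ** 2 for x, y in data) / len(data)

# Test-time data, where the spurious correlation has vanished.
test = make_split(500, spurious_corr=False)
all_pass = (1.0, 1.0, 1.0)  # ablated: every exchange admitted
audited  = (1.0, 1.0, 0.0)  # spurious channel gated off

ablated_err = mse(test, all_pass)
gated_err = mse(test, audited)
```

Under this toy setup the gated variant has strictly lower test error, which is the direction of the result the falsification test demands; the ablated variant matching or beating it on CMU-MOSI would refute the gating claim.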
Original abstract
Centralized multimodal learning commonly compresses language, acoustic, and visual signals into a single fused representation for prediction. While effective, this paradigm suffers from two limitations: modality dominance, where optimization gravitates towards the path of least resistance, ignoring weaker but informative modalities, and spurious modality coupling, where models overfit to incidental cross-modal correlations. To address these, we propose Group Cognition Learning (GCL), a governed collaboration paradigm that applies a two-stage protocol after modality-specific encoding. In Stage 1 (Selective Interaction), a Routing Agent proposes directed interaction routes, and an Auditing Agent assigns sample-wise gates to emphasize exchanges that yield positive marginal predictive gain while suppressing redundant coupling. In Stage 2 (Consensus Formation), a Public-Factor Agent maintains an explicit shared factor, and an Aggregation Agent produces the final prediction through contribution-aware weighting while keeping each modality representation as a specialization channel. Extensive experiments on CMU-MOSI, CMU-MOSEI, and MIntRec demonstrate that GCL mitigates dominance and coupling, establishing state-of-the-art results across both regression and classification benchmarks. Analysis experiments further demonstrate the effectiveness of the design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Group Cognition Learning (GCL), a governed two-stage agent collaboration framework for multimodal learning. After modality-specific encoding, Stage 1 (Selective Interaction) uses a Routing Agent to propose interaction routes and an Auditing Agent to apply sample-wise gates emphasizing positive marginal predictive gain while suppressing redundant coupling. Stage 2 (Consensus Formation) employs a Public-Factor Agent to maintain a shared factor and an Aggregation Agent for final prediction via contribution-aware weighting, preserving modality specializations. The central claim is that this mitigates modality dominance and spurious coupling, yielding state-of-the-art regression and classification results on CMU-MOSI, CMU-MOSEI, and MIntRec, with supporting analysis experiments.
Significance. If the claimed performance gains and mitigation effects are substantiated by detailed experiments, GCL could offer a structured alternative to standard fusion methods in multimodal settings by explicitly governing inter-modality interactions. The two-stage protocol with specialized agents addresses recognized issues of dominance and incidental correlations in a potentially generalizable way, which might extend to other fusion tasks. However, the lack of any implementation specifics or results in the provided manuscript prevents a full evaluation of its significance.
major comments (2)
- [Abstract] The assertion that GCL 'establishes state-of-the-art results across both regression and classification benchmarks' on CMU-MOSI, CMU-MOSEI, and MIntRec is unsupported by any metrics, tables, baseline comparisons, ablation studies, or quantitative evidence, which is load-bearing for the headline performance claim.
- [Abstract] The key mechanisms of 'positive marginal predictive gain' (used by the Auditing Agent) and 'contribution-aware weighting' (used by the Aggregation Agent) are invoked to mitigate dominance and coupling but receive no mathematical definitions, training procedures, loss terms, or optimization details, preventing verification of the central mitigation claim.
minor comments (1)
- [Abstract] The repeated use of informal phrasing such as 'Making Everything Better' in the title and 'governed collaboration paradigm' could be replaced with more precise technical language to improve clarity for a journal audience.
Simulated Author's Rebuttal
Thank you for the referee's constructive comments. We recognize the limitations in the provided abstract regarding supporting evidence and details, and we will revise the manuscript to address these issues comprehensively.
Point-by-point responses
- Referee: [Abstract] The assertion that GCL 'establishes state-of-the-art results across both regression and classification benchmarks' on CMU-MOSI, CMU-MOSEI, and MIntRec is unsupported by any metrics, tables, baseline comparisons, ablation studies, or quantitative evidence, which is load-bearing for the headline performance claim.
Authors: We agree that the abstract claim of establishing state-of-the-art results must be supported by quantitative evidence. The revised manuscript will include detailed experimental results, tables showing performance metrics on the specified datasets for both regression and classification, baseline comparisons, and ablation studies to substantiate these claims. We will update the manuscript accordingly during the revision. (revision: yes)
- Referee: [Abstract] The key mechanisms of 'positive marginal predictive gain' (used by the Auditing Agent) and 'contribution-aware weighting' (used by the Aggregation Agent) are invoked to mitigate dominance and coupling but receive no mathematical definitions, training procedures, loss terms, or optimization details, preventing verification of the central mitigation claim.
Authors: The abstract describes the high-level ideas behind these mechanisms. In the revised manuscript, we will introduce precise mathematical definitions. Positive marginal predictive gain will be defined as the difference in the model's predictive loss or accuracy when the interaction is included versus excluded, and used to gate the routes; the Auditing Agent will optimize a loss term that encourages positive gain. Contribution-aware weighting will be defined as a weighted aggregation whose weights are derived from each modality's contribution to the final prediction, estimated through dedicated sub-networks or attention mechanisms. Full training procedures, loss functions, and optimization details will be provided in the methods section. (revision: yes)
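The rebuttal's definition of marginal predictive gain, the change in predictive loss when an interaction is included versus excluded, can be sketched in a few lines. This is a toy illustration under that stated definition only; the additive `predict` function and the hard threshold gate are invented simplifications, not the Auditing Agent's actual mechanism.

```python
def mse(pred, target):
    return (pred - target) ** 2

def predict(base, interaction, include):
    # Toy predictor: optionally adds the interaction term's contribution.
    return base + (interaction if include else 0.0)

def marginal_gain(base, interaction, target):
    """Loss reduction from including the interaction (positive = helpful)."""
    return (mse(predict(base, interaction, False), target)
            - mse(predict(base, interaction, True), target))

# A helpful interaction lowers the loss, so the audit gates it open...
gain = marginal_gain(base=0.5, interaction=0.3, target=0.8)
gate = 1.0 if gain > 0 else 0.0

# ...while a spurious interaction that pushes the prediction away from the
# target yields negative gain and would be gated off.
harmful = marginal_gain(base=0.5, interaction=-0.3, target=0.8)
```

A learned gate would presumably be soft (a value in [0, 1]) rather than this hard threshold, but the sign of the gain is what separates exchanges to emphasize from redundant coupling to suppress.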
Circularity Check
No circularity: abstract contains no equations or derivations
full rationale
The abstract describes a two-stage agent protocol in natural language without any equations, parameters, or mathematical steps. No self-definitional reductions, fitted inputs renamed as predictions, or self-citation chains appear. Claims of mitigating dominance and coupling rest on unspecified experiments rather than a closed derivation that reduces to its inputs by construction. This is the expected non-finding when only a high-level method sketch is available.
Axiom & Free-Parameter Ledger
invented entities (4)
- Routing Agent: no independent evidence
- Auditing Agent: no independent evidence
- Public-Factor Agent: no independent evidence
- Aggregation Agent: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear): Stage 1 (Selective Interaction) uses a Routing Agent to propose directed routes and an Auditing Agent to assign sample-wise gates based on positive marginal predictive gain while suppressing redundant coupling via L_red (InfoNCE) and L_gain.
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, absolute_floor_iff_bare_distinguishability (unclear): Stage 2 maintains an explicit public factor c via the Public-Factor Agent and contribution-aware weighting π_m via the Aggregation Agent to prevent feature collapse.