EEG-Based Multimodal Learning via Hyperbolic Mixture-of-Curvature Experts

Cuntai Guan; Guanxiang Huang; Motoaki Kawanabe; Qibin Zhao; Runhe Zhou; Shanglin Li; Xinliang Zhou; Yi Ding

arxiv: 2604.12579 · v3 · pith:JPLYJKA4new · submitted 2026-04-14 · 💻 cs.LG

Runhe Zhou, Shanglin Li, Guanxiang Huang, Xinliang Zhou, Qibin Zhao, Motoaki Kawanabe, Yi Ding, Cuntai Guan

Pith reviewed 2026-05-12 00:51 UTC · model grok-4.3

classification 💻 cs.LG

keywords EEG · multimodal learning · hyperbolic geometry · mixture of experts · emotion recognition · sleep staging · cognitive assessment

The pith

EEG-MoCE assigns each modality to its own learnable-curvature hyperbolic expert and fuses them with curvature-aware weighting to capture hierarchical structures in brain signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes EEG-MoCE for multimodal learning that combines EEG with signals such as facial expressions to assess mental states. It starts from the observation that these modalities contain hierarchical structures arising from cognitive processes, which flat Euclidean space represents poorly while hyperbolic space matches through its exponential volume growth. Each modality receives its own expert whose curvature is learned during training so the geometry fits the data, after which a fusion step weights contributions according to how strongly each expert's curvature signals rich hierarchy. The resulting model is tested on standard benchmarks and reported to reach higher performance than prior approaches on emotion recognition, sleep staging, and cognitive assessment. A reader would care because more faithful geometric modeling of brain data could support more reliable clinical tools for mental-state monitoring.

Core claim

EEG-MoCE places each input modality into a dedicated expert inside a hyperbolic space whose curvature is learned independently, thereby adapting the geometry to the modality's intrinsic hierarchy. Curvature-aware fusion then combines the expert outputs by dynamically emphasizing those modalities whose learned curvature indicates greater hierarchical content. Experiments on benchmark datasets establish state-of-the-art results across emotion recognition, sleep staging, and cognitive assessment tasks.

What carries the argument

The mixture-of-curvature experts operating in hyperbolic space, where each expert learns its own curvature and the fusion weights are derived from those curvatures to highlight modalities carrying richer hierarchical information.
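
A minimal sketch of what curvature-derived fusion weights could look like. The temperature rule τ(m) = τ0/√|K(m)| and the idea of a λ·ϕ(K) curvature prior come from a passage of the paper quoted further down this page; everything else here is an assumption made for illustration: the per-expert relevance scores, the choice ϕ(K) = log(1 + |K|), the default values of `tau0` and `lam`, and the function names. The fusion step is a plain weighted sum of tangent-space features, swapped in for brevity; it is not the weighted Fréchet mean the quoted passage mentions.

```python
import math

def fusion_weights(scores, curvatures, tau0=1.0, lam=0.1):
    """Softmax over experts with a curvature-scaled temperature and a curvature prior.

    scores:     hypothetical per-expert relevance scores (assumption).
    curvatures: learned curvature K(m) < 0 for each expert.
    """
    logits = []
    for s, k in zip(scores, curvatures):
        if k >= 0:
            raise ValueError("hyperbolic curvature must be negative")
        tau = tau0 / math.sqrt(abs(k))     # tau(m) = tau0 / sqrt(|K(m)|)
        prior = lam * math.log1p(abs(k))   # assumed phi(K) = log(1 + |K|)
        logits.append(s / tau + prior)
    top = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features, weights):
    """Weighted sum of per-expert feature vectors (tangent-space simplification)."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
```

With equal positive scores, the expert with the largest |K| receives the largest weight under these assumptions, which is the qualitative behaviour the summary describes.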

If this is right

Multimodal EEG systems achieve higher accuracy on emotion classification than Euclidean baselines.
Sleep staging benefits from dynamic emphasis on modalities whose geometry encodes stronger hierarchy.
Cognitive assessment tasks obtain improved performance when curvature-aware weighting is applied.
The same adaptive-geometry principle extends to other EEG-based mental-state pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Curvature values learned for different modalities might later be inspected to quantify how much hierarchy each contributes.
If the curvature-learning step remains stable across patient populations, the framework could support real-time neurotechnology devices.
The approach supplies a concrete testbed for checking whether hyperbolic geometry systematically outperforms Euclidean geometry on hierarchical neuroscience data.
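
One curvature-independent way to check how tree-like a modality's features are is Gromov δ-hyperbolicity, which the paper passage quoted near the end of this page also mentions (lower relative δ indicating stronger hierarchy). The sketch below is an illustration, not the paper's procedure: it uses a fixed base point, a brute-force O(n³) loop, and the commonly used normalization 2δ/diameter, all of which are assumptions here. The function names are hypothetical.

```python
def gromov_delta(dist, base=0):
    """Gromov delta of a finite metric space w.r.t. a fixed base point.

    dist is a symmetric n x n matrix (list of lists) of pairwise distances.
    """
    n = len(dist)

    def gromov_product(i, j):
        # (i|j)_base = 0.5 * (d(i, base) + d(j, base) - d(i, j))
        return 0.5 * (dist[i][base] + dist[j][base] - dist[i][j])

    delta = 0.0
    for x in range(n):
        for y in range(n):
            for z in range(n):
                gap = min(gromov_product(x, y), gromov_product(y, z)) - gromov_product(x, z)
                delta = max(delta, gap)
    return delta

def relative_delta(dist, base=0):
    """Scale-free variant: 2 * delta / diameter (0 for a tree metric)."""
    diameter = max(max(row) for row in dist)
    if diameter == 0:
        return 0.0
    return 2.0 * gromov_delta(dist, base) / diameter
```

A tree metric gives δ = 0, while a cycle gives a strictly positive value, so comparing `relative_delta` across modalities would be one way to cross-check what the learned curvatures suggest.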

Load-bearing premise

That EEG and the other modalities possess hierarchical structures best captured by hyperbolic geometry when each modality is allowed its own independently learned curvature.

What would settle it

An ablation experiment that replaces the learnable per-modality curvatures with a single shared curvature or switches to Euclidean space while keeping all other components fixed, and finds no accuracy gain on the same emotion-recognition, sleep-staging, or cognitive-assessment benchmarks.
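
The ablation described above can be laid out as a small grid of model variants. This is a hypothetical bookkeeping sketch: the `Variant` fields and the option names are invented here, and training and evaluation are left out entirely.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Variant:
    geometry: str   # "euclidean" or "hyperbolic"
    curvature: str  # "none", "shared", or "per_modality"
    fusion: str     # "uniform" or "curvature_aware"

def ablation_grid():
    """Enumerate the meaningful combinations, keeping all other components fixed."""
    variants = []
    for geometry, curvature, fusion in product(
        ("euclidean", "hyperbolic"),
        ("none", "shared", "per_modality"),
        ("uniform", "curvature_aware"),
    ):
        # Euclidean space has no curvature to learn or to weight by.
        if geometry == "euclidean" and (curvature != "none" or fusion != "uniform"):
            continue
        # A hyperbolic variant needs some curvature, shared or per modality.
        if geometry == "hyperbolic" and curvature == "none":
            continue
        variants.append(Variant(geometry, curvature, fusion))
    return variants

for variant in ablation_grid():
    print(variant)
```

The central claim would be in trouble if the `per_modality` rows showed no gain over the `shared` and `euclidean` rows on the same benchmarks and splits.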

Figures

Figures reproduced from arXiv: 2604.12579 by Cuntai Guan, Guanxiang Huang, Motoaki Kawanabe, Qibin Zhao, Runhe Zhou, Shanglin Li, Xinliang Zhou, Yi Ding.

**Figure 1.** Euclidean vs. hyperbolic geometry for hierarchical data. Euclidean space is flat and tends to under-represent hierarchical branching; hyperbolic space exhibits exponential volume growth and better preserves tree-like separation. Hyperbolic geometry is informative for multimodal learning, where modalities may differ in how strongly hierarchical their underlying structure is.

**Figure 2.** Architecture of EEG-MoCE on the EAV dataset. Other datasets use the same overall architecture, with only the modality encoders adapted. (a) Modality-specific hyperbolic experts: each modality (e.g., EEG, audio, video) is encoded by an expert that embeds inputs in its own learnable-curvature hyperbolic space. (b) Curvature-oriented fusion: expert representations are aggregated by a curvature-aware scheme, combining…

**Figure 4.** Ablation of hyperbolic components on the EAV dataset, focusing on learnable curvature and curvature-oriented multimodal fusion (COMF). All fusion variants use learnable curvatures to isolate the effect of the fusion mechanism. The proposed method achieves the best performance, demonstrating the complementary benefits of learnable curvatures and COMF.

**Figure 3.** t-SNE of fused features on the EAV dataset. The variant with a hyperbolic encoder and hyperbolic fusion produces more compact and better-separated emotion clusters than the Euclidean baseline, illustrating the improved class separability that hyperbolic geometry provides. The results demonstrate that hyperbolic geometry in both encoder and fusion stages is essential for optimal performance, with each component contributing complementary …

Original abstract

Electroencephalography (EEG)-based multimodal learning integrates brain signals with complementary modalities to improve mental state assessment, providing great clinical potential. The effectiveness of such paradigms largely depends on the representation learning on heterogeneous modalities. For EEG-based paradigms, one promising approach is to leverage their hierarchical structures, as recent studies have shown that both EEG and associated modalities (e.g., facial expressions) exhibit hierarchical structures reflecting complex cognitive processes. However, Euclidean embeddings struggle to represent these hierarchical structures due to their flat geometry, while hyperbolic spaces, with their exponential growth property, are naturally suited for them. In this work, we propose EEG-MoCE, a novel hyperbolic mixture-of-curvature experts framework designed for multimodal neurotechnology. EEG-MoCE assigns each modality to an expert in a learnable-curvature hyperbolic space, enabling adaptive modeling of its intrinsic geometry. A curvature-aware fusion strategy then dynamically weights experts, emphasizing modalities with richer hierarchical information. Extensive experiments on benchmark datasets demonstrate that EEG-MoCE achieves state-of-the-art performance, including emotion recognition, sleep staging, and cognitive assessment. Code is available at https://github.com/zhourunhe/EEG-MoCE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces per-modality learnable curvatures in a hyperbolic MoE for EEG multimodal tasks, but the abstract supplies zero numbers or ablations to show the curvatures matter.

The main thing to know is that the authors built EEG-MoCE, a mixture-of-experts setup where each modality gets its own expert in a hyperbolic space whose curvature is learned during training, followed by a fusion step that weights experts according to those curvatures. They apply it to emotion recognition, sleep staging, and cognitive assessment from EEG plus other signals. This specific adaptive-curvature formulation for neuro signals is new relative to prior hyperbolic work on hierarchies.

The motivation section does a clean job explaining why flat Euclidean embeddings fall short for the tree-like structure in brain data and why hyperbolic geometry's volume growth fits better. The architecture description is straightforward and avoids unnecessary complexity.

The soft spots are real and center on missing evidence. The abstract claims state-of-the-art results but shows no tables, no baseline numbers, no statistical tests, and no ablations that isolate learnable curvature from a fixed-curvature hyperbolic MoE or a plain Euclidean version of similar size. Without those, we cannot tell whether the curvatures actually settle at different values across modalities or whether the fusion weights track hierarchical richness instead of noise. The stability worry is also fair: learnable curvatures are known to be sensitive to initialization and can collapse toward zero or produce unstable Möbius operations, yet nothing in the writeup addresses this.

The paper is aimed at people working at the overlap of geometric deep learning and multimodal brain-signal processing. A reader who wants to try hyperbolic methods on EEG data could extract a usable architecture sketch from it, but only after the experiments are filled in. It deserves a serious referee because the idea is coherent, the application area is relevant, and the geometric framing engages honestly with existing literature on hyperbolic embeddings. I would send it out for review with a clear request for curvature ablations, stability checks, and full quantitative results.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes EEG-MoCE, a hyperbolic mixture-of-curvature experts framework for EEG-based multimodal learning. Each modality is assigned to an expert operating in its own learnable-curvature hyperbolic space, followed by a curvature-aware fusion mechanism that dynamically weights the experts according to the richness of hierarchical structure in each modality. The authors claim that this architecture yields state-of-the-art performance on benchmark datasets for emotion recognition, sleep staging, and cognitive assessment.

Significance. If the performance claims are rigorously substantiated, the work would add a concrete demonstration that per-modality learnable curvatures and curvature-aware fusion can exploit the exponential volume growth of hyperbolic geometry for heterogeneous neurophysiological signals. This would be of interest to the intersection of geometric deep learning and multimodal brain-computer interfaces, provided the gains are shown to arise from the geometric inductive bias rather than capacity alone.

major comments (2)

[Abstract] Abstract: the claim that EEG-MoCE 'achieves state-of-the-art performance' is unsupported by any numerical results, baseline tables, statistical tests, or ablation studies. Without these data it is impossible to determine whether the reported improvements are attributable to the learnable-curvature experts or to other modeling choices.
[Method / Experiments] Method and Experiments sections: the central modeling assumption—that independently learnable curvatures per modality plus curvature-aware fusion reliably capture richer hierarchical information—lacks any ablation that isolates these components against (i) a fixed-curvature hyperbolic MoE of equal capacity and (ii) a Euclidean MoE baseline. In the absence of such controls, the SOTA claim cannot be distinguished from a simple increase in model flexibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to make the performance claims more transparent and to provide stronger evidence isolating the contributions of the proposed components. We address each point below and have revised the manuscript accordingly.

Referee: [Abstract] Abstract: the claim that EEG-MoCE 'achieves state-of-the-art performance' is unsupported by any numerical results, baseline tables, statistical tests, or ablation studies. Without these data it is impossible to determine whether the reported improvements are attributable to the learnable-curvature experts or to other modeling choices.

Authors: We agree that the abstract, as a concise summary, did not include concrete numerical support. The full manuscript contains extensive experimental results, including baseline comparisons, tables, and statistical tests on emotion recognition, sleep staging, and cognitive assessment benchmarks. In the revised version we have updated the abstract to report key quantitative improvements (e.g., accuracy and F1 gains relative to prior SOTA) together with explicit references to the experimental tables and statistical analyses. This makes the SOTA claim directly verifiable from the abstract while preserving its brevity.

revision: yes

Referee: [Method / Experiments] Method and Experiments sections: the central modeling assumption—that independently learnable curvatures per modality plus curvature-aware fusion reliably capture richer hierarchical information—lacks any ablation that isolates these components against (i) a fixed-curvature hyperbolic MoE of equal capacity and (ii) a Euclidean MoE baseline. In the absence of such controls, the SOTA claim cannot be distinguished from a simple increase in model flexibility.

Authors: This observation is correct and points to a genuine gap in the original submission. To isolate the effect of learnable per-modality curvatures and curvature-aware fusion from mere capacity increases, we have added two controlled ablation studies in the revised Experiments section: (i) a fixed-curvature hyperbolic mixture-of-experts model with identical expert count and parameter budget, and (ii) a Euclidean mixture-of-experts baseline matched in capacity. The new results show that the learnable-curvature variant consistently outperforms both controls, indicating that the performance gains arise from the geometric inductive bias rather than flexibility alone. Corresponding tables and analysis have been inserted.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper introduces EEG-MoCE as a novel framework that assigns modalities to experts in learnable-curvature hyperbolic spaces and applies curvature-aware fusion, motivated by the exponential growth property of hyperbolic geometry for hierarchical structures in EEG and related modalities. This motivation is drawn from general properties of hyperbolic spaces and cited recent studies on hierarchical structures, without any reduction of the proposed method or its performance claims to fitted parameters by construction, self-referential uniqueness theorems, or ansatz smuggled via self-citation. The central results are empirical SOTA performance on external benchmarks (emotion recognition, sleep staging, cognitive assessment), which are independent of the model definition itself. No load-bearing steps in the abstract or described method equate outputs to inputs via self-definition or statistical forcing. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that hyperbolic geometry is naturally suited to hierarchical EEG structures, plus free parameters for per-expert curvatures and dynamic fusion weights; no new physical entities are postulated.

free parameters (2)

learnable curvatures
One curvature per modality expert, optimized during training to adapt to each data type's geometry.
curvature-aware fusion weights
Dynamically computed weights that emphasize experts based on their learned curvatures.
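
Because a learnable curvature is a free parameter that must stay strictly negative, implementations often optimize an unconstrained scalar and map it to a valid curvature. The sketch below shows one such mapping, assuming a softplus transform with clamping; this is a common pattern rather than something the paper is known to use, and the names and bounds (`min_abs`, `max_abs`) are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def curvature_from_raw(raw, min_abs=1e-3, max_abs=10.0):
    """Map an unconstrained scalar to a curvature K in [-max_abs, -min_abs].

    The lower bound on |K| keeps the space from flattening to Euclidean;
    the upper bound keeps sinh/cosh terms from overflowing. Both bounds
    are illustrative assumptions.
    """
    magnitude = min(max(softplus(raw), min_abs), max_abs)
    return -magnitude

# One raw parameter per modality expert (placeholder names and values).
raw_params = {"eeg": 0.0, "audio": 1.5, "video": -2.0}
curvatures = {name: curvature_from_raw(r) for name, r in raw_params.items()}
```

Note that hard clamping stops gradients at the bounds; a smooth squashing function would be an alternative if that matters in practice.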

axioms (1)

domain assumption: Hyperbolic spaces are naturally suited for representing hierarchical structures due to their exponential growth property.
Explicitly invoked in the abstract as the geometric motivation for the framework.
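
The exponential growth property can be made concrete with a standard formula: in a hyperbolic plane of curvature K < 0, a circle of radius r has circumference 2π·sinh(√|K|·r)/√|K|, compared with 2πr in the Euclidean plane. The short sketch below compares the two; the function names are chosen here for illustration.

```python
import math

def euclidean_circumference(r):
    """Circumference of a circle of radius r in the flat plane."""
    return 2.0 * math.pi * r

def hyperbolic_circumference(r, curvature=-1.0):
    """Circumference of a circle of radius r in a plane of constant curvature K < 0."""
    if curvature >= 0:
        raise ValueError("curvature must be negative")
    s = math.sqrt(-curvature)
    return 2.0 * math.pi * math.sinh(s * r) / s

for r in (1.0, 2.0, 4.0, 8.0):
    ratio = hyperbolic_circumference(r) / euclidean_circumference(r)
    print(f"r={r:4.1f}  hyperbolic/euclidean circumference ratio = {ratio:.2f}")
```

The ratio grows roughly like e^r / r, which is why a tree whose node count grows exponentially with depth has room to spread out in hyperbolic space but crowds together in Euclidean space.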

pith-pipeline@v0.9.0 · 5509 in / 1282 out tokens · 25811 ms · 2026-05-12T00:51:32.867093+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost Jcost definition and CostAlphaLog echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

hyperbolic spaces, with their exponential growth property, are naturally suited for them... per-modality experts with learnable curvatures... curvature magnitude serves as a learned geometric indicator of hierarchical complexity... τ(m)=τ0/√|K(m)|... λ·ϕ(K(j)) curvature prior

IndisputableMonolith/Foundation/AlexanderDuality alexander_duality_circle_linking and SphereAdmitsCircleLinking echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

δ-hyperbolicity... lower δrel indicates stronger hierarchical structure... Lorentz model... expK and logK maps... weighted Fréchet mean
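
For readers unfamiliar with the expK and logK maps named in this passage, here is a minimal sketch of the exponential and logarithmic maps at the origin of the Lorentz (hyperboloid) model with curvature K < 0. These are the textbook formulas, not code taken from the paper; the function names and the convention of passing only the spatial part of the tangent vector are assumptions made here.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def expmap0(u, curvature=-1.0):
    """Map a tangent vector at the origin (spatial part u) onto the hyperboloid."""
    s = math.sqrt(-curvature)
    norm = math.sqrt(sum(a * a for a in u))
    if norm < 1e-12:
        return [1.0 / s] + [0.0] * len(u)   # the origin itself
    scale = math.sinh(s * norm) / (s * norm)
    return [math.cosh(s * norm) / s] + [scale * a for a in u]

def logmap0(x, curvature=-1.0):
    """Inverse of expmap0: return the spatial part of the tangent vector."""
    s = math.sqrt(-curvature)
    spatial = x[1:]
    norm = math.sqrt(sum(a * a for a in spatial))
    if norm < 1e-12:
        return [0.0] * len(spatial)
    dist = math.acosh(max(s * x[0], 1.0)) / s   # geodesic distance from the origin
    return [dist * a / norm for a in spatial]

point = expmap0([0.3, -0.4], curvature=-2.0)
print(lorentz_inner(point, point))        # should be close to 1/K
print(logmap0(point, curvature=-2.0))     # should recover the tangent vector
```

Points produced by `expmap0` satisfy ⟨x, x⟩ = 1/K, which is a convenient sanity check when the curvature is itself being learned.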

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.