M3-JEPA: Multimodal Alignment via Multi-gate MoE Based on the Joint-Embedding Predictive Architecture

Lei, H · 2024 · arXiv 2409.05929

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.

BrainFIBRE: A Foundation Model via Information Decomposition for Brain Microstructure

cs.CV · 2026-07-01 · unverdicted · novelty 5.0

BrainFIBRE presents a foundation model for brain microstructure that applies self-supervised partial information decomposition on NODDI maps to disentangle unique, synergistic, and redundant information and reports state-of-the-art results on multiple prediction tasks.

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

cs.LG · 2026-05-22 · accept · novelty 5.0

A literature survey that categorizes how Mixture-of-Experts architectures address multimodal learning challenges and identifies open research gaps.

citing papers explorer

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

cs.LG · 2026-05-22 · accept · none · ref 6

A literature survey that categorizes how Mixture-of-Experts architectures address multimodal learning challenges and identifies open research gaps.
