M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Embedding Predictive Architecture

Hongyang Lei, Xiaolong Cheng, Dan Wang, Kun Fan, Qi Qin, Huazhen Huang, Yetao Wu, Qingqing Gu, Zhonglin Jiang, Yong Chen, et al. · 2024 · arXiv 2409.05929

2 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

cs.LG · 2026-05-22 · accept · novelty 5.0

A literature survey that categorizes how Mixture-of-Experts architectures address multimodal learning challenges and identifies open research gaps.
