M3-jepa: Multimodal alignment via multi-gate moe based on the joint-embedding predictive architecture

· 2024 · arXiv 2409.05929

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation

cs.RO · 2026-06-09 · unverdicted · novelty 6.0

SARM2 presents RM, a multi-task stage-aware reward model that achieves 80% lower value-estimation MSE; when used in SPIRAL, it raises manipulation task success from ~50% to near-perfect on several benchmarks.

DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.

BrainFIBRE: A Foundation Model via Information Decomposition for Brain Microstructure

cs.CV · 2026-07-01 · unverdicted · novelty 5.0

BrainFIBRE presents a foundation model for brain microstructure that applies self-supervised partial information decomposition on NODDI maps to disentangle unique, synergistic, and redundant information and reports state-of-the-art results on multiple prediction tasks.

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

cs.LG · 2026-05-22 · accept · novelty 5.0

A literature survey that categorizes how Mixture-of-Experts architectures address multimodal learning challenges and identifies open research gaps.

citing papers explorer

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation
cs.RO · 2026-06-09 · unverdicted · none · ref 51
DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring
cs.CV · 2026-05-06 · unverdicted · none · ref 27
BrainFIBRE: A Foundation Model via Information Decomposition for Brain Microstructure
cs.CV · 2026-07-01 · unverdicted · none · ref 22
