A closer look at multimodal representation collapse
5 Pith papers cite this work.
years
2026: 5
representative citing papers
IPPg embeds text into images to reduce multimodal model inference costs by 35.8-91% with competitive accuracy on many VQA and code benchmarks.
StateXDiff integrates transcriptomic profiles with inferred protein features via a conditional diffusion model and mechanism-aware drug templates to predict single-cell drug perturbation responses under unseen cell lines, drugs, and combinatorial settings.
ModalImmune enforces modality immunity in multimodal models by controlled collapse of input channels during training using adaptive regularizers and meta-optimization.
citing papers explorer
- MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment
  MiMIC mitigates visual modality collapse and semantic misalignment in universal multimodal retrieval via fusion-in-decoder architecture and robust single-modality training.
- Token-Efficient Multimodal Reasoning via Image Prompt Packaging
  IPPg embeds text into images to reduce multimodal model inference costs by 35.8-91% with competitive accuracy on many VQA and code benchmarks.
- StateXDiff: Cell State-Contextualized Multimodal Diffusion for Single-Cell Perturbation Prediction
  StateXDiff integrates transcriptomic profiles with inferred protein features via a conditional diffusion model and mechanism-aware drug templates to predict single-cell drug perturbation responses under unseen cell lines, drugs, and combinatorial settings.
- ModalImmune: Immunity Driven Unlearning via Self Destructive Training
  ModalImmune enforces modality immunity in multimodal models by controlled collapse of input channels during training using adaptive regularizers and meta-optimization.
- Diverse via bounded Agreement: Geometric Regularization for Multimodal Fusion