pith. machine review for the scientific record.

arxiv: 2409.07825 · v4 · pith:CWNRZTEBnew · submitted 2024-09-12 · 💻 cs.CV · cs.AI · cs.LG

Deep Multimodal Learning with Missing Modality: A Survey

Pith reviewed 2026-05-17 22:22 UTC · model grok-4.3

classification 💻 cs.CV cs.AI cs.LG
keywords multimodal learning, missing modality, deep learning, survey, robustness, applications, datasets, challenges

The pith

Multimodal deep learning models can maintain performance when some input types are missing by using dedicated robustness techniques.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper surveys deep learning approaches designed to handle cases where certain data modalities are absent during training or testing in multimodal systems. It begins by outlining the motivations for such techniques, including sensor limitations and privacy concerns, and clarifies how these setups differ from standard multimodal learning. The survey then analyzes existing methods, their applications across domains, and relevant datasets, and concludes with open challenges and suggested future directions. A sympathetic reader would care because these methods aim to make AI systems more reliable when real-world data collection is incomplete.

Core claim

The central claim is that Multimodal Learning with Missing Modality (MLMM) forms a distinct area from standard multimodal learning, and the survey supplies the first comprehensive review covering motivations, distinctions, current deep learning methods, applications, datasets, challenges, and future research directions.

What carries the argument

The taxonomy and detailed breakdown of methods that specifically address missing modalities to preserve model robustness when one or more data types are unavailable.

Load-bearing premise

The body of literature selected for the survey is sufficiently complete and representative of current work on deep multimodal learning with missing modalities.

What would settle it

A search that identifies several recent or important deep learning papers on missing-modality multimodal learning that were omitted from the survey's analysis would undermine its claim to comprehensiveness.

Original abstract

During multimodal model training and testing, certain data modalities may be absent due to sensor limitations, cost constraints, privacy concerns, or data loss, negatively affecting performance. Multimodal learning techniques designed to handle missing modalities can mitigate this by ensuring model robustness even when some modalities are unavailable. This survey reviews recent progress in Multimodal Learning with Missing Modality (MLMM), focusing on deep learning methods. It provides the first comprehensive survey that covers the motivation and distinctions between MLMM and standard multimodal learning setups, followed by a detailed analysis of current methods, applications, and datasets, concluding with challenges and future directions.
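To make the abstract's idea of robustness to unavailable modalities concrete, here is a minimal sketch of two ingredients that often appear in this line of work: fusing only the modalities that are actually present, and randomly hiding modalities during training (commonly called modality dropout). It is not taken from the survey. The function names, the mean-fusion rule, and the plain list-of-floats feature format are all assumptions made for illustration.

```python
import random


def masked_mean_fusion(features, present):
    """Average per-modality feature vectors over the present modalities only.

    features: list of equal-length float lists, one per modality (hypothetical format)
    present:  list of booleans, True where the modality is available
    """
    if len(features) != len(present):
        raise ValueError("features and present must have the same length")
    kept = [f for f, p in zip(features, present) if p]
    if not kept:
        raise ValueError("at least one modality must be present")
    dim = len(kept[0])
    # Missing modalities are skipped entirely rather than zero-filled,
    # so they do not drag the fused vector toward zero.
    return [sum(f[i] for f in kept) / len(kept) for i in range(dim)]


def modality_dropout(present, drop_prob, rng):
    """Randomly hide available modalities, always keeping at least one."""
    available = [i for i, p in enumerate(present) if p]
    if not available:
        raise ValueError("at least one modality must be present")
    kept = [i for i in available if rng.random() >= drop_prob]
    if not kept:
        # Everything was dropped: fall back to one available modality.
        kept = [rng.choice(available)]
    return [i in kept for i in range(len(present))]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
    # The third modality is missing, so only the first two are averaged.
    print(masked_mean_fusion(feats, [True, True, False]))  # [2.0, 3.0]
    print(modality_dropout([True, True, True], 0.5, random.Random(0)))
```

Skipping absent modalities, instead of imputing them, is only one possible design choice; imputation-based and generative approaches would replace the missing feature vector with an estimate before fusion.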

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper is a survey on deep multimodal learning with missing modalities (MLMM). It distinguishes MLMM from standard multimodal setups, reviews motivations arising from sensor limitations, cost, privacy, and data loss, analyzes deep learning methods for handling missing modalities, surveys applications and datasets, and outlines challenges plus future directions. The authors claim it as the first comprehensive survey focused specifically on this topic.

Significance. If the coverage proves complete and the taxonomy of methods accurate, the survey would provide a useful reference point for researchers working on robust multimodal models. It aggregates practical considerations around incomplete data that arise frequently in deployed vision and multimodal systems, potentially helping to consolidate scattered prior work and highlight open problems.

major comments (1)
  1. [Introduction] Introduction (or dedicated survey methodology subsection): the assertion of being the 'first comprehensive survey' is load-bearing for the paper's motivation and for the synthesized challenges/future directions. The manuscript must explicitly document the search protocol (databases queried, exact keywords and Boolean strings, date range, inclusion/exclusion criteria, and handling of preprints versus peer-reviewed work) so that readers can evaluate completeness and potential systematic omissions.
minor comments (2)
  1. [Methods taxonomy] Ensure every cited work in the methods taxonomy table or section is accompanied by a brief one-sentence characterization of how it addresses missing modalities, to avoid readers needing to consult the original papers for basic distinctions.
  2. [Figures] Figure captions for any overview diagrams should explicitly state the criteria used to group methods (e.g., imputation-based vs. modality-robust vs. generative), as current phrasing leaves some boundary cases ambiguous.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the paper's significance and for the constructive comment on improving the methodological transparency. We will address this point in the revision.

Point-by-point responses
  1. Referee: [Introduction] Introduction (or dedicated survey methodology subsection): the assertion of being the 'first comprehensive survey' is load-bearing for the paper's motivation and for the synthesized challenges/future directions. The manuscript must explicitly document the search protocol (databases queried, exact keywords and Boolean strings, date range, inclusion/exclusion criteria, and handling of preprints versus peer-reviewed work) so that readers can evaluate completeness and potential systematic omissions.

    Authors: We agree that explicitly documenting the survey methodology is crucial for validating the comprehensiveness of our review and supporting the synthesized insights. In the revised manuscript, we will add a dedicated subsection in the Introduction that describes the search protocol employed. This will include the databases and repositories queried, the keywords and search strategies utilized, the date range of the literature considered, the inclusion and exclusion criteria, and how preprints were handled relative to peer-reviewed publications. By providing this information, readers will be better positioned to assess the scope and any potential gaps in our survey.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity in survey paper lacking derivations

Full rationale

This is a literature survey paper that reviews and synthesizes existing external publications on MLMM without presenting original derivations, equations, fitted parameters, or predictive models. No load-bearing steps reduce by construction to the paper's own inputs, self-citations, or ansatzes. The claim of providing the 'first comprehensive survey' rests on literature selection completeness, which is an external validity and coverage issue rather than a circular reduction per the enumerated patterns. The paper is self-contained as an aggregation of independent prior results and receives a non-finding per guidelines for honest surveys without derivation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work does not introduce new free parameters, axioms, or invented entities; it synthesizes and categorizes existing research on MLMM.



Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS

    cs.LG 2026-05 unverdicted novelty 7.0

    SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.

  2. Retrieving to Recover: Towards Incomplete Audio-Visual Question Answering via Semantic-consistent Purification

    cs.CV 2026-04 unverdicted novelty 7.0

    R²ScP recovers missing audio-visual data in question answering by retrieving semantically consistent examples and purifying noise, outperforming generative imputation in incomplete scenarios.

  3. Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification

    cs.CV 2026-01 unverdicted novelty 7.0

    DyMo dynamically selects reliable recovered modalities at inference by using task loss as a proxy for task-relevant information, outperforming prior discard-or-impute methods on image datasets.

  4. Resilient Vision-Tabular Multimodal Learning under Modality Missingness

    cs.LG 2026-05 unverdicted novelty 6.0

    A vision-tabular multimodal transformer uses modality tokens, masked self-attention, and stochastic modality dropout to maintain performance under pervasive missing data on MIMIC-CXR and MIMIC-IV for 14-label diagnost...

  5. LARGO: Low-Rank Hypernetwork for Handling Missing Modalities

    cs.CV 2026-05 unverdicted novelty 6.0

    LARGO uses a low-rank hypernetwork with CP decomposition to unify 2^N-1 missing-modality models into one, ranking first in 47 of 52 configurations on BraTS and ISLES with small Dice gains over baselines.

  6. Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion

    cs.IR 2026-05 unverdicted novelty 6.0

    GRE-MC retrieves relevant subgraphs and uses a graph transformer plus sparse codebook to complete missing modalities, outperforming prior methods on recommendation benchmarks.

  7. Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization

    cs.CV 2026-04 unverdicted novelty 6.0

    RCSR is a personalization-friendly federated framework that improves cross-modal retrieval accuracy and stability under missing modalities via semantic routing and adapters.

  8. Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data

    eess.IV 2026-04 unverdicted novelty 6.0

    A multimodal diffusion model trained on synthetic data enhances low-resolution EBSD and corrupted polarized light data, achieving near full-resolution performance with only 25% EBSD data.

  9. Conditional Evidence Reconstruction and Decomposition for Interpretable Multimodal Diagnosis

    cs.CV 2026-04 unverdicted novelty 6.0

    CERD reconstructs missing modalities conditioned on observed inputs and decomposes diagnostic evidence via logit attribution, outperforming baselines on incomplete ADNI data while providing interpretable attributions.

  10. Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher

    cs.CV 2026-04 unverdicted novelty 6.0

    PTA framework purifies noisy multimodal data via meta-learning and distills cross-modal knowledge through diffusion to create robust single-modality models under missing modalities.

  11. Evaluation Before Generation: A Paradigm for Robust Multimodal Sentiment Analysis with Missing Modalities

    cs.CV 2026-04 unverdicted novelty 6.0

    The ProMMA framework evaluates missing modalities at input using a dedicated evaluator, then applies modality-invariant prompt disentanglement, mutual-information dynamic weighting, and multi-level residual prompt con...

  12. Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection

    cs.MM 2026-02 unverdicted novelty 6.0

    EC-Net combines Poincare-ball hyperbolic embeddings, hypergraph fusion, and decoupled radial-angular contrastive learning to improve accuracy on multimodal emotion benchmarks especially under partial or noisy modalities.

  13. Fusion or Confusion? Multimodal Complexity Is Not All You Need

    cs.LG 2025-12 unverdicted novelty 6.0

    Complex multimodal architectures do not reliably outperform unimodal baselines or a simple multimodal baseline under standardized evaluation.

  14. Calibrated Multimodal Representation Learning with Missing Modalities

    cs.CV 2025-11 unverdicted novelty 6.0

    CalMRL mitigates anchor shift in multimodal representation learning by calibrating incomplete alignments through representation-level imputation of missing modalities using priors and a bi-step optimization with close...

  15. Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking

    cs.CL 2026-04 unverdicted novelty 5.0

    MSR-MEL synthesizes instance-centric, group-level, lexical, and statistical evidence with LLMs and asymmetric teacher-student GNNs to outperform prior unsupervised methods on multimodal entity linking benchmarks.

  16. Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality

    cs.CV 2026-04 unverdicted novelty 5.0

    Head-wise modality specialization via attention constraints and unimodal knowledge retention in MLLMs improves robustness to missing modalities in fake news detection while preserving full multimodal performance.

  17. ModalImmune: Immunity Driven Unlearning via Self Destructive Training

    cs.LG 2026-02 unverdicted novelty 4.0

    ModalImmune enforces modality immunity in multimodal models by controlled collapse of input channels during training using adaptive regularizers and meta-optimization.
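Several of the summaries above describe evaluating or unifying models across every missing-modality configuration; the LARGO entry, for instance, counts 2^N-1 of them. The short sketch below shows where that number comes from: it enumerates every non-empty subset of N available modalities. The function name and the placeholder modality names are assumptions for illustration and do not come from any of the listed papers.

```python
from itertools import combinations


def missing_modality_configs(modalities):
    """Return every non-empty subset of the given modalities as a tuple.

    Each subset is one scenario in which exactly those modalities are
    available and the rest are missing.
    """
    configs = []
    for size in range(1, len(modalities) + 1):
        configs.extend(combinations(modalities, size))
    return configs


if __name__ == "__main__":
    # Placeholder names for a hypothetical four-modality setup.
    configs = missing_modality_configs(["m1", "m2", "m3", "m4"])
    print(len(configs))  # 15, i.e. 2**4 - 1
```

The count grows exponentially with N, which is one reason a single model that covers all configurations is attractive compared with training one model per subset.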

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · cited by 17 Pith papers · 9 internal anchors

  1. [1]

    Medical image segmentation on mri images with missing modalities: A review.arXiv preprint arXiv:2203.06217,

    Reza Azad, Nika Khosravi, Mohammad Dehghanmanshadi, Julien Cohen-Adad, and Dorit Merhof. Medical image segmentation on mri images with missing modalities: A review.arXiv preprint arXiv:2203.06217,

  2. [2]

    Dealing with the effects of sensor displacement in wearable activity recognition.Sensors, 14(6):9995–10023,

    30 Published in Transactions on Machine Learning Research (02/2026) Oresti Banos, Mate Attila Toth, Miguel Damas, Hector Pomares, and Ignacio Rojas. Dealing with the effects of sensor displacement in wearable activity recognition.Sensors, 14(6):9995–10023,

  3. [3]

    Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, and Sağnak Taşırlar

    DOI: https://doi.org/10.24432/C5C59F. Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, and Sağnak Taşırlar. Introducing our multimodal models,

  4. [4]

    Overcoming missing and incomplete modalities with generative adversarial networks for building footprint segmentation

    Benjamin Bischke, Patrick Helber, Florian Koenig, Damian Borth, and Andreas Dengel. Overcoming missing and incomplete modalities with generative adversarial networks for building footprint segmentation. In 2018 International Conference on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE,

  5. [5]

    Sparks of Artificial General Intelligence: Early experiments with GPT-4

    Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4.arXiv preprint arXiv:2303.12712,

  6. [6]

    Whar datasets: An open source library for wearable human activity recognition.arXiv preprint arXiv:2508.16604,

    Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl, and Tobias Röddiger. Whar datasets: An open source library for wearable human activity recognition.arXiv preprint arXiv:2508.16604,

  7. [7]

    Evalu- ating imputation techniques for missing data in adni: a patient classification study

    Sergio Campos, Luis Pizarro, Carlos Valle, Katherine R Gray, Daniel Rueckert, and Héctor Allende. Evalu- ating imputation techniques for missing data in adni: a patient classification study. InProgress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 20th Iberoamerican Congress, CIARP 2015, Montevideo, Uruguay, November 9-12, 201...

  8. [8]

    Guoqing Chao, Shiliang Sun, and Jinbo Bi

    URL https://arxiv.org/abs/2407.19156. Guoqing Chao, Shiliang Sun, and Jinbo Bi. A survey on multiview clustering.IEEE Transactions on Artificial Intelligence, 2(2):146–168,

  9. [9]

    Hava Chaptoukaev, Valeriya Strizhkova, Michele Panariello, Bianca Dalpaos, Aglind Reka, Valeria Manera, SusanneThümmler, EsmaIsmailova, MassimilianoTodisco, MariaAZuluaga, etal

    doi: 10.1109/TAI.2021.3065894. Hava Chaptoukaev, Valeriya Strizhkova, Michele Panariello, Bianca Dalpaos, Aglind Reka, Valeria Manera, SusanneThümmler, EsmaIsmailova, MassimilianoTodisco, MariaAZuluaga, etal. Stressid: amultimodal dataset for stress identification.Advances in Neural Information Processing Systems, 36,

  10. [10]

    Multimodal mr syn- thesis via modality-invariant latent representation.IEEE transactions on medical imaging, 37(3):803–814,

    31 Published in Transactions on Machine Learning Research (02/2026) Agisilaos Chartsias, Thomas Joyce, Mario Valerio Giuffrida, and Sotirios A Tsaftaris. Multimodal mr syn- thesis via modality-invariant latent representation.IEEE transactions on medical imaging, 37(3):803–814,

  11. [11]

    Robust multimodal brain tumor segmentation via feature disentanglement and gated fusion

    Cheng Chen, Qi Dou, Yueming Jin, Hao Chen, Jing Qin, and Pheng-Ann Heng. Robust multimodal brain tumor segmentation via feature disentanglement and gated fusion. InMedical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part III 22, pp. 447–456. Springer,

  12. [12]

    A multi-graph convolutional network based wearable human activity recognition method using multi-sensors.Applied Intelligence, 53(23):28169–28185, 2023a

    Ling Chen, Yingsong Luo, Liangying Peng, Rong Hu, Yi Zhang, and Shenghuan Miao. A multi-graph convolutional network based wearable human activity recognition method using multi-sensors.Applied Intelligence, 53(23):28169–28185, 2023a. Qianqian Chen, Jiadong Zhang, Runqi Meng, Lei Zhou, Zhenhui Li, Qianjin Feng, and Dinggang Shen. Modality-specific informat...

  13. [13]

    Rescaling egocentric vision: Collec- tion, pipeline and challenges for epic-kitchens-100.International Journal of Computer Vision, pp

    32 Published in Transactions on Machine Learning Research (02/2026) Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. Rescaling egocentric vision: Collec- tion, pipeline and challenges for epic-kitchens-100.International Journal of Computer...

  14. [14]

    Hyperspectral and lidar data fusion: Outcome of the 2013 grss data fusion contest.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6):2405–2418,

    Christian Debes, Andreas Merentitis, Roel Heremans, Jürgen Hahn, Nikolaos Frangiadakis, Tim van Kasteren, Wenzhi Liao, Rik Bellens, Aleksandra Pižurica, Sidharta Gautama, et al. Hyperspectral and lidar data fusion: Outcome of the 2013 grss data fusion contest.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6):2405–2418,

  15. [15]

    Jonas Van Der Donckt

    [Accessed 13-05-2024]. Jonas Van Der Donckt. mbrain21,

  16. [16]

    Robert Duin

    URLhttps://www.kaggle.com/dsv/7745331. Robert Duin. Multiple Features. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5HC70. Aiman Farooq, Deepak Mishra, and Santanu Chaudhury. Survival prediction in lung cancer through multi- modal representation learning. In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3...

  17. [17]

    Low to high dimensional modality hallucination using aggregated fields of view.IEEE Robotics and Automation Letters, 5(2):1983–1990,

    33 Published in Transactions on Machine Learning Research (02/2026) Kausic Gunasekar, Qiang Qiu, and Yezhou Yang. Low to high dimensional modality hallucination using aggregated fields of view.IEEE Robotics and Automation Letters, 5(2):1983–1990,

  18. [18]

    Qishen Ha, Kohei Watanabe, Takumi Karasawa, Yoshitaka Ushiku, and Tatsuya Harada

    URLhttps://arxiv.org/abs/2407.05374. Qishen Ha, Kohei Watanabe, Takumi Karasawa, Yoshitaka Ushiku, and Tatsuya Harada. Mfnet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),

  19. [19]

    Multi-modal deep learning for multi-temporal urban mapping with a partly missing optical modality

    Sebastian Hafner and Yifang Ban. Multi-modal deep learning for multi-temporal urban mapping with a partly missing optical modality. InIGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, pp. 6843–6846. IEEE,

  20. [20]

    Modality completion via gaussian process prior variational autoencoders for multi-modal glioma segmentation

    Mohammad Hamghalam, Alejandro F Frangi, Baiying Lei, and Amber L Simpson. Modality completion via gaussian process prior variational autoencoders for multi-modal glioma segmentation. InMedical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V...

  21. [21]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531,

  22. [22]

    Knowledge distillation from multi-modal to mono-modal segmentation networks

    Minhao Hu, Matthis Maillard, Ya Zhang, Tommaso Ciceri, Giammarco La Barbera, Isabelle Bloch, and Pietro Gori. Knowledge distillation from multi-modal to mono-modal segmentation networks. InMedical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23, pp. 772–78...

  23. [23]

    Bcdata: A large-scale dataset and benchmark for cell detection and counting

    34 Published in Transactions on Machine Learning Research (02/2026) Zhongyi Huang, Yao Ding, Guoli Song, Lin Wang, Ruizhe Geng, Hongliang He, Shan Du, Xia Liu, Yonghong Tian, Yongsheng Liang, et al. Bcdata: A large-scale dataset and benchmark for cell detection and counting. InInternational Conference on Medical Image Computing and Computer-Assisted Inter...

  24. [24]

    Epic-sounds: A large- scale dataset of actions that sound

    Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, and Andrew Zisserman. Epic-sounds: A large- scale dataset of actions that sound. InICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE,

  25. [25]

    Towards robust multimodal prompting with miss- ing modalities

    Jaehyuk Jang, Yooseung Wang, and Changick Kim. Towards robust multimodal prompting with miss- ing modalities. InICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8070–8074. IEEE,

  26. [26]

    kaggle flir thermal,

    35 Published in Transactions on Machine Learning Research (02/2026) kaggle. kaggle flir thermal,

  27. [27]

    Otterhd: A high-resolution multi-modality model

    Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, and Ziwei Liu. Otterhd: A high-resolution multi-modality model.arXiv preprint arXiv:2311.04219, 2023a. Haitao Li, Ziyu Li, Yiheng Mao, Zhengyao Ding, and Zhengxing Huang. Dc-seg: Disentangled contrastive learning for brain tumor segmentation with missing modalities.arXiv preprint arXiv:2505.119...

  28. [28]

    Deep learning based imaging data completion for improved brain disease diagnosis

    Rongjian Li, Wenlu Zhang, Heung-Il Suk, Li Wang, Jiang Li, Dinggang Shen, and Shuiwang Ji. Deep learning based imaging data completion for improved brain disease diagnosis. InMedical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part III 17, pp. 305–312. ...

  29. [29]

    Simmlm: A simple framework for multi-modal learning with missing modality.arXiv preprint arXiv:2507.19264, 2025b

    Sijie Li, Chen Chen, and Jungong Han. Simmlm: A simple framework for multi-modal learning with missing modality.arXiv preprint arXiv:2507.19264, 2025b. Siting Li, Chenzhuang Du, Yue Zhao, Yu Huang, and Hang Zhao. What makes for robust multi-modal models in the face of missing modalities?arXiv preprint arXiv:2310.06383, 2023c. Xue Li, Guo Zhang, Hao Cui, S...

  30. [30]

    Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization

    Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, and Louis-Philippe Morency. Learning representations from imperfect time series data via tensor rank regularization.arXiv preprint arXiv:1907.01011,

  31. [31]

    Multibench: Multiscalebenchmarksformultimodalrepresentationlearning.Advances in neural information processing systems, 2021(DB1):1,

    Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A Lee, YukeZhu, etal. Multibench: Multiscalebenchmarksformultimodalrepresentationlearning.Advances in neural information processing systems, 2021(DB1):1,

  32. [32]

    Causal representation learning from multimodal clinical records under non-random modality missingness

    Zihan Liang, Ziwen Pan, and Ruoxuan Xiong. Causal representation learning from multimodal clinical records under non-random modality missingness. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 28779–28796,

  33. [33]

    Sup- pressandrebalance: Towardsgeneralizedmulti-modalfaceanti-spoofing.arXiv preprint arXiv:2402.19298,

    37 Published in Transactions on Machine Learning Research (02/2026) Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Zitong Yu, Wenzhong Tang, and Alex Kot. Sup- pressandrebalance: Towardsgeneralizedmulti-modalfaceanti-spoofing.arXiv preprint arXiv:2402.19298,

  34. [34]

    Visual instruction tuning.Advances in neural information processing systems, 36, 2024a

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning.Advances in neural information processing systems, 36, 2024a. Hong Liu, Dong Wei, Donghuan Lu, Jinghan Sun, Liansheng Wang, and Yefeng Zheng. M3ae: multimodal representation learning for brain tumor segmentation with missing modalities. InProceedings of the AAAI Conference ...

  35. [35]

    URLhttps://doi.org/10.1145/3411818

    doi: 10.1145/3411818. URLhttps://doi.org/10.1145/3411818. Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, et al. Llava-plus: Learning to use tools for creating multimodal agents.arXiv preprint arXiv:2311.05437, 2023c. YanbeiLiu, Lianxi Fan, Changqing Zhang, Tao Zhou, ZhitaoXiao, Lei Geng, an...

  36. [36]

    Fedmobile: Enabling knowledge contribution-aware multi-modal federated learning with incomplete modalities

    Yi Liu, Cong Wang, and Xingliang Yuan. Fedmobile: Enabling knowledge contribution-aware multi-modal federated learning with incomplete modalities. InProceedings of the ACM on Web Conference 2025, pp. 2775–2786, 2025a. Yuhang Liu, Quan Zou, Ran Su, and Leyi Wei. scmomer: A modality-aware pretraining framework for single-cell multi-omics modeling under miss...

  37. [37]

    Mc- dbn: A deep belief network-based model for modality completion.arXiv preprint arXiv:2402.09782,

    Zihong Luo, Haochen Xue, Mingyu Jin, Chengzhi Liu, Zile Huang, Chong Zhang, and Shuliang Zhao. Mc- dbn: A deep belief network-based model for modality completion.arXiv preprint arXiv:2402.09782,

  38. [38]

    An efficient approach for audio-visual emotion recognition with missing labels and missing modalities

    Fei Ma, Shao-Lun Huang, and Lin Zhang. An efficient approach for audio-visual emotion recognition with missing labels and missing modalities. In2021 IEEE international conference on multimedia and Expo (ICME), pp. 1–6. IEEE, 2021a. Fei Ma, Xiangxiang Xu, Shao-Lun Huang, and Lin Zhang. Maximum likelihood estimation for multimodal learning with missing moda...

  39. [39]

    Dealing with missing modalities in multimodal recommendation: a feature propagation-based approach.arXiv preprint arXiv:2403.19841,

    Daniele Malitesta, Emanuele Rossi, Claudio Pomo, Fragkiskos D Malliaros, and Tommaso Di Noia. Dealing with missing modalities in multimodal recommendation: a feature propagation-based approach.arXiv preprint arXiv:2403.19841,

  40. [40]

    Learning to recognize objects from unseen modalities

    C Mario Christoudias, Raquel Urtasun, Mathieu Salzmann, and Trevor Darrell. Learning to recognize objects from unseen modalities. InComputer Vision–ECCV 2010, pp. 677–691. Springer,

  41. [41]

    The multimodal brain tumor image segmentation benchmark (brats).IEEE transactions on medical imaging, 34(10):1993–2024,

    Bjoern H Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The multimodal brain tumor image segmentation benchmark (brats).IEEE transactions on medical imaging, 34(10):1993–2024,

  42. [42]

    Learning a text-video embedding from incomplete and hetero- geneous data.arXiv preprint arXiv:1804.02516,

    Antoine Miech, Ivan Laptev, and Josef Sivic. Learning a text-video embedding from incomplete and hetero- geneous data.arXiv preprint arXiv:1804.02516,

  43. [43]

    The impact of the mit-bih arrhythmia database.IEEE engineering in medicine and biology magazine, 20(3):45–50,

    39 Published in Transactions on Machine Learning Research (02/2026) George B Moody and Roger G Mark. The impact of the mit-bih arrhythmia database.IEEE engineering in medicine and biology magazine, 20(3):45–50,

  44. [44]

    3d mri brain tumor segmentation using autoencoder regularization

    Andriy Myronenko. 3d mri brain tumor segmentation using autoencoder regularization. InBrainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II 4, pp. 311–320. Springer,

  45. [45]

    Understanding human daily experience through continuous sensing: Etri lifelog dataset 2024.arXiv preprint arXiv:2508.03698,

    Se Won Oh, Hyuntae Jeong, Seungeun Chung, Jeong Mook Lim, Kyoung Ju Noh, Sunkyung Lee, and Gyuwon Jung. Understanding human daily experience through continuous sensing: Etri lifelog dataset 2024.arXiv preprint arXiv:2508.03698,

  46. [46]

    Multi-modal and multi-attribute generation of single cells with cfgen.arXiv preprint arXiv:2407.11734,

    Alessandro Palma, Till Richter, Hanyi Zhang, Manuel Lubetzki, Alexander Tong, Andrea Dittadi, and Fabian Theis. Multi-modal and multi-attribute generation of single cells with cfgen.arXiv preprint arXiv:2407.11734,

  47. [47]

    SrinivasParthasarathyandShivaSundaram

    URLhttps://arxiv.org/abs/2407.16171. SrinivasParthasarathyandShivaSundaram. Trainingstrategiestohandlemissingmodalitiesforaudio-visual expression recognition. InCompanion Publication of the 2020 International Conference on Multimodal Interaction, pp. 400–404,

  48. [48]

    FedMM: Federated multi-modal learning with modality heterogeneity in computational pathology

    Yuanzhe Peng, Jieming Bian, and Jie Xu. FedMM: Federated multi-modal learning with modality heterogeneity in computational pathology. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1696–1700. IEEE,

  49. [49]

    2024 IEEE GRSS data fusion contest - flood rapid mapping. IEEE Dataport, 2023a

    Claudio Persello; Saurabh Prasad; Gemine Vivone; Vincent Lonjou; Frédéric Bretar; Raquel Rodriguez-Suquet; Pauline Guntzburger; Vincent Poulain; Jacqueline Le Moigne; Benjamin Smith; Sujay Kumar; Thomas Huang; Sophie Ricci; Thanh Huy Nguyen; Andrea Piacentini. 2024 IEEE GRSS data fusion contest - flood rapid mapping. IEEE Dataport, 2023a. doi: 1...

  50. [50]

    MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations

    Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, and Rada Mihalcea. MELD: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508,

  51. [51]

    Humanoid locomotion as next token prediction. arXiv preprint arXiv:2402.19469,

    Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, and Jitendra Malik. Humanoid locomotion as next token prediction. arXiv preprint arXiv:2402.19469,

  52. [52]

    Combating missing modalities in egocentric videos at test time. arXiv preprint arXiv:2404.15161,

    Merey Ramazanova, Alejandro Pardo, Bernard Ghanem, and Motasem Alfarra. Combating missing modalities in egocentric videos at test time. arXiv preprint arXiv:2404.15161,

  53. [53]

    Md Kaykobad Reza, Ashley Prater-Bennette, and M Salman Asif

    Md Kaykobad Reza, Ashley Prater-Bennette, and M Salman Asif. Robust multimodal learning with missing modalities via parameter-efficient adaptation. arXiv preprint arXiv:2310.03986,

  54. [54]

    Daniel Roggen, Alberto Calatroni, Mirco Rossi, Thomas Holleczek, Kilian Förster, Gerhard Tröster, Paul Lukowicz, David Bannach, Gerald Pirkl, Alois Ferscha, Jakob Doppler, Clemens Holzmann, Marc Kurz, Gerald Holl, Ricardo Chavarriaga, Hesam Sagha, Hamidreza Bayati, Marco Creatura, and José del R. Millán. Collecting complex activity datasets in highly rich...

  55. [55]

    Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, and J Alison Noble

    Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, and J Alison Noble. Examining modality incongruity in multimodal federated learning for medical vision and language-based disease detection. arXiv preprint arXiv:2402.05294,

  56. [56]

    BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Transactions on biomedical engineering, 51(6):1034–1043,

    Gerwin Schalk, Dennis J McFarland, Thilo Hinterberger, Niels Birbaumer, and Jonathan R Wolpaw. BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Transactions on biomedical engineering, 51(6):1034–1043,

  57. [57]

    Introducing WESAD, a multimodal dataset for wearable stress and affect detection

    Philip Schmidt, Attila Reiss, Robert Duerichen, Claus Marberger, and Kristof Van Laerhoven. Introducing WESAD, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, ICMI ’18, pp. 400–408, New York, NY, USA, 2018a. Association for Computing Machinery. ISBN 9781450356...

  58. [58]

    Brain tumor segmentation on MRI with missing modalities

    Yan Shen and Mingchen Gao. Brain tumor segmentation on MRI with missing modalities. In Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, June 2–7, 2019, Proceedings 26, pp. 417–428. Springer,

  59. [59]

    Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga

    Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga. Fusion of smartphone motion sensors for physical activity recognition. Sensors, 14(6):10146–10176,

  60. [60]

    Contrastive learning-based spectral knowledge distillation for multi-modality and missing modality scenarios in semantic segmentation. arXiv preprint arXiv:2312.02240,

    Aniruddh Sikdar, Jayant Teotia, and Suresh Sundaram. Contrastive learning-based spectral knowledge distillation for multi-modality and missing modality scenarios in semantic segmentation. arXiv preprint arXiv:2312.02240,

  61. [61]

    Indoor segmentation and support inference from RGBD images

    Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. In ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12, pp. 746–760. Springer,

  62. [62]

    The MuSe 2021 multimodal sentiment analysis challenge: sentiment, emotion, physiological-emotion, and stress

    Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, and Björn W Schuller. The MuSe 2021 multimodal sentiment analysis challenge: sentiment, emotion, physiological-emotion, and stress. In Proceedings of the 2nd on Multimo...

  63. [63]

    Multispectral object detection for autonomous vehicles

    Karasawa Takumi, Kohei Watanabe, Qishen Ha, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, and Tatsuya Harada. Multispectral object detection for autonomous vehicles. In Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pp. 35–43,

  64. [64]

    NASA Science Editorial Team. Keeping Our Sense of Direction: Dealing With a Dead Sensor - NASA Science — science.nasa.gov. https://science.nasa.gov/missions/mars-2020-perseverance/ingenuity-helicopter/keeping-our-sense-of-direction-dealing-with-a-dead-sensor/, JUN

  65. [65]

    Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz

    Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz. Review the cancer genome atlas (TCGA): an immeasurable source of knowledge. Contemporary Oncology/Współczesna Onkologia, 2015(1):68–77,

  66. [66]

    Missing modalities imputation via cascaded residual autoencoder

    Luan Tran, Xiaoming Liu, Jiayu Zhou, and Rong Jin. Missing modalities imputation via cascaded residual autoencoder. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1405–1414,

  67. [67]

    Learning Factorized Multimodal Representations

    Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, and Ruslan Salakhutdinov. Learning factorized multimodal representations. arXiv preprint arXiv:1806.06176,

  68. [68]

    How to sense the world: Leveraging hierarchy in multimodal perception for robust reinforcement learning agents. arXiv preprint arXiv:2110.03608,

    Miguel Vasco, Hang Yin, Francisco S Melo, and Ana Paiva. How to sense the world: Leveraging hierarchy in multimodal perception for robust reinforcement learning agents. arXiv preprint arXiv:2110.03608,

  69. [69]

    Multi-modal learning with missing modality via shared-specific feature modelling

    Hu Wang, Yuanhong Chen, Congbo Ma, Jodie Avery, Louise Hull, and Gustavo Carneiro. Multi-modal learning with missing modality via shared-specific feature modelling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15878–15887, 2023a. Hu Wang, Congbo Ma, Jianpeng Zhang, Yuan Zhang, Jodie Avery, Louise Hull, and Gusta...

  70. [70]

    Prototype knowledge distillation for medical segmentation with missing modality

    Shuai Wang, Zipei Yan, Daoan Zhang, Haining Wei, Zhongsen Li, and Rui Li. Prototype knowledge distillation for medical segmentation with missing modality. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE, 2023c. Tianyi W...

  71. [71]

    SSLMM: Semi-supervised learning with missing modalities for multimodal sentiment analysis. Information Fusion, 120:103058, 2025b

    Yiyu Wang, Haifang Jian, Jian Zhuang, Huimin Guo, and Yan Leng. SSLMM: Semi-supervised learning with missing modalities for multimodal sentiment analysis. Information Fusion, 120:103058, 2025b. Yuanyi Wang, Haifeng Sun, Jiabo Wang, Jingyu Wang, Wei Tang, Qi Qi, Shaoling Sun, and Jianxin Liao. Towards semantic consistency: Dirichlet energy driven robust multi-mod...

  72. [72]

    Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

    Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. Visual ChatGPT: Talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671, 2023a. Renjie Wu, Hu Wang, Feras Dayoub, and Hsiang-Ting Chen. Segment beyond view: handling partially missing modality for audio-visual semantic segmentation. In Proce...

  73. [73]

    MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

    Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, and Lijuan Wang. MM-REACT: Prompting ChatGPT for multimodal reasoning and action. arXiv preprint arXiv:2303.11381, 2023b. Wenfang Yao, Kejing Yin, William K Cheung, Jia Liu, and Jing Qin. DrFuse: Learning disentangled representation for ...

  74. [74]

    2020 IEEE GRSS data fusion contest: Global land cover mapping with weak supervision [technical committees]. IEEE Geoscience and Remote Sensing Magazine, 8(1):154–157,

    Naoto Yokoya, Pedram Ghamisi, Ronny Haensch, and Michael Schmitt. 2020 IEEE GRSS data fusion contest: Global land cover mapping with weak supervision [technical committees]. IEEE Geoscience and Remote Sensing Magazine, 8(1):154–157,

  75. [75]

    MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos

    Amir Zadeh, Rowan Zellers, Eli Pincus, and Louis-Philippe Morency. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv preprint arXiv:1606.06259,

  76. [76]

    Mitigating inconsistencies in multimodal sentiment analysis under uncertain missing modalities

    Jiandian Zeng, Jiantao Zhou, and Tianyi Liu. Mitigating inconsistencies in multimodal sentiment analysis under uncertain missing modalities. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 2924–2934,

  77. [77]

    AnyGPT: Unified multimodal LLM with discrete sequence modeling. arXiv preprint arXiv:2402.12226,

    Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, et al. AnyGPT: Unified multimodal LLM with discrete sequence modeling. arXiv preprint arXiv:2402.12226,

  78. [78]

    Distilling missing modality knowledge from ultrasound for endometriosis diagnosis with magnetic resonance images

    Yuan Zhang, Hu Wang, David Butler, Minh-Son To, Jodie Avery, M Louise Hull, and Gustavo Carneiro. Distilling missing modality knowledge from ultrasound for endometriosis diagnosis with magnetic resonance images. In 2023 IEEE 20th International Symposium on Biomedical Imaging, 2023b. Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, and S Kevin Zhou. Unified ...

  79. [79]

    Learning modality-agnostic representation for semantic segmentation from any modalities,

    Xu Zheng, Yuanhuiyi Lyu, and Lin Wang. Learning modality-agnostic representation for semantic segmentation from any modalities,

  80. [80]

    Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, and Tianrui Li

    Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, and Tianrui Li. Forecasting fine-grained air quality based on big data. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 2267–2276,

Showing first 80 references.