arXiv: 2409.07825 · v4 · pith:CWNRZTEBnew · submitted 2024-09-12 · cs.CV · cs.AI · cs.LG

Deep Multimodal Learning with Missing Modality: A Survey

Renjie Wu, Hu Wang, Hsiang-Ting Chen, Gustavo Carneiro

Pith reviewed 2026-05-17 22:22 UTC · model grok-4.3

classification: cs.CV · cs.AI · cs.LG

keywords: multimodal learning, missing modality, deep learning, survey, robustness, applications, datasets, challenges

The pith

Multimodal deep learning models can maintain performance when some input types are missing by using dedicated robustness techniques.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper surveys deep learning approaches designed to handle cases where certain data modalities are absent during training or testing in multimodal systems. It begins by outlining the motivations for such techniques, including sensor limitations and privacy concerns, and clarifies how these setups differ from standard multimodal learning. The survey then analyzes existing methods, their applications across domains, relevant datasets, and concludes with open challenges along with suggested future directions. A sympathetic reader would care because these methods aim to make AI systems more reliable when real-world data collection is incomplete.
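To make the setting concrete, here is a minimal sketch of one generic way a model can tolerate an absent modality: fuse only the embeddings that are present instead of feeding zeros for the missing one. This is an illustration under stated assumptions, not a method taken from the survey. The function name `fuse_available`, the modality names, and the toy embeddings are all hypothetical.

```python
from typing import Dict, List, Optional

Embedding = List[float]


def fuse_available(embeddings: Dict[str, Optional[Embedding]]) -> Embedding:
    """Average the embeddings of the modalities that are present.

    A modality mapped to None is treated as missing and skipped, so it
    does not drag the fused vector toward zero.
    """
    present = [vec for vec in embeddings.values() if vec is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    dim = len(present[0])
    if any(len(vec) != dim for vec in present):
        raise ValueError("all embeddings must share the same dimension")
    # Element-wise mean over the available modalities only.
    return [sum(vec[i] for vec in present) / len(present) for i in range(dim)]


# Hypothetical toy sample: the audio stream is missing.
sample = {"image": [1.0, 3.0], "text": [3.0, 5.0], "audio": None}
fused = fuse_available(sample)
```

Averaging over the available modalities keeps the fused vector on the same scale whether two or three inputs arrive. Imputation-based or generative approaches would instead reconstruct the missing embedding before fusion.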

Core claim

The central claim is that Multimodal Learning with Missing Modality (MLMM) forms a distinct area from standard multimodal learning, and the survey supplies the first comprehensive review covering motivations, distinctions, current deep learning methods, applications, datasets, challenges, and future research directions.

What carries the argument

The taxonomy and detailed breakdown of methods that specifically address missing modalities to preserve model robustness when one or more data types are unavailable.

Load-bearing premise

The body of literature selected for the survey is sufficiently complete and representative of current work on deep multimodal learning with missing modalities.

What would settle it

A search that identifies several recent or important deep learning papers on missing-modality multimodal learning that were omitted from the survey's analysis would undermine its claim to comprehensiveness.

Original abstract

During multimodal model training and testing, certain data modalities may be absent due to sensor limitations, cost constraints, privacy concerns, or data loss, negatively affecting performance. Multimodal learning techniques designed to handle missing modalities can mitigate this by ensuring model robustness even when some modalities are unavailable. This survey reviews recent progress in Multimodal Learning with Missing Modality (MLMM), focusing on deep learning methods. It provides the first comprehensive survey that covers the motivation and distinctions between MLMM and standard multimodal learning setups, followed by a detailed analysis of current methods, applications, and datasets, concluding with challenges and future directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A useful survey organizing methods for multimodal learning with missing modalities, but its value depends on whether the literature coverage is actually complete.

This survey claims to be the first comprehensive look at deep multimodal learning when some modalities are missing. That's the key takeaway: it tries to collect and categorize techniques for a problem that shows up whenever sensors are unreliable or data collection is incomplete.

The paper does well at the basics. It explains the motivation clearly, noting how missing modalities hurt performance in training and testing. It also draws a line between this and standard multimodal setups where everything is assumed available. From there it moves into analyzing methods, applications in fields like medical diagnosis, the datasets people use, and some thoughts on challenges ahead. That gives a structured entry point for someone who needs to get up to speed without reading every paper separately.

One area to watch is how they chose what to review. Any survey that positions itself as comprehensive depends on a solid search process. If they left out recent work on certain imputation strategies or fusion techniques that handle absence differently, the overview of current methods and the identified future directions could shift. The abstract alone doesn't show the search details or the full reference list, so that part needs checking in the manuscript.

This kind of paper is mainly for people already in multimodal machine learning or computer vision who encounter missing data in their applications. A reader looking for a reference to cite when discussing robustness would get something out of the applications and datasets sections.

I think it should go to peer review. The subject is practical and the structure is reasonable, so referees can focus on whether the coverage is balanced and suggest any missing pieces.

Referee Report

1 major / 2 minor

Summary. The paper is a survey on deep multimodal learning with missing modalities (MLMM). It distinguishes MLMM from standard multimodal setups, reviews motivations arising from sensor limitations, cost, privacy, and data loss, analyzes deep learning methods for handling missing modalities, surveys applications and datasets, and outlines challenges plus future directions. The authors claim it as the first comprehensive survey focused specifically on this topic.

Significance. If the coverage proves complete and the taxonomy of methods accurate, the survey would provide a useful reference point for researchers working on robust multimodal models. It aggregates practical considerations around incomplete data that arise frequently in deployed vision and multimodal systems, potentially helping to consolidate scattered prior work and highlight open problems.

major comments (1)

[Introduction] Introduction (or dedicated survey methodology subsection): the assertion of being the 'first comprehensive survey' is load-bearing for the paper's motivation and for the synthesized challenges/future directions. The manuscript must explicitly document the search protocol (databases queried, exact keywords and Boolean strings, date range, inclusion/exclusion criteria, and handling of preprints versus peer-reviewed work) so that readers can evaluate completeness and potential systematic omissions.

minor comments (2)

[Methods taxonomy] Ensure every cited work in the methods taxonomy table or section is accompanied by a brief one-sentence characterization of how it addresses missing modalities, to avoid readers needing to consult the original papers for basic distinctions.
[Figures] Figure captions for any overview diagrams should explicitly state the criteria used to group methods (e.g., imputation-based vs. modality-robust vs. generative), as current phrasing leaves some boundary cases ambiguous.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the paper's significance and for the constructive comment on improving the methodological transparency. We will address this point in the revision.

Referee: [Introduction] Introduction (or dedicated survey methodology subsection): the assertion of being the 'first comprehensive survey' is load-bearing for the paper's motivation and for the synthesized challenges/future directions. The manuscript must explicitly document the search protocol (databases queried, exact keywords and Boolean strings, date range, inclusion/exclusion criteria, and handling of preprints versus peer-reviewed work) so that readers can evaluate completeness and potential systematic omissions.

Authors: We agree that explicitly documenting the survey methodology is crucial for validating the comprehensiveness of our review and supporting the synthesized insights. In the revised manuscript, we will add a dedicated subsection in the Introduction that describes the search protocol. It will cover the databases and repositories queried, the keywords and search strategies used, the date range of the literature considered, the inclusion and exclusion criteria, and how preprints were handled relative to peer-reviewed publications. With this information, readers will be better positioned to assess the scope of our survey and any potential gaps in it.

revision: yes
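As an illustration of what a documented search protocol could look like in machine-readable form, the sketch below records the items the referee asks for and applies the date-range and preprint rules to a candidate list. Everything in it is hypothetical: the class names `SearchProtocol` and `Paper`, the function `is_eligible`, the database names, the query string, and the dates are placeholders, not the authors' actual protocol.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class SearchProtocol:
    """Record of how a literature search was run (all values hypothetical)."""
    databases: List[str]
    query: str               # exact Boolean string as entered in the search box
    start: date              # first publication date considered
    end: date                # last publication date considered
    include_preprints: bool  # whether non-peer-reviewed work is admitted


@dataclass(frozen=True)
class Paper:
    title: str
    published: date
    is_preprint: bool


def is_eligible(paper: Paper, protocol: SearchProtocol) -> bool:
    """Apply the date-range and preprint rules; topical screening is out of scope."""
    in_range = protocol.start <= paper.published <= protocol.end
    preprint_ok = protocol.include_preprints or not paper.is_preprint
    return in_range and preprint_ok


# Hypothetical values for illustration only; not the authors' actual protocol.
protocol = SearchProtocol(
    databases=["arXiv", "IEEE Xplore"],
    query='("missing modality" OR "incomplete modality") AND "deep learning"',
    start=date(2015, 1, 1),
    end=date(2024, 9, 12),
    include_preprints=False,
)
candidates = [
    Paper("Example peer-reviewed paper", date(2020, 5, 1), is_preprint=False),
    Paper("Example preprint", date(2023, 3, 1), is_preprint=True),
    Paper("Example early paper", date(2010, 1, 1), is_preprint=False),
]
kept = [p.title for p in candidates if is_eligible(p, protocol)]
```

Writing the criteria down as data rather than prose lets a reader rerun the filter and see which papers a different date range or preprint policy would have admitted.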

Circularity Check

0 steps flagged

No significant circularity in survey paper lacking derivations

This is a literature survey that reviews and synthesizes existing external publications on MLMM; it presents no original derivations, equations, fitted parameters, or predictive models. No load-bearing step reduces by construction to the paper's own inputs, self-citations, or ansatzes. The claim to be the 'first comprehensive survey' rests on the completeness of the literature selection, which is a question of external validity and coverage rather than a circular reduction under the enumerated patterns. As an aggregation of independent prior results with no derivation chain, the paper receives a non-finding, in line with the guidelines for honest surveys.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work does not introduce new free parameters, axioms, or invented entities; it synthesizes and categorizes existing research on MLMM.

pith-pipeline@v0.9.0 · 5401 in / 1079 out tokens · 36376 ms · 2026-05-17T22:22:32.472970+00:00 · methodology

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS
cs.LG · 2026-05 · unverdicted · novelty 7.0
SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.

Retrieving to Recover: Towards Incomplete Audio-Visual Question Answering via Semantic-consistent Purification
cs.CV · 2026-04 · unverdicted · novelty 7.0
R²ScP recovers missing audio-visual data in question answering by retrieving semantically consistent examples and purifying noise, outperforming generative imputation in incomplete scenarios.

Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
cs.CV · 2026-01 · unverdicted · novelty 7.0
DyMo dynamically selects reliable recovered modalities at inference by using task loss as a proxy for task-relevant information, outperforming prior discard-or-impute methods on image datasets.

Resilient Vision-Tabular Multimodal Learning under Modality Missingness
cs.LG · 2026-05 · unverdicted · novelty 6.0
A vision-tabular multimodal transformer uses modality tokens, masked self-attention, and stochastic modality dropout to maintain performance under pervasive missing data on MIMIC-CXR and MIMIC-IV for 14-label diagnost...

LARGO: Low-Rank Hypernetwork for Handling Missing Modalities
cs.CV · 2026-05 · unverdicted · novelty 6.0
LARGO uses a low-rank hypernetwork with CP decomposition to unify 2^N-1 missing-modality models into one, ranking first in 47 of 52 configurations on BraTS and ISLES with small Dice gains over baselines.

Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion
cs.IR · 2026-05 · unverdicted · novelty 6.0
GRE-MC retrieves relevant subgraphs and uses a graph transformer plus sparse codebook to complete missing modalities, outperforming prior methods on recommendation benchmarks.

Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization
cs.CV · 2026-04 · unverdicted · novelty 6.0
RCSR is a personalization-friendly federated framework that improves cross-modal retrieval accuracy and stability under missing modalities via semantic routing and adapters.

Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data
eess.IV · 2026-04 · unverdicted · novelty 6.0
A multimodal diffusion model trained on synthetic data enhances low-resolution EBSD and corrupted polarized light data, achieving near full-resolution performance with only 25% EBSD data.

Conditional Evidence Reconstruction and Decomposition for Interpretable Multimodal Diagnosis
cs.CV · 2026-04 · unverdicted · novelty 6.0
CERD reconstructs missing modalities conditioned on observed inputs and decomposes diagnostic evidence via logit attribution, outperforming baselines on incomplete ADNI data while providing interpretable attributions.

Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher
cs.CV · 2026-04 · unverdicted · novelty 6.0
PTA framework purifies noisy multimodal data via meta-learning and distills cross-modal knowledge through diffusion to create robust single-modality models under missing modalities.

Evaluation Before Generation: A Paradigm for Robust Multimodal Sentiment Analysis with Missing Modalities
cs.CV · 2026-04 · unverdicted · novelty 6.0
The ProMMA framework evaluates missing modalities at input using a dedicated evaluator, then applies modality-invariant prompt disentanglement, mutual-information dynamic weighting, and multi-level residual prompt con...

Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
cs.MM · 2026-02 · unverdicted · novelty 6.0
EC-Net combines Poincare-ball hyperbolic embeddings, hypergraph fusion, and decoupled radial-angular contrastive learning to improve accuracy on multimodal emotion benchmarks especially under partial or noisy modalities.

Fusion or Confusion? Multimodal Complexity Is Not All You Need
cs.LG · 2025-12 · unverdicted · novelty 6.0
Complex multimodal architectures do not reliably outperform unimodal baselines or a simple multimodal baseline under standardized evaluation.

Calibrated Multimodal Representation Learning with Missing Modalities
cs.CV · 2025-11 · unverdicted · novelty 6.0
CalMRL mitigates anchor shift in multimodal representation learning by calibrating incomplete alignments through representation-level imputation of missing modalities using priors and a bi-step optimization with close...

Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking
cs.CL · 2026-04 · unverdicted · novelty 5.0
MSR-MEL synthesizes instance-centric, group-level, lexical, and statistical evidence with LLMs and asymmetric teacher-student GNNs to outperform prior unsupervised methods on multimodal entity linking benchmarks.

Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality
cs.CV · 2026-04 · unverdicted · novelty 5.0
Head-wise modality specialization via attention constraints and unimodal knowledge retention in MLLMs improves robustness to missing modalities in fake news detection while preserving full multimodal performance.

ModalImmune: Immunity Driven Unlearning via Self Destructive Training
cs.LG · 2026-02 · unverdicted · novelty 4.0
ModalImmune enforces modality immunity in multimodal models by controlled collapse of input channels during training using adaptive regularizers and meta-optimization.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · cited by 17 Pith papers · 9 internal anchors

[1]

Medical image segmentation on mri images with missing modalities: A review.arXiv preprint arXiv:2203.06217,

Reza Azad, Nika Khosravi, Mohammad Dehghanmanshadi, Julien Cohen-Adad, and Dorit Merhof. Medical image segmentation on mri images with missing modalities: A review.arXiv preprint arXiv:2203.06217,

work page arXiv
[2]

Dealing with the effects of sensor displacement in wearable activity recognition.Sensors, 14(6):9995–10023,

30 Published in Transactions on Machine Learning Research (02/2026) Oresti Banos, Mate Attila Toth, Miguel Damas, Hector Pomares, and Ignacio Rojas. Dealing with the effects of sensor displacement in wearable activity recognition.Sensors, 14(6):9995–10023,

work page 2026
[3]

Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, and Sağnak Taşırlar

DOI: https://doi.org/10.24432/C5C59F. Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, and Sağnak Taşırlar. Introducing our multimodal models,

work page doi:10.24432/c5c59f
[4]

Overcoming missing and incomplete modalities with generative adversarial networks for building footprint segmentation

Benjamin Bischke, Patrick Helber, Florian Koenig, Damian Borth, and Andreas Dengel. Overcoming missing and incomplete modalities with generative adversarial networks for building footprint segmentation. In 2018 International Conference on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE,

work page 2018
[5]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4.arXiv preprint arXiv:2303.12712,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Whar datasets: An open source library for wearable human activity recognition.arXiv preprint arXiv:2508.16604,

Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl, and Tobias Röddiger. Whar datasets: An open source library for wearable human activity recognition.arXiv preprint arXiv:2508.16604,

work page arXiv
[7]

Evalu- ating imputation techniques for missing data in adni: a patient classification study

Sergio Campos, Luis Pizarro, Carlos Valle, Katherine R Gray, Daniel Rueckert, and Héctor Allende. Evalu- ating imputation techniques for missing data in adni: a patient classification study. InProgress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 20th Iberoamerican Congress, CIARP 2015, Montevideo, Uruguay, November 9-12, 201...

work page 2015
[8]

Guoqing Chao, Shiliang Sun, and Jinbo Bi

URL https://arxiv.org/abs/2407.19156. Guoqing Chao, Shiliang Sun, and Jinbo Bi. A survey on multiview clustering.IEEE Transactions on Artificial Intelligence, 2(2):146–168,

work page arXiv
[9]

Hava Chaptoukaev, Valeriya Strizhkova, Michele Panariello, Bianca Dalpaos, Aglind Reka, Valeria Manera, SusanneThümmler, EsmaIsmailova, MassimilianoTodisco, MariaAZuluaga, etal

doi: 10.1109/TAI.2021.3065894. Hava Chaptoukaev, Valeriya Strizhkova, Michele Panariello, Bianca Dalpaos, Aglind Reka, Valeria Manera, SusanneThümmler, EsmaIsmailova, MassimilianoTodisco, MariaAZuluaga, etal. Stressid: amultimodal dataset for stress identification.Advances in Neural Information Processing Systems, 36,

work page doi:10.1109/tai.2021.3065894 2021
[10]

Multimodal mr syn- thesis via modality-invariant latent representation.IEEE transactions on medical imaging, 37(3):803–814,

31 Published in Transactions on Machine Learning Research (02/2026) Agisilaos Chartsias, Thomas Joyce, Mario Valerio Giuffrida, and Sotirios A Tsaftaris. Multimodal mr syn- thesis via modality-invariant latent representation.IEEE transactions on medical imaging, 37(3):803–814,

work page 2026
[11]

Robust multimodal brain tumor segmentation via feature disentanglement and gated fusion

Cheng Chen, Qi Dou, Yueming Jin, Hao Chen, Jing Qin, and Pheng-Ann Heng. Robust multimodal brain tumor segmentation via feature disentanglement and gated fusion. InMedical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part III 22, pp. 447–456. Springer,

work page 2019
[12]

A multi-graph convolutional network based wearable human activity recognition method using multi-sensors.Applied Intelligence, 53(23):28169–28185, 2023a

Ling Chen, Yingsong Luo, Liangying Peng, Rong Hu, Yi Zhang, and Shenghuan Miao. A multi-graph convolutional network based wearable human activity recognition method using multi-sensors.Applied Intelligence, 53(23):28169–28185, 2023a. Qianqian Chen, Jiadong Zhang, Runqi Meng, Lei Zhou, Zhenhui Li, Qianjin Feng, and Dinggang Shen. Modality-specific informat...

work page arXiv
[13]

Rescaling egocentric vision: Collec- tion, pipeline and challenges for epic-kitchens-100.International Journal of Computer Vision, pp

32 Published in Transactions on Machine Learning Research (02/2026) Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. Rescaling egocentric vision: Collec- tion, pipeline and challenges for epic-kitchens-100.International Journal of Computer...

work page 2026
[14]

Hyperspectral and lidar data fusion: Outcome of the 2013 grss data fusion contest.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6):2405–2418,

Christian Debes, Andreas Merentitis, Roel Heremans, Jürgen Hahn, Nikolaos Frangiadakis, Tim van Kasteren, Wenzhi Liao, Rik Bellens, Aleksandra Pižurica, Sidharta Gautama, et al. Hyperspectral and lidar data fusion: Outcome of the 2013 grss data fusion contest.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6):2405–2418,

work page 2013
[15]

Jonas Van Der Donckt

[Accessed 13-05-2024]. Jonas Van Der Donckt. mbrain21,

work page 2024
[16]

Robert Duin

URLhttps://www.kaggle.com/dsv/7745331. Robert Duin. Multiple Features. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5HC70. Aiman Farooq, Deepak Mishra, and Santanu Chaudhury. Survival prediction in lung cancer through multi- modal representation learning. In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3...

work page doi:10.24432/c5hc70
[17]

Low to high dimensional modality hallucination using aggregated fields of view.IEEE Robotics and Automation Letters, 5(2):1983–1990,

33 Published in Transactions on Machine Learning Research (02/2026) Kausic Gunasekar, Qiang Qiu, and Yezhou Yang. Low to high dimensional modality hallucination using aggregated fields of view.IEEE Robotics and Automation Letters, 5(2):1983–1990,

work page 2026
[18]

Qishen Ha, Kohei Watanabe, Takumi Karasawa, Yoshitaka Ushiku, and Tatsuya Harada

URLhttps://arxiv.org/abs/2407.05374. Qishen Ha, Kohei Watanabe, Takumi Karasawa, Yoshitaka Ushiku, and Tatsuya Harada. Mfnet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),

work page arXiv
[19]

Multi-modal deep learning for multi-temporal urban mapping with a partly missing optical modality

Sebastian Hafner and Yifang Ban. Multi-modal deep learning for multi-temporal urban mapping with a partly missing optical modality. InIGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, pp. 6843–6846. IEEE,

work page 2023
[20]

Modality completion via gaussian process prior variational autoencoders for multi-modal glioma segmentation

Mohammad Hamghalam, Alejandro F Frangi, Baiying Lei, and Amber L Simpson. Modality completion via gaussian process prior variational autoencoders for multi-modal glioma segmentation. InMedical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V...

work page 2021
[21]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531,

work page internal anchor Pith review Pith/arXiv arXiv
[22]

Knowledge distillation from multi-modal to mono-modal segmentation networks

Minhao Hu, Matthis Maillard, Ya Zhang, Tommaso Ciceri, Giammarco La Barbera, Isabelle Bloch, and Pietro Gori. Knowledge distillation from multi-modal to mono-modal segmentation networks. InMedical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23, pp. 772–78...

work page 2020
[23]

Bcdata: A large-scale dataset and benchmark for cell detection and counting

34 Published in Transactions on Machine Learning Research (02/2026) Zhongyi Huang, Yao Ding, Guoli Song, Lin Wang, Ruizhe Geng, Hongliang He, Shan Du, Xia Liu, Yonghong Tian, Yongsheng Liang, et al. Bcdata: A large-scale dataset and benchmark for cell detection and counting. InInternational Conference on Medical Image Computing and Computer-Assisted Inter...

work page 2026
[24]

Epic-sounds: A large- scale dataset of actions that sound

Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, and Andrew Zisserman. Epic-sounds: A large- scale dataset of actions that sound. InICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE,

work page 2023
[25]

Towards robust multimodal prompting with miss- ing modalities

Jaehyuk Jang, Yooseung Wang, and Changick Kim. Towards robust multimodal prompting with miss- ing modalities. InICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8070–8074. IEEE,

work page 2024
[26]

kaggle flir thermal,

35 Published in Transactions on Machine Learning Research (02/2026) kaggle. kaggle flir thermal,

work page 2026
[27]

Otterhd: A high-resolution multi-modality model

Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, and Ziwei Liu. Otterhd: A high-resolution multi-modality model.arXiv preprint arXiv:2311.04219, 2023a. Haitao Li, Ziyu Li, Yiheng Mao, Zhengyao Ding, and Zhengxing Huang. Dc-seg: Disentangled contrastive learning for brain tumor segmentation with missing modalities.arXiv preprint arXiv:2505.119...

work page arXiv 2022
[28]

Deep learning based imaging data completion for improved brain disease diagnosis

Rongjian Li, Wenlu Zhang, Heung-Il Suk, Li Wang, Jiang Li, Dinggang Shen, and Shuiwang Ji. Deep learning based imaging data completion for improved brain disease diagnosis. InMedical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part III 17, pp. 305–312. ...

work page 2014
[29]

Simmlm: A simple framework for multi-modal learning with missing modality.arXiv preprint arXiv:2507.19264, 2025b

Sijie Li, Chen Chen, and Jungong Han. Simmlm: A simple framework for multi-modal learning with missing modality.arXiv preprint arXiv:2507.19264, 2025b. Siting Li, Chenzhuang Du, Yue Zhao, Yu Huang, and Hang Zhao. What makes for robust multi-modal models in the face of missing modalities?arXiv preprint arXiv:2310.06383, 2023c. Xue Li, Guo Zhang, Hao Cui, S...

work page arXiv
[30]

Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization

Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, and Louis-Philippe Morency. Learning representations from imperfect time series data via tensor rank regularization.arXiv preprint arXiv:1907.01011,

work page internal anchor Pith review Pith/arXiv arXiv 1907
[31]

Multibench: Multiscalebenchmarksformultimodalrepresentationlearning.Advances in neural information processing systems, 2021(DB1):1,

Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A Lee, YukeZhu, etal. Multibench: Multiscalebenchmarksformultimodalrepresentationlearning.Advances in neural information processing systems, 2021(DB1):1,

work page 2021
[32]

Causal representation learning from multimodal clinical records under non-random modality missingness

Zihan Liang, Ziwen Pan, and Ruoxuan Xiong. Causal representation learning from multimodal clinical records under non-random modality missingness. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 28779–28796,

work page 2025
[33]

Sup- pressandrebalance: Towardsgeneralizedmulti-modalfaceanti-spoofing.arXiv preprint arXiv:2402.19298,

37 Published in Transactions on Machine Learning Research (02/2026) Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Zitong Yu, Wenzhong Tang, and Alex Kot. Sup- pressandrebalance: Towardsgeneralizedmulti-modalfaceanti-spoofing.arXiv preprint arXiv:2402.19298,

work page arXiv 2026
[34]

Visual instruction tuning.Advances in neural information processing systems, 36, 2024a

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning.Advances in neural information processing systems, 36, 2024a. Hong Liu, Dong Wei, Donghuan Lu, Jinghan Sun, Liansheng Wang, and Yefeng Zheng. M3ae: multimodal representation learning for brain tumor segmentation with missing modalities. InProceedings of the AAAI Conference ...

work page arXiv
[35]

URLhttps://doi.org/10.1145/3411818

doi: 10.1145/3411818. URLhttps://doi.org/10.1145/3411818. Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, et al. Llava-plus: Learning to use tools for creating multimodal agents.arXiv preprint arXiv:2311.05437, 2023c. YanbeiLiu, Lianxi Fan, Changqing Zhang, Tao Zhou, ZhitaoXiao, Lei Geng, an...

work page doi:10.1145/3411818 2019
[36]

Fedmobile: Enabling knowledge contribution-aware multi-modal federated learning with incomplete modalities

Yi Liu, Cong Wang, and Xingliang Yuan. Fedmobile: Enabling knowledge contribution-aware multi-modal federated learning with incomplete modalities. InProceedings of the ACM on Web Conference 2025, pp. 2775–2786, 2025a. Yuhang Liu, Quan Zou, Ran Su, and Leyi Wei. scmomer: A modality-aware pretraining framework for single-cell multi-omics modeling under miss...

work page 2025
[37]

Mc- dbn: A deep belief network-based model for modality completion.arXiv preprint arXiv:2402.09782,

Zihong Luo, Haochen Xue, Mingyu Jin, Chengzhi Liu, Zile Huang, Chong Zhang, and Shuliang Zhao. Mc- dbn: A deep belief network-based model for modality completion.arXiv preprint arXiv:2402.09782,

work page arXiv
[38]

An efficient approach for audio-visual emotion recognition with missing labels and missing modalities

Fei Ma, Shao-Lun Huang, and Lin Zhang. An efficient approach for audio-visual emotion recognition with missing labels and missing modalities. In2021 IEEE international conference on multimedia and Expo (ICME), pp. 1–6. IEEE, 2021a. Fei Ma, Xiangxiang Xu, Shao-Lun Huang, and Lin Zhang. Maximum likelihood estimation for multimodal learning with missing moda...

work page arXiv
[39]

Dealing with missing modalities in multimodal recommendation: a feature propagation-based approach.arXiv preprint arXiv:2403.19841,

Daniele Malitesta, Emanuele Rossi, Claudio Pomo, Fragkiskos D Malliaros, and Tommaso Di Noia. Dealing with missing modalities in multimodal recommendation: a feature propagation-based approach.arXiv preprint arXiv:2403.19841,

work page arXiv
[40]

Learning to recognize objects from unseen modalities

C Mario Christoudias, Raquel Urtasun, Mathieu Salzmann, and Trevor Darrell. Learning to recognize objects from unseen modalities. InComputer Vision–ECCV 2010, pp. 677–691. Springer,

work page 2010
[41]

The multimodal brain tumor image segmentation benchmark (brats).IEEE transactions on medical imaging, 34(10):1993–2024,

Bjoern H Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The multimodal brain tumor image segmentation benchmark (brats).IEEE transactions on medical imaging, 34(10):1993–2024,

work page 1993
[42]

Learning a text-video embedding from incomplete and heterogeneous data

Antoine Miech, Ivan Laptev, and Josef Sivic. Learning a text-video embedding from incomplete and heterogeneous data. arXiv preprint arXiv:1804.02516,

work page arXiv
[43]

The impact of the mit-bih arrhythmia database

Published in Transactions on Machine Learning Research (02/2026)

George B Moody and Roger G Mark. The impact of the mit-bih arrhythmia database. IEEE engineering in medicine and biology magazine, 20(3):45–50,

work page 2026
[44]

3d mri brain tumor segmentation using autoencoder regularization

Andriy Myronenko. 3d mri brain tumor segmentation using autoencoder regularization. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II 4, pp. 311–320. Springer,

work page 2018
[45]

Understanding human daily experience through continuous sensing: Etri lifelog dataset 2024

Se Won Oh, Hyuntae Jeong, Seungeun Chung, Jeong Mook Lim, Kyoung Ju Noh, Sunkyung Lee, and Gyuwon Jung. Understanding human daily experience through continuous sensing: Etri lifelog dataset 2024. arXiv preprint arXiv:2508.03698,

work page arXiv 2024
[46]

Multi-modal and multi-attribute generation of single cells with cfgen

Alessandro Palma, Till Richter, Hanyi Zhang, Manuel Lubetzki, Alexander Tong, Andrea Dittadi, and Fabian Theis. Multi-modal and multi-attribute generation of single cells with cfgen. arXiv preprint arXiv:2407.11734,

work page arXiv
[47]

Training strategies to handle missing modalities for audio-visual expression recognition

Srinivas Parthasarathy and Shiva Sundaram. Training strategies to handle missing modalities for audio-visual expression recognition. In Companion Publication of the 2020 International Conference on Multimodal Interaction, pp. 400–404,

work page arXiv 2020
[48]

Fedmm: Federated multi-modal learning with modality heterogeneity in computational pathology

Yuanzhe Peng, Jieming Bian, and Jie Xu. Fedmm: Federated multi-modal learning with modality heterogeneity in computational pathology. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1696–1700. IEEE,

work page 2024
[49]

2024 ieee grss data fusion contest - flood rapid mapping

Claudio Persello; Saurabh Prasad; Gemine Vivone; Vincent Lonjou; Frédéric Bretar; Raquel Rodriguez-Suquet; Pauline Guntzburger; Vincent Poulain; Jacqueline Le Moigne; Benjamin Smith; Sujay Kumar; Thomas Huang; Sophie Ricci; Thanh Huy Nguyen; Andrea Piacentini. 2024 ieee grss data fusion contest - flood rapid mapping. IEEE Dataport, 2023a. doi: 1...

work page doi:10.21227/73zj-4303 2024
[50]

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations

Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, and Rada Mihalcea. Meld: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508,

work page internal anchor Pith review Pith/arXiv arXiv
[51]

Humanoid locomotion as next token prediction

Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, and Jitendra Malik. Humanoid locomotion as next token prediction. arXiv preprint arXiv:2402.19469,

work page arXiv
[52]

Combating missing modalities in egocentric videos at test time

Merey Ramazanova, Alejandro Pardo, Bernard Ghanem, and Motasem Alfarra. Combating missing modalities in egocentric videos at test time. arXiv preprint arXiv:2404.15161,

work page arXiv
[53]

Robust multimodal learning with missing modalities via parameter-efficient adaptation

Md Kaykobad Reza, Ashley Prater-Bennette, and M Salman Asif. Robust multimodal learning with missing modalities via parameter-efficient adaptation. arXiv preprint arXiv:2310.03986,

work page doi:10.24432/c5nw2h
[54]

Daniel Roggen, Alberto Calatroni, Mirco Rossi, Thomas Holleczek, Kilian Förster, Gerhard Tröster, Paul Lukowicz, David Bannach, Gerald Pirkl, Alois Ferscha, Jakob Doppler, Clemens Holzmann, Marc Kurz, Gerald Holl, Ricardo Chavarriaga, Hesam Sagha, Hamidreza Bayati, Marco Creatura, and José del R. Millán. Collecting complex activity datasets in highly rich...

work page 2010
[55]

Examining modality incongruity in multimodal federated learning for medical vision and language-based disease detection

Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, and J Alison Noble. Examining modality incongruity in multimodal federated learning for medical vision and language-based disease detection. arXiv preprint arXiv:2402.05294,

work page arXiv
[56]

Bci2000: a general-purpose brain-computer interface (bci) system

Gerwin Schalk, Dennis J McFarland, Thilo Hinterberger, Niels Birbaumer, and Jonathan R Wolpaw. Bci2000: a general-purpose brain-computer interface (bci) system. IEEE Transactions on biomedical engineering, 51(6):1034–1043,

work page 2026
[57]

Introducing wesad, a multimodal dataset for wearable stress and affect detection

Philip Schmidt, Attila Reiss, Robert Duerichen, Claus Marberger, and Kristof Van Laerhoven. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, ICMI ’18, pp. 400–408, New York, NY, USA, 2018a. Association for Computing Machinery. ISBN 9781450356...

work page doi:10.1145/3242969.3242985
[58]

Brain tumor segmentation on mri with missing modalities

Yan Shen and Mingchen Gao. Brain tumor segmentation on mri with missing modalities. In Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, June 2–7, 2019, Proceedings 26, pp. 417–428. Springer,

work page 2019
[59]

Fusion of smartphone motion sensors for physical activity recognition

Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga. Fusion of smartphone motion sensors for physical activity recognition. Sensors, 14(6):10146–10176,

work page arXiv
[60]

Contrastive learning-based spectral knowledge distillation for multi-modality and missing modality scenarios in semantic segmentation

Aniruddh Sikdar, Jayant Teotia, and Suresh Sundaram. Contrastive learning-based spectral knowledge distillation for multi-modality and missing modality scenarios in semantic segmentation. arXiv preprint arXiv:2312.02240,

work page arXiv
[61]

Indoor segmentation and support inference from rgbd images

Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from rgbd images. In ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12, pp. 746–760. Springer,

work page 2012
[62]

The muse 2021 multimodal sentiment analysis challenge: sentiment, emotion, physiological-emotion, and stress

Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, and Björn W Schuller. The muse 2021 multimodal sentiment analysis challenge: sentiment, emotion, physiological-emotion, and stress. In Proceedings of the 2nd on Multimo...

work page 2026
[63]

Multispectral object detection for autonomous vehicles

Karasawa Takumi, Kohei Watanabe, Qishen Ha, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, and Tatsuya Harada. Multispectral object detection for autonomous vehicles. In Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pp. 35–43,

work page 2017
[64]

NASA Science Editorial Team. Keeping Our Sense of Direction: Dealing With a Dead Sensor - NASA Science - science.nasa.gov. https://science.nasa.gov/missions/mars-2020-perseverance/ingenuity-helicopter/keeping-our-sense-of-direction-dealing-with-a-dead-sensor/, JUN

work page 2020
[65]

Review the cancer genome atlas (tcga): an immeasurable source of knowledge

Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz. Review the cancer genome atlas (tcga): an immeasurable source of knowledge. Contemporary Oncology/Współczesna Onkologia, 2015(1):68–77,

work page doi:10.24432/c53w49 2015
[66]

Missing modalities imputation via cascaded residual autoencoder

Luan Tran, Xiaoming Liu, Jiayu Zhou, and Rong Jin. Missing modalities imputation via cascaded residual autoencoder. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1405–1414,

work page 2026
[67]

Learning Factorized Multimodal Representations

Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, and Ruslan Salakhutdinov. Learning factorized multimodal representations. arXiv preprint arXiv:1806.06176,

work page internal anchor Pith review Pith/arXiv arXiv
[68]

How to sense the world: Leveraging hierarchy in multimodal perception for robust reinforcement learning agents

Miguel Vasco, Hang Yin, Francisco S Melo, and Ana Paiva. How to sense the world: Leveraging hierarchy in multimodal perception for robust reinforcement learning agents. arXiv preprint arXiv:2110.03608,

work page arXiv
[69]

Multi-modal learning with missing modality via shared-specific feature modelling

Hu Wang, Yuanhong Chen, Congbo Ma, Jodie Avery, Louise Hull, and Gustavo Carneiro. Multi-modal learning with missing modality via shared-specific feature modelling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15878–15887, 2023a. Hu Wang, Congbo Ma, Jianpeng Zhang, Yuan Zhang, Jodie Avery, Louise Hull, and Gusta...

work page arXiv
[70]

Prototype knowledge distillation for medical segmentation with missing modality

Shuai Wang, Zipei Yan, Daoan Zhang, Haining Wei, Zhongsen Li, and Rui Li. Prototype knowledge distillation for medical segmentation with missing modality. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE, 2023c. Tianyi W...

work page arXiv 2026
[71]

Sslmm: Semi-supervised learning with missing modalities for multimodal sentiment analysis

Yiyu Wang, Haifang Jian, Jian Zhuang, Huimin Guo, and Yan Leng. Sslmm: Semi-supervised learning with missing modalities for multimodal sentiment analysis. Information Fusion, 120:103058, 2025b. Yuanyi Wang, Haifeng Sun, Jiabo Wang, Jingyu Wang, Wei Tang, Qi Qi, Shaoling Sun, and Jianxin Liao. Towards semantic consistency: Dirichlet energy driven robust multi-mod...

work page arXiv
[72]

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. Visual chatgpt: Talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671, 2023a. Renjie Wu, Hu Wang, Feras Dayoub, and Hsiang-Ting Chen. Segment beyond view: handling partially missing modality for audio-visual semantic segmentation. In Proce...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[73]

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, and Lijuan Wang. Mm-react: Prompting chatgpt for multimodal reasoning and action. arXiv preprint arXiv:2303.11381, 2023b. Wenfang Yao, Kejing Yin, William K Cheung, Jia Liu, and Jing Qin. Drfuse: Learning disentangled representation for ...

work page internal anchor Pith review Pith/arXiv arXiv
[74]

2020 ieee grss data fusion contest: Global land cover mapping with weak supervision [technical committees]

Naoto Yokoya, Pedram Ghamisi, Ronny Haensch, and Michael Schmitt. 2020 ieee grss data fusion contest: Global land cover mapping with weak supervision [technical committees]. IEEE Geoscience and Remote Sensing Magazine, 8(1):154–157,

work page 2020
[75]

MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos

Amir Zadeh, Rowan Zellers, Eli Pincus, and Louis-Philippe Morency. Mosi: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv preprint arXiv:1606.06259,

work page internal anchor Pith review Pith/arXiv arXiv 2026
[76]

Mitigating inconsistencies in multimodal sentiment analysis under uncertain missing modalities

Jiandian Zeng, Jiantao Zhou, and Tianyi Liu. Mitigating inconsistencies in multimodal sentiment analysis under uncertain missing modalities. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 2924–2934,

work page 2022
[77]

Anygpt: Unified multimodal llm with discrete sequence modeling

Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, et al. Anygpt: Unified multimodal llm with discrete sequence modeling. arXiv preprint arXiv:2402.12226,

work page arXiv
[78]

Distilling missing modality knowledge from ultrasound for endometriosis diagnosis with magnetic resonance images

Yuan Zhang, Hu Wang, David Butler, Minh-Son To, Jodie Avery, M Louise Hull, and Gustavo Carneiro. Distilling missing modality knowledge from ultrasound for endometriosis diagnosis with magnetic resonance images. In 2023 IEEE 20th International Symposium on Biomedical Imaging, 2023b. Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, and S Kevin Zhou. Unified ...

work page arXiv
[79]

Learning modality-agnostic representation for semantic segmentation from any modalities

Xu Zheng, Yuanhuiyi Lyu, and Lin Wang. Learning modality-agnostic representation for semantic segmentation from any modalities,

work page 2026
[80]

Forecasting fine-grained air quality based on big data

Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, and Tianrui Li. Forecasting fine-grained air quality based on big data. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 2267–2276,

work page arXiv
