Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking
Pith reviewed 2026-05-10 00:30 UTC · model grok-4.3
The pith
A two-stage framework synthesizes multi-perspective evidence offline and uses LLMs to reason over it for better unsupervised multimodal entity linking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Offline, MSR-MEL constructs a comprehensive evidence collection: instance-centric multimodal details of mentions and entities; group-level neighborhood information aggregated via LLM-enhanced contextualized graphs followed by asymmetric teacher-student graph-neural-network alignment; lexical string-overlap ratios; and basic statistical summaries. Online, it treats the LLM as a reasoning engine that examines correlations and semantics across these perspectives to derive an effective unsupervised ranking strategy. Experiments on standard MEL benchmarks show this consistently exceeds prior unsupervised baselines.
What carries the argument
The offline multi-perspective evidence-synthesis module, especially its group-level component, which builds LLM-enhanced contextualized graphs and aligns modalities with an asymmetric teacher-student graph neural network, paired with the online LLM reasoning step that induces a ranking strategy from correlations in the evidence.
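The two-stage split described above can be sketched structurally. This is a minimal toy illustration: the dict-of-scores evidence record, the candidate names, and the pluggable `reason` callable standing in for the LLM are all assumptions for exposition, not interfaces from the paper's released code.

```python
def rank_online(evidence_by_candidate, reason):
    """Stage 2 (online): a reasoning module maps each multi-perspective
    evidence record to a score; candidates are ranked by that score.
    Here `reason` is a stand-in for the LLM reasoning engine."""
    return sorted(evidence_by_candidate,
                  key=lambda c: reason(evidence_by_candidate[c]),
                  reverse=True)

# Toy output of the offline stage (stage 1) for the mention "apple";
# real values would come from multimodal encoders, the graph module,
# string overlap, and summary statistics.
evidence = {
    "Apple Inc.":      {"instance": 0.8, "group": 0.7, "lexical": 0.9, "statistical": 0.5},
    "Malus domestica": {"instance": 0.4, "group": 0.3, "lexical": 0.2, "statistical": 0.5},
}

ranking = rank_online(evidence, reason=lambda ev: sum(ev.values()))
```

A naive sum stands in for the LLM here only to make the data flow concrete; the framework's point is that the LLM induces a better weighting across perspectives than any such fixed rule.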
If this is right
- The method achieves higher accuracy than existing unsupervised approaches on widely used MEL benchmarks.
- Group-level neighborhood evidence captured through graphs supplies context that single-instance features alone cannot provide.
- The two-stage design separates evidence construction from reasoning, allowing the LLM to operate without task-specific supervision.
- Lexical and statistical evidence types complement multimodal signals to reduce ambiguity in entity mentions.
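The lexical perspective in the last bullet can be made concrete. The paper does not specify its string-overlap formula, so the Dice coefficient over character bigrams below is only one plausible instantiation, not the confirmed definition.

```python
def char_bigrams(s):
    """Set of character bigrams of a lowercased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def overlap_ratio(mention, entity_name):
    """Dice coefficient over character bigrams -- a hypothetical reading of
    the 'string overlap ratio', chosen because it tolerates small edits."""
    a, b = char_bigrams(mention), char_bigrams(entity_name)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

score = overlap_ratio("Paris", "Paris, France")  # shared bigrams dominate: 0.5
```

Character bigrams rather than whole tokens keep the ratio robust to suffixes such as disambiguators in entity names ("Paris" vs. "Paris, France").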
Where Pith is reading between the lines
- The same evidence-synthesis pattern could extend to other knowledge-base tasks such as multimodal relation extraction where neighborhood context matters.
- Replacing the LLM reasoning component with a lighter non-LLM module might test whether the performance gain truly requires large-model semantic analysis.
- Scaling the graph construction to very large knowledge bases would likely surface computational limits of the current offline stage.
Load-bearing premise
That an LLM can reliably analyze correlations and semantics across the synthesized multi-perspective evidence to produce an accurate unsupervised ranking strategy.
What would settle it
A controlled experiment on a standard MEL benchmark in which MSR-MEL fails to rank the correct entity higher than strong unsupervised baselines after the LLM reasoning stage, or in which ablating the group-level graph evidence produces equivalent or superior results, would falsify the claim.
Original abstract
Multimodal Entity Linking (MEL) is a fundamental task in data management that maps ambiguous mentions with diverse modalities to the multimodal entities in a knowledge base. However, most existing MEL approaches primarily focus on optimizing instance-centric features and evidence, leaving broader forms of evidence and their intricate interdependencies insufficiently explored. Motivated by the observation that the human expert decision-making process relies on multi-perspective judgment, in this work we propose MSR-MEL, a Multi-perspective Evidence Synthesis and Reasoning framework with Large Language Models (LLMs) for unsupervised MEL. Specifically, we adopt a two-stage framework: (1) Offline Multi-Perspective Evidence Synthesis constructs a comprehensive set of evidence. This includes instance-centric evidence capturing the instance-centric multimodal information of mentions and entities, group-level evidence that aggregates neighborhood information, lexical evidence based on string overlap ratio, and statistical evidence based on simple summary statistics. A core contribution of our framework is the synthesis of group-level evidence, which effectively aggregates vital neighborhood information via graphs. We first construct LLM-enhanced contextualized graphs. Subsequently, different modalities are jointly aligned through an asymmetric teacher-student graph neural network. (2) Online Multi-Perspective Evidence Reasoning leverages the power of an LLM as a reasoning module to analyze the correlation and semantics of the multi-perspective evidence to induce an effective ranking strategy for accurate entity linking without supervision. Extensive experiments on widely used MEL benchmarks demonstrate that MSR-MEL consistently outperforms state-of-the-art unsupervised methods. The source code of this paper is available at: https://anonymous.4open.science/r/MSR-MEL-C21E/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MSR-MEL, a two-stage Multi-perspective Evidence Synthesis and Reasoning framework for unsupervised Multimodal Entity Linking (MEL). The offline stage constructs instance-centric multimodal evidence, group-level evidence via LLM-enhanced contextualized graphs and an asymmetric teacher-student GNN for neighborhood aggregation and modality alignment, plus lexical (string overlap) and statistical evidence. The online stage uses an LLM as a reasoning module to analyze correlations and semantics across the synthesized evidence and induce a ranking strategy without task supervision. Extensive experiments on standard MEL benchmarks are reported to show consistent outperformance over state-of-the-art unsupervised methods.
Significance. If the results hold, the work could meaningfully advance unsupervised MEL by moving beyond purely instance-centric features to integrate broader group-level graph evidence and LLM-driven multi-perspective reasoning. The synthesis of asymmetric teacher-student GNNs with LLM reasoning offers a concrete way to leverage neighborhood information without supervision, which may influence future evidence-aggregation approaches in entity linking and related multimodal tasks.
major comments (2)
- [Online Multi-Perspective Evidence Reasoning] The central claim that the LLM reliably extracts semantics and interdependencies from the multi-perspective evidence to induce an effective unsupervised ranking strategy is load-bearing for the outperformance result, yet the manuscript provides no details on prompt structure, temperature, output parsing, or consistency checks. Without these, it remains unclear whether reported gains stem from robust reasoning or from LLM-specific priors.
- [Offline Multi-Perspective Evidence Synthesis, group-level evidence paragraph] The asymmetric teacher-student GNN is described at a high level as jointly aligning modalities after constructing LLM-enhanced contextualized graphs, but no architecture details, loss formulation, or ablation isolating its contribution to the final ranking are supplied. This component is presented as a core contribution, so its technical soundness directly affects the framework's novelty.
minor comments (1)
- The abstract states that source code is available at an anonymous link; the camera-ready version should replace this with a permanent, non-anonymous repository to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our manuscript. We appreciate the recognition of the potential significance of our multi-perspective approach for unsupervised multimodal entity linking. Below, we provide point-by-point responses to the major comments and outline the revisions we will make to address them.
Point-by-point responses
Referee: [Online Multi-Perspective Evidence Reasoning] The central claim that the LLM reliably extracts semantics and interdependencies from the multi-perspective evidence to induce an effective unsupervised ranking strategy is load-bearing for the outperformance result, yet the manuscript provides no details on prompt structure, temperature, output parsing, or consistency checks. Without these, it remains unclear whether reported gains stem from robust reasoning or from LLM-specific priors.
Authors: We agree with the referee that providing implementation details for the LLM-based reasoning module is essential for reproducibility and to substantiate our claims. The original manuscript focused on the high-level framework, but we will revise the 'Online Multi-Perspective Evidence Reasoning' section to include comprehensive details: the complete prompt templates (including examples of input evidence formatting), the temperature setting of 0.2 for balanced creativity and consistency, the parsing procedure (converting LLM output to structured rankings via JSON mode if available or post-processing), and consistency verification through repeated inferences with seed variation. These additions will demonstrate that the ranking strategy is derived systematically from the evidence rather than relying on LLM priors. We will also release the exact prompts in the code repository. revision: yes
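The mechanics this response promises (a templated prompt, JSON parsing, repeated inference as a consistency check) can be sketched. The template wording, the `call_llm` callable, and the majority-vote rule below are illustrative assumptions, not the authors' actual implementation; in a real setup `call_llm` would wrap the API configured with the stated temperature of 0.2.

```python
import json
from collections import Counter

# Hypothetical prompt template; the actual templates are promised for the
# revised paper and code repository.
PROMPT_TEMPLATE = (
    "Rank the candidate entities for the mention '{mention}'.\n"
    "Multi-perspective evidence per candidate:\n{evidence}\n"
    'Answer strictly as JSON: {{"ranking": ["<best candidate first>"]}}'
)

def rank_with_llm(mention, evidence, call_llm, n_runs=3):
    """Query the reasoning LLM n_runs times, drop unparseable outputs, and
    return a ranking whose top-1 agrees with the majority vote -- one way
    to realize the consistency verification described above."""
    prompt = PROMPT_TEMPLATE.format(mention=mention, evidence=json.dumps(evidence))
    rankings = []
    for _ in range(n_runs):
        try:
            rankings.append(json.loads(call_llm(prompt))["ranking"])
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # discard malformed responses
    if not rankings:
        return []
    top1 = Counter(r[0] for r in rankings if r).most_common(1)[0][0]
    return next(r for r in rankings if r and r[0] == top1)

# Deterministic stub standing in for the real API call:
fake_llm = lambda prompt: '{"ranking": ["Apple Inc.", "Malus domestica"]}'
best = rank_with_llm("apple", {"Apple Inc.": {}, "Malus domestica": {}}, fake_llm)
```

Majority voting over the top-1 candidate is one simple aggregation; full rank aggregation (e.g., Borda counting across runs) would be a natural alternative.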
Referee: [Offline Multi-Perspective Evidence Synthesis, group-level evidence paragraph] The asymmetric teacher-student GNN is described at a high level as jointly aligning modalities after constructing LLM-enhanced contextualized graphs, but no architecture details, loss formulation, or ablation isolating its contribution to the final ranking are supplied. This component is presented as a core contribution, so its technical soundness directly affects the framework's novelty.
Authors: We acknowledge that the description of the asymmetric teacher-student GNN in the offline stage is at a high level and lacks the requested technical specifics. In the revised manuscript, we will expand this section with: (1) detailed architecture, including the number of layers, hidden dimensions, and how the teacher (pre-trained on larger graph) transfers knowledge to the student via distillation; (2) the loss function, which combines a modality alignment loss (e.g., contrastive loss between modalities) and a neighborhood aggregation loss; (3) an ablation study isolating the GNN's contribution by reporting performance metrics with and without the asymmetric alignment component. These details will be added to the main paper, with additional hyperparameters in the appendix. We believe this will strengthen the presentation of this core contribution. revision: yes
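One common shape for the promised loss is a contrastive alignment term in which gradients flow only through the student branch. The sketch below is a guess at that structure under stated assumptions (an InfoNCE-style loss, teacher embeddings treated as fixed targets), not the paper's actual formulation.

```python
import math

def alignment_loss(student, teacher, tau=0.1):
    """InfoNCE-style modality alignment: row i of the student embeddings
    should match row i of the teacher embeddings. Under the asymmetric
    teacher-student reading, teacher rows are constants (stop-gradient),
    so only the student branch would be updated against this loss."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    s = [normalize(v) for v in student]
    t = [normalize(v) for v in teacher]
    total = 0.0
    for i, si in enumerate(s):
        # cosine similarities to every teacher row, scaled by temperature
        logits = [sum(a * b for a, b in zip(si, tj)) / tau for tj in t]
        m = max(logits)  # shift for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log softmax probability of the match
    return total / len(s)

# Toy 2-D embeddings: visual "student" rows roughly matching text "teacher" rows.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.9, 0.1], [0.2, 0.8]]
aligned = alignment_loss(student, teacher)
shuffled = alignment_loss(student[::-1], teacher)  # mismatched pairing costs more
```

The promised ablation would then compare final linking accuracy with this alignment term enabled versus disabled, isolating the group-level component's contribution.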
Circularity Check
No circularity in MSR-MEL derivation chain
Full rationale
The paper describes a two-stage unsupervised framework: offline synthesis of instance-centric, group-level (via LLM-enhanced graphs and asymmetric teacher-student GNN), lexical, and statistical evidence, followed by online LLM reasoning to induce a ranking strategy from correlations in that evidence. No equations, parameters, or steps are shown to reduce by construction to their own inputs; the ranking is not a fitted quantity renamed as prediction, nor is any uniqueness theorem or ansatz imported via self-citation. The central claim of outperformance rests on empirical results against external benchmarks using pretrained LLMs and standard graph techniques, which are independent of the target MEL metric. This is the normal non-circular case for a framework proposal.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can analyze correlations and semantics among instance-centric, group-level, lexical, and statistical evidence to induce accurate entity rankings without supervision.