When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval

Chao-Ming Huang; Min-Yen Kan; Tongyao Zhu

arxiv: 2606.13537 · v1 · pith:OE5BEPRFnew · submitted 2026-06-11 · 💻 cs.CL

When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval

Tongyao Zhu , Chao-Ming Huang , Min-Yen Kan This is my paper

Pith reviewed 2026-06-27 06:56 UTC · model grok-4.3

classification 💻 cs.CL

keywords multilingual dense retrievalquery embedding interpolationmixing ratiosEnglish dominancetypological distancemMARCOBGE-M3

0 comments

The pith

Interpolating parallel query embeddings at optimal ratios outperforms the best monolingual query in 88 of 105 retrieval tests on mMARCO.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether mixing query embeddings from parallel translations helps dense retrievers handle mixed-language queries. It constructs mixed queries by linear interpolation between monolingual embeddings and varies the mixing ratio systematically. Experiments show that an optimal ratio beats both pure-language endpoints in most cases, but the benefit is asymmetric: mixing helps consistently when the document index is non-English, while English-containing indices perform best with pure English queries. English emerges as the strongest partner for any non-English language, and after accounting for that dominance the size of the mixing gain decreases as typological distance grows. These patterns hold across model families and scales, suggesting that language-mix sensitivity in dense retrieval follows predictable structure rather than random variation.

Core claim

Linear interpolation of monolingual query embeddings at controlled ratios produces retrieval performance that exceeds the stronger monolingual endpoint in 88 of 105 language-pair and index combinations; the gains are uniformly positive for non-English document indices, English indices are optimized by unmixed English queries, English is the strongest mixing partner for every other language, and mixing gains correlate negatively with typological distance once English dominance is controlled.

What carries the argument

Embedding-level mixing: constructing a mixed query as a convex combination of two monolingual query embeddings produced by the same encoder.

If this is right

Retrieval systems can improve accuracy on non-English document collections by mixing in a second language at inference time instead of using a single query language.
For collections that already contain English documents, the system should default to the English query embedding without mixing.
English can be used as a universal mixing partner regardless of the other language in the query.
Typological distance between the two query languages predicts the magnitude of the mixing benefit once English is controlled for.
The same mixing-ratio patterns appear across different dense retriever families and sizes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Index builders could pre-compute and store a small set of mixed query vectors for common language pairs to avoid runtime interpolation.
The observed negative correlation with typological distance suggests that mixing may be especially useful for closely related language pairs once English dominance is removed.
If the linear-interpolation assumption holds only for certain encoder families, then newer contrastively trained models might show different optimal ratios or even different asymmetry patterns.

Load-bearing premise

Linear interpolation between two monolingual embeddings yields a vector whose retrieval behavior reflects a meaningful blend of the two languages' semantics.

What would settle it

A controlled experiment on the same model and index where the performance curve over mixing ratios is flat or peaks at an endpoint rather than an interior ratio.

Figures

Figures reproduced from arXiv: 2606.13537 by Chao-Ming Huang, Min-Yen Kan, Tongyao Zhu.

**Figure 2.** Figure 2: Illustration of Embedding-level query mixing: [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Word vs. embedding-level mixing nDCG@10 on the EN–ZH pair across three document-language [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Embedding-space diagnostics (EN–ZH). Word-mixed query embeddings move approximately along the EN→ZH axis with small off-axis deviation δ. 0.0 0.5 1.0 1.5 2.0 2.5 3.0 nDCG@10 (best mid best endpoint) 0 2 4 6 8 10 12 Number of settings distribution (n=105) 0 Mean = 0.7037 Median = 0.6508 (a) Distribution of ∆ over 105 settings. Embedding-level mixing is usually beneficial, and the harm is small. EN absent E… view at source ↗

**Figure 5.** Figure 5: Overview of effects of embedding-level code mixing. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: For multiple document languages L, English gives the largest ∆ among tested partners on L-only document-language settings. ment language outperforms the opposite-language endpoint. This is consistent with the widely observed cross-lingual alignment gap that crosslingual retrieval is more challenging than monolingual baselines (Zhang et al., 2023b; Conneau et al., 2020; Litschko et al., 2023). Moreover, a… view at source ↗

**Figure 8.** Figure 8: Optimal query mixing ratio. Stacked bars [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 9.** Figure 9: ∆ vs. the stronger endpoint’s nDCG@10. The negative trend is largely driven by settings with English documents, where endpoints are already strong. Pair type Mean Gain #(Gains> 0) #(Gains> 0.1) Non-EN +0.4475 21/22 19/22 EN +0.0101 8/13 3/13 [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗

**Figure 10.** Figure 10: Typology distance of language pairs vs. mix [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 11.** Figure 11: Illustration of Word-level query mixing via [PITH_FULL_IMAGE:figures/full_fig_p012_11.png] view at source ↗

**Figure 12.** Figure 12: Word vs Embed mix ratio curves for EN–ZH and EN–VI. [PITH_FULL_IMAGE:figures/full_fig_p016_12.png] view at source ↗

**Figure 13.** Figure 13: Word vs Embed mix ratio curves for ZH–VI and HI–ID. [PITH_FULL_IMAGE:figures/full_fig_p017_13.png] view at source ↗

read the original abstract

While mixed-language querying is ubiquitous in multilingual communities, the sensitivity of dense retrievers to such queries remains poorly understood. We present a ratio-controlled study on mMARCO that systematically evaluates retrieval performance by varying the mixing proportion of parallel query translations via embedding-level mixing -- constructing mixed queries as an interpolation of monolingual embeddings. Experiments with BGE-M3 demonstrate that an optimal mixing ratio outperforms the best monolingual endpoint in 88/105 cases. We uncover a distinct asymmetry driven by English dominance: mixing is uniformly beneficial when retrieving from non-English document indices, whereas indices containing English are best served by pure English queries. Furthermore, English acts as the strongest mixing partner for every non-English document language. Finally, when controlling for English dominance, mixing gains correlate negatively with typological distance. We conclude that language-mix sensitivity is structured and predictable, and we validate the robustness of these patterns across model families and scales.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Controlled embedding interpolation beats monolingual queries in 88/105 cases on mMARCO, with English dominance and typological patterns as the concrete new results.

read the letter

The main thing to know is that this paper runs ratio-controlled interpolation of query embeddings from parallel translations and reports that the best mix outperforms the stronger monolingual endpoint in 88 out of 105 language-index pairs on mMARCO with BGE-M3. English emerges as the strongest mixing partner, helping non-English document indexes across the board while pure English queries win when English documents are present; after controlling for that, gains fall with typological distance.

What is new is the systematic sweep over mixing ratios and the resulting asymmetry and correlation findings. These are direct measurements from retrieval metrics on a public benchmark, not derived from fitted parameters. The setup is straightforward: embed the two language versions separately, interpolate the vectors, and evaluate end-to-end. The patterns are specific enough to be useful for tuning multilingual retrievers and appear robust across model families.

The work is empirically grounded and avoids circular claims. The interpolation is presented as a controlled probe rather than a claim that the vector literally encodes mixed semantics, which keeps the interpretation tight. The counts and asymmetries are observable from the protocol, so the central results look reproducible.

Soft spots are limited. The abstract leaves the exact statistical tests and variance details implicit, so one would want to check whether the 88/105 figure holds under multiple-testing correction or if ratio selection introduced any bias. The linear interpolation itself is an approximation whose semantic fidelity is not deeply justified, but the paper does not overclaim it. These are normal empirical caveats rather than load-bearing flaws.

This is for people working on multilingual dense retrieval who need practical guidance on mixed queries. A reader in IR or cross-lingual setups will get actionable patterns to test. It deserves a serious referee because the experiments are on a standard benchmark, the claims are falsifiable, and the results are specific enough to build on.

Referee Report

2 major / 1 minor

Summary. The paper presents a ratio-controlled empirical study on the mMARCO benchmark using embedding-level linear interpolation of parallel query translations in multilingual dense retrieval. With BGE-M3 (and other models), it reports that an optimal mixing ratio outperforms the best monolingual endpoint in 88/105 language-index pairs, identifies an English-dominance asymmetry (mixing helps uniformly for non-English document indices but pure English queries are best when English is present), shows English as the strongest mixing partner for all non-English languages, and finds that mixing gains correlate negatively with typological distance after controlling for English. The patterns are claimed to be robust across model families and scales.

Significance. If the results hold under a fully specified protocol, the work offers structured, actionable insights into language-mix sensitivity for multilingual retrieval systems, moving beyond anecdotal observations. The use of a public benchmark, systematic ratio variation, and cross-model validation are clear strengths that support the empirical claims.

major comments (2)

[Methods / Experiments] The central 88/105 count and the uniformity/asymmetry claims depend on the precise definition of 'optimal mixing ratio' and the evaluation protocol. The methods section should explicitly state whether ratios are selected per language-index pair on held-out data or the test set, and whether any multiple-testing correction or pre-specification was used, as this directly affects whether the headline result can be taken as evidence of structured sensitivity rather than selection artifact.
[Embedding-level mixing construction] The weakest assumption—that linear interpolation of monolingual embeddings produces retrieval behavior that meaningfully reflects combined cross-lingual semantics—is load-bearing for interpreting the asymmetry and typological-distance results as more than a performance probe. The paper should add a short discussion or ablation (e.g., comparing to non-linear mixing or to cross-lingual alignment metrics) to bound how much the observed patterns could be artifacts of the linear construction itself.

minor comments (1)

[Results] Tables or figures reporting the per-language-pair wins/losses and the typological-distance correlation should include exact sample sizes, confidence intervals, or p-values so readers can assess the strength of the 'uniformly beneficial' and 'negative correlation' statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity on the experimental protocol and to discuss the linear mixing assumption.

read point-by-point responses

Referee: [Methods / Experiments] The central 88/105 count and the uniformity/asymmetry claims depend on the precise definition of 'optimal mixing ratio' and the evaluation protocol. The methods section should explicitly state whether ratios are selected per language-index pair on held-out data or the test set, and whether any multiple-testing correction or pre-specification was used, as this directly affects whether the headline result can be taken as evidence of structured sensitivity rather than selection artifact.

Authors: We agree that the selection protocol requires explicit description. The optimal ratio per language-index pair was identified by evaluating the full ratio grid directly on the test set; this was an exploratory choice to establish whether beneficial mixtures exist rather than to validate a selection procedure. No held-out data, pre-specification, or multiple-testing correction was used. In the revised manuscript we will state this protocol verbatim in the Methods section. To reduce reliance on per-pair selection we will additionally report performance for a single fixed ratio (0.5) across all pairs, confirming that mixing gains remain widespread even without optimization. These changes clarify the scope of the 88/105 result while preserving the observed patterns. revision: yes
Referee: [Embedding-level mixing construction] The weakest assumption—that linear interpolation of monolingual embeddings produces retrieval behavior that meaningfully reflects combined cross-lingual semantics—is load-bearing for interpreting the asymmetry and typological-distance results as more than a performance probe. The paper should add a short discussion or ablation (e.g., comparing to non-linear mixing or to cross-lingual alignment metrics) to bound how much the observed patterns could be artifacts of the linear construction itself.

Authors: We accept that the linear-interpolation construction merits explicit discussion. In the revised manuscript we will add a concise paragraph in the Discussion section that (i) acknowledges linear mixing as a simplifying assumption that may not capture non-linear semantic interactions, (ii) notes that the consistency of English dominance and typological correlations across model families provides indirect support for the patterns reflecting embedding-space geometry, and (iii) references prior work on embedding arithmetic. A full non-linear ablation or alignment-metric comparison lies outside the present scope and would require additional model training; we therefore limit the revision to the requested discussion, which bounds the interpretive claims without new experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

This is a purely empirical paper that reports retrieval metrics from controlled interpolation experiments on the external mMARCO benchmark across 105 language-index pairs using BGE-M3 and other models. The central claims (88/105 optimal-ratio wins, English dominance asymmetry, negative correlation with typological distance) are direct measurements from end-to-end task performance rather than derivations, fitted parameters renamed as predictions, or self-referential definitions. No equations, uniqueness theorems, or ansatzes are invoked that reduce to the paper's own inputs; the embedding mixing construction is presented only as an experimental probe whose validity is assessed externally via benchmark scores.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that mMARCO provides high-quality parallel queries and relevance labels, and that the tested embedding models produce vectors whose linear combinations remain semantically coherent for retrieval.

axioms (1)

domain assumption mMARCO dataset supplies reliable parallel query translations and relevance judgments across languages
The entire evaluation pipeline depends on this benchmark without additional validation of its quality or coverage.

pith-pipeline@v0.9.1-grok · 5686 in / 1354 out tokens · 30586 ms · 2026-06-27T06:56:41.850965+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

43 extracted references · 18 canonical work pages

[1]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972
[2]

Publications Manual , year = "1983", publisher =

1983
[3]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981
[4]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of
[5]

Dan Gusfield , title =. 1997

1997
[6]

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset , url =

Kudugunta, Sneha and Caswell, Isaac and Zhang, Biao and Garcia, Xavier and Xin, Derrick and Kusupati, Aditya and Stella, Romi and Bapna, Ankur and Firat, Orhan , booktitle =. MADLAD-400: A Multilingual And Document-Level Large Audited Dataset , url =
[7]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

2015
[8]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =
[9]

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =

Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and K\". Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =. Advances in Neural Information Processing Systems , editor =
[10]

2020 , eprint=

A Survey of Code-switched Speech and Language Processing , author=. 2020 , eprint=

2020
[11]

2021 , note =

Census of Population 2020: Statistical Release 1 --- Demographic Characteristics, Education, Language and Religion , institution =. 2021 , note =

2020
[12]

Dense passage retrieval for open-domain question answering, in: Webber, B., Cohn, T., He, Y ., Liu, Y

Vladimir Karpukhin and Barlas Oguz and Sewon Min and Patrick Lewis and Ledell Wu and Sergey Edunov and Danqi Chen and Wen. Dense Passage Retrieval for Open-Domain Question Answering , booktitle =. doi:10.18653/v1/2020.emnlp-main.550

work page doi:10.18653/v1/2020.emnlp-main.550 2020
[13]

2022 , journal =

No Language Left Behind: Scaling Human-Centered Machine Translation , author=. 2022 , journal =

2022
[14]

The Probabilistic Relevance Framework: BM25 and beyond.Foundations and Trends®in Information Retrieval, 4(1-2):1–174, 2009

Robertson, Stephen and Zaragoza, Hugo , title =. 2009 , issue_date =. doi:10.1561/1500000019 , journal =

work page doi:10.1561/1500000019 2009
[15]

1611.09268 , archivePrefix=

Payal Bajaj and Daniel Campos and Nick Craswell and Li Deng and Jianfeng Gao and Xiaodong Liu and Rangan Majumder and Andrew McNamara and Bhaskar Mitra and Tri Nguyen and Mir Rosenberg and Xia Song and Alina Stoica and Saurabh Tiwary and Tong Wang , year=. 1611.09268 , archivePrefix=

Pith/arXiv arXiv
[16]

Computacion y Sistemas , doi =

Chakma, Kunal and Das, Amitava , year =. Computacion y Sistemas , doi =
[17]

, title =

Ganguly, Debasis and Bandyopadhyay, Ayan and Mitra, Mandar and Jones, Gareth J.F. , title =. 2016 , isbn =. doi:10.1145/2911451.2914727 , booktitle =

work page doi:10.1145/2911451.2914727 2016
[18]

Proceedings of FIRE 2016 , year =

Banerjee, Somnath and Chakma, Kunal and Naskar, Sudip Kumar and Das, Amitava and Rosso, Paolo and Bandyopadhyay, Sivaji and Choudhury, Monojit , title =. Proceedings of FIRE 2016 , year =

2016
[19]

GLUEC o S : An Evaluation Benchmark for Code-Switched NLP

Khanuja, Simran and Dandapat, Sandipan and Srinivasan, Anirudh and Sitaram, Sunayana and Choudhury, Monojit. GLUEC o S : An Evaluation Benchmark for Code-Switched NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.329

work page doi:10.18653/v1/2020.acl-main.329 2020
[20]

L in CE : A Centralized Benchmark for Linguistic Code-switching Evaluation

Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar. L in CE : A Centralized Benchmark for Linguistic Code-switching Evaluation. Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020

2020
[21]

Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data

Litschko, Robert and Artemova, Ekaterina and Plank, Barbara. Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.193

work page doi:10.18653/v1/2023.findings-acl.193 2023
[22]

C ontrastive M ix: Overcoming Code-Mixing Dilemma in Cross-Lingual Transfer for Information Retrieval

Do, Junggeun and Lee, Jaeseong and Hwang, Seung-won. C ontrastive M ix: Overcoming Code-Mixing Dilemma in Cross-Lingual Transfer for Information Retrieval. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2024. doi:10.18653/v1/2024.naacl...

work page doi:10.18653/v1/2024.naacl-short.17 2024
[23]

2402.03216 , archivePrefix=

Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu , year=. 2402.03216 , archivePrefix=

Pith/arXiv arXiv
[24]

2022 , eprint=

Unsupervised Dense Information Retrieval with Contrastive Learning , author=. 2022 , eprint=

2022
[25]

Multilingual Large Language Models Are Not (Yet) Code-Switchers

Zhang, Ruochen and Cahyawijaya, Samuel and Cruz, Jan Christian Blaise and Winata, Genta and Aji, Alham Fikri. Multilingual Large Language Models Are Not (Yet) Code-Switchers. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.774

work page doi:10.18653/v1/2023.emnlp-main.774 2023
[26]

2505.16631 , archivePrefix=

Jonghwi Kim and Deokhyung Kang and Seonjeong Hwang and Yunsu Kim and Jungseul Ok and Gary Lee , year=. 2505.16631 , archivePrefix=

arXiv
[27]

2108.13897 , archivePrefix=

Luiz Bonifacio and Vitor Jeronymo and Hugo Queiroz Abonizio and Israel Campiotti and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira , year=. 2108.13897 , archivePrefix=

arXiv
[28]

S tanza: A Python Natural Language Processing Toolkit for Many Human Languages

Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D. S tanza: A Python natural language processing toolkit for many human languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2020. doi:10.18653/v1/2020.acl-demos.14

work page doi:10.18653/v1/2020.acl-demos.14 2020
[29]

and Allan, James and Carterette, Ben , title =

Smucker, Mark D. and Allan, James and Carterette, Ben , title =. 2007 , isbn =. doi:10.1145/1321440.1321528 , booktitle =

work page doi:10.1145/1321440.1321528 2007
[30]

IEEE Transactions on Big Data , year =

Billion-scale similarity search with GPUs , author =. IEEE Transactions on Big Data , year =
[31]

Text REtrieval Conference (TREC) , year =

Craswell, Nick and Mitra, Bhaskar and Yilmaz, Emine and Campos, Daniel , title =. Text REtrieval Conference (TREC) , year =
[32]

Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently? , url=

Tamm, Yan-Martin and Damdinov, Rinchin and Vasilev, Alexey , year=. Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently? , url=. doi:10.1145/3460231.3478848 , booktitle=

work page doi:10.1145/3460231.3478848
[33]

Matthijs Douze and Alexandr Guzhva and Chengqi Deng and Jeff Johnson and Gergely Szilvasy and Pierre-Emmanuel Mazaré and Maria Lomeli and Lucas Hosseini and Hervé Jégou , journal=. The. 2024 , eprint=

2024
[34]

D ista L s: a Comprehensive Collection of Language Distance Measures

Goot, Rob Van Der and Ploeger, Esther and Blaschke, Verena and Samardzic, Tanja. D ista L s: a Comprehensive Collection of Language Distance Measures. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2025. doi:10.18653/v1/2025.emnlp-demos.23

work page doi:10.18653/v1/2025.emnlp-demos.23 2025
[35]

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Joshi, Pratik and Santy, Sebastin and Budhiraja, Amar and Bali, Kalika and Choudhury, Monojit. The State and Fate of Linguistic Diversity and Inclusion in the NLP World. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.560

work page doi:10.18653/v1/2020.acl-main.560 2020
[36]

Aslib Journal of Information Management , volume =

Fu, Hengyi , title =. Aslib Journal of Information Management , volume =. 2018 , month =. doi:10.1108/AJIM-04-2018-0091 , url =

work page doi:10.1108/ajim-04-2018-0091 2018
[37]

MIRACL : A Multilingual Retrieval Dataset Covering 18 Diverse Languages

Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy. MIRACL : A Multilingual Retrieval Dataset Covering 18 Diverse Languages. Transactions of the Association for Computational Linguistics. 2023. doi:10.1162/tacl_a_00595

work page doi:10.1162/tacl_a_00595 2023
[38]

Zhang, Xinyu and Ma, Xueguang and Shi, Peng and Lin, Jimmy. Mr. T y D i: A Multi-lingual Benchmark for Dense Retrieval. Proceedings of the 1st Workshop on Multilingual Representation Learning. 2021. doi:10.18653/v1/2021.mrl-1.12

work page doi:10.18653/v1/2021.mrl-1.12 2021
[39]

Multilingual

Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu , journal=. Multilingual. 2024 , url=

2024
[40]

jina-embeddings-v3: Multilingual embeddings with task

Saba Sturua and Isabelle Mohr and Mohammad Kalim Akram and Michael Günther and Bo Wang and Markus Krimmel and Feng Wang and Georgios Mastrapas and Andreas Koukounas and Nan Wang and Han Xiao , year=. jina-embeddings-v3: Multilingual embeddings with task. 2409.10173 , archivePrefix=

arXiv
[41]

2025 , url=

Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren , journal=. 2025 , url=

2025
[42]

mGTE : Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and Zhang, Meishan and Li, Wenjie and Zhang, Min. mGTE : Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval. Proceedings of the 2024 Conference on Empiri...

work page doi:10.18653/v1/2024.emnlp-industry.103 2024
[43]

Unsupervised Cross-lingual Representation Learning at Scale

Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm \'a n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin. Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ...

work page doi:10.18653/v1/2020.acl-main.747 2020

[1] [1]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972

[2] [2]

Publications Manual , year = "1983", publisher =

1983

[3] [3]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981

[4] [4]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of

[5] [5]

Dan Gusfield , title =. 1997

1997

[6] [6]

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset , url =

Kudugunta, Sneha and Caswell, Isaac and Zhang, Biao and Garcia, Xavier and Xin, Derrick and Kusupati, Aditya and Stella, Romi and Bapna, Ankur and Firat, Orhan , booktitle =. MADLAD-400: A Multilingual And Document-Level Large Audited Dataset , url =

[7] [7]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

2015

[8] [8]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =

[9] [9]

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =

Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and K\". Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =. Advances in Neural Information Processing Systems , editor =

[10] [10]

2020 , eprint=

A Survey of Code-switched Speech and Language Processing , author=. 2020 , eprint=

2020

[11] [11]

2021 , note =

Census of Population 2020: Statistical Release 1 --- Demographic Characteristics, Education, Language and Religion , institution =. 2021 , note =

2020

[12] [12]

Dense passage retrieval for open-domain question answering, in: Webber, B., Cohn, T., He, Y ., Liu, Y

Vladimir Karpukhin and Barlas Oguz and Sewon Min and Patrick Lewis and Ledell Wu and Sergey Edunov and Danqi Chen and Wen. Dense Passage Retrieval for Open-Domain Question Answering , booktitle =. doi:10.18653/v1/2020.emnlp-main.550

work page doi:10.18653/v1/2020.emnlp-main.550 2020

[13] [13]

2022 , journal =

No Language Left Behind: Scaling Human-Centered Machine Translation , author=. 2022 , journal =

2022

[14] [14]

The Probabilistic Relevance Framework: BM25 and beyond.Foundations and Trends®in Information Retrieval, 4(1-2):1–174, 2009

Robertson, Stephen and Zaragoza, Hugo , title =. 2009 , issue_date =. doi:10.1561/1500000019 , journal =

work page doi:10.1561/1500000019 2009

[15] [15]

1611.09268 , archivePrefix=

Payal Bajaj and Daniel Campos and Nick Craswell and Li Deng and Jianfeng Gao and Xiaodong Liu and Rangan Majumder and Andrew McNamara and Bhaskar Mitra and Tri Nguyen and Mir Rosenberg and Xia Song and Alina Stoica and Saurabh Tiwary and Tong Wang , year=. 1611.09268 , archivePrefix=

Pith/arXiv arXiv

[16] [16]

Computacion y Sistemas , doi =

Chakma, Kunal and Das, Amitava , year =. Computacion y Sistemas , doi =

[17] [17]

, title =

Ganguly, Debasis and Bandyopadhyay, Ayan and Mitra, Mandar and Jones, Gareth J.F. , title =. 2016 , isbn =. doi:10.1145/2911451.2914727 , booktitle =

work page doi:10.1145/2911451.2914727 2016

[18] [18]

Proceedings of FIRE 2016 , year =

Banerjee, Somnath and Chakma, Kunal and Naskar, Sudip Kumar and Das, Amitava and Rosso, Paolo and Bandyopadhyay, Sivaji and Choudhury, Monojit , title =. Proceedings of FIRE 2016 , year =

2016

[19] [19]

GLUEC o S : An Evaluation Benchmark for Code-Switched NLP

Khanuja, Simran and Dandapat, Sandipan and Srinivasan, Anirudh and Sitaram, Sunayana and Choudhury, Monojit. GLUEC o S : An Evaluation Benchmark for Code-Switched NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.329

work page doi:10.18653/v1/2020.acl-main.329 2020

[20] [20]

L in CE : A Centralized Benchmark for Linguistic Code-switching Evaluation

Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar. L in CE : A Centralized Benchmark for Linguistic Code-switching Evaluation. Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020

2020

[21] [21]

Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data

Litschko, Robert and Artemova, Ekaterina and Plank, Barbara. Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.193

work page doi:10.18653/v1/2023.findings-acl.193 2023

[22] [22]

C ontrastive M ix: Overcoming Code-Mixing Dilemma in Cross-Lingual Transfer for Information Retrieval

Do, Junggeun and Lee, Jaeseong and Hwang, Seung-won. C ontrastive M ix: Overcoming Code-Mixing Dilemma in Cross-Lingual Transfer for Information Retrieval. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2024. doi:10.18653/v1/2024.naacl...

work page doi:10.18653/v1/2024.naacl-short.17 2024

[23] [23]

2402.03216 , archivePrefix=

Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu , year=. 2402.03216 , archivePrefix=

Pith/arXiv arXiv

[24] [24]

2022 , eprint=

Unsupervised Dense Information Retrieval with Contrastive Learning , author=. 2022 , eprint=

2022

[25] [25]

Multilingual Large Language Models Are Not (Yet) Code-Switchers

Zhang, Ruochen and Cahyawijaya, Samuel and Cruz, Jan Christian Blaise and Winata, Genta and Aji, Alham Fikri. Multilingual Large Language Models Are Not (Yet) Code-Switchers. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.774

work page doi:10.18653/v1/2023.emnlp-main.774 2023

[26] [26]

2505.16631 , archivePrefix=

Jonghwi Kim and Deokhyung Kang and Seonjeong Hwang and Yunsu Kim and Jungseul Ok and Gary Lee , year=. 2505.16631 , archivePrefix=

arXiv

[27] [27]

2108.13897 , archivePrefix=

Luiz Bonifacio and Vitor Jeronymo and Hugo Queiroz Abonizio and Israel Campiotti and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira , year=. 2108.13897 , archivePrefix=

arXiv

[28] [28]

S tanza: A Python Natural Language Processing Toolkit for Many Human Languages

Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D. S tanza: A Python natural language processing toolkit for many human languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2020. doi:10.18653/v1/2020.acl-demos.14

work page doi:10.18653/v1/2020.acl-demos.14 2020

[29] [29]

and Allan, James and Carterette, Ben , title =

Smucker, Mark D. and Allan, James and Carterette, Ben , title =. 2007 , isbn =. doi:10.1145/1321440.1321528 , booktitle =

work page doi:10.1145/1321440.1321528 2007

[30] [30]

IEEE Transactions on Big Data , year =

Billion-scale similarity search with GPUs , author =. IEEE Transactions on Big Data , year =

[31] [31]

Text REtrieval Conference (TREC) , year =

Craswell, Nick and Mitra, Bhaskar and Yilmaz, Emine and Campos, Daniel , title =. Text REtrieval Conference (TREC) , year =

[32] [32]

Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently? , url=

Tamm, Yan-Martin and Damdinov, Rinchin and Vasilev, Alexey , year=. Quality Metrics in Recommender Systems: Do We Calculate Metrics Consistently? , url=. doi:10.1145/3460231.3478848 , booktitle=

work page doi:10.1145/3460231.3478848

[33] [33]

Matthijs Douze and Alexandr Guzhva and Chengqi Deng and Jeff Johnson and Gergely Szilvasy and Pierre-Emmanuel Mazaré and Maria Lomeli and Lucas Hosseini and Hervé Jégou , journal=. The. 2024 , eprint=

2024

[34] [34]

D ista L s: a Comprehensive Collection of Language Distance Measures

Goot, Rob Van Der and Ploeger, Esther and Blaschke, Verena and Samardzic, Tanja. D ista L s: a Comprehensive Collection of Language Distance Measures. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2025. doi:10.18653/v1/2025.emnlp-demos.23

work page doi:10.18653/v1/2025.emnlp-demos.23 2025

[35] [35]

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Joshi, Pratik and Santy, Sebastin and Budhiraja, Amar and Bali, Kalika and Choudhury, Monojit. The State and Fate of Linguistic Diversity and Inclusion in the NLP World. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.560

work page doi:10.18653/v1/2020.acl-main.560 2020

[36] [36]

Aslib Journal of Information Management , volume =

Fu, Hengyi , title =. Aslib Journal of Information Management , volume =. 2018 , month =. doi:10.1108/AJIM-04-2018-0091 , url =

work page doi:10.1108/ajim-04-2018-0091 2018

[37] [37]

MIRACL : A Multilingual Retrieval Dataset Covering 18 Diverse Languages

Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy. MIRACL : A Multilingual Retrieval Dataset Covering 18 Diverse Languages. Transactions of the Association for Computational Linguistics. 2023. doi:10.1162/tacl_a_00595

work page doi:10.1162/tacl_a_00595 2023

[38] [38]

Zhang, Xinyu and Ma, Xueguang and Shi, Peng and Lin, Jimmy. Mr. T y D i: A Multi-lingual Benchmark for Dense Retrieval. Proceedings of the 1st Workshop on Multilingual Representation Learning. 2021. doi:10.18653/v1/2021.mrl-1.12

work page doi:10.18653/v1/2021.mrl-1.12 2021

[39] [39]

Multilingual

Wang, Liang and Yang, Nan and Huang, Xiaolong and Yang, Linjun and Majumder, Rangan and Wei, Furu , journal=. Multilingual. 2024 , url=

2024

[40] [40]

jina-embeddings-v3: Multilingual embeddings with task

Saba Sturua and Isabelle Mohr and Mohammad Kalim Akram and Michael Günther and Bo Wang and Markus Krimmel and Feng Wang and Georgios Mastrapas and Andreas Koukounas and Nan Wang and Han Xiao , year=. jina-embeddings-v3: Multilingual embeddings with task. 2409.10173 , archivePrefix=

arXiv

[41] [41]

2025 , url=

Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren , journal=. 2025 , url=

2025

[42] [42]

mGTE : Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and Zhang, Meishan and Li, Wenjie and Zhang, Min. mGTE : Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval. Proceedings of the 2024 Conference on Empiri...

work page doi:10.18653/v1/2024.emnlp-industry.103 2024

[43] [43]

Unsupervised Cross-lingual Representation Learning at Scale

Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm \'a n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin. Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ...

work page doi:10.18653/v1/2020.acl-main.747 2020