All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG
Pith reviewed 2026-05-09 23:48 UTC · model grok-4.3
The pith
Multilingual RAG rerankers favor English and the query language, creating a gap that LAURA closes by aligning scores to generative utility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Current mRAG systems suffer from language bias during reranking that favors English and the query's native language, as shown by a substantial gap to the oracle upper bound and by systematic suppression of answer-critical documents scattered across multiple languages. LAURA bridges this gap by aligning multilingual evidence ranking directly with downstream generative utility.
What carries the argument
LAURA, a reranker that aligns multilingual evidence scores with the utility of that evidence for the downstream generation model rather than with language identity.
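A minimal sketch of the utility-alignment idea, assuming a leave-one-in utility proxy and a pairwise hinge objective; both are our illustrative choices, not confirmed details of LAURA's training objective, and `answer_score` is a hypothetical wrapper around generation plus evaluation:

```python
def utility_labels(query, docs, answer_score):
    """Per-document utility: downstream answer quality when the generator
    sees only that document (leave-one-in proxy; illustrative, not the
    paper's exact objective). `answer_score(query, docs) -> float` wraps
    generation and scoring."""
    return [answer_score(query, [d]) for d in docs]

def pairwise_alignment_loss(scores, utilities, margin=1.0):
    """Hinge loss pushing reranker scores to respect the utility ordering.
    Note that document language never enters the objective."""
    loss = 0.0
    for si, ui in zip(scores, utilities):
        for sj, uj in zip(scores, utilities):
            if ui > uj:  # doc i is more useful to the generator than doc j
                loss += max(0.0, margin - (si - sj))
    return loss
```

Training the reranker to minimize such a loss ties its ordering to generator usefulness rather than to language identity.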
If this is right
- Rerankers select evidence from a broader set of languages when it improves generation quality.
- The gap between observed mRAG performance and the oracle bound shrinks.
- Improvements appear consistently across tested languages and generation models without retraining the generator.
- Answer-critical documents that were previously suppressed become available to the generator.
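The first consequence is directly auditable: the language distribution of top-k selections. A small sketch, using a hypothetical data layout (ranked lists of `(doc_id, language)` pairs, one list per query) of the histogram such an audit would compare against the oracle's selections:

```python
from collections import Counter

def language_share(selections, k=5):
    """Fraction of top-k slots occupied by each language across queries.
    A biased reranker concentrates mass on English and the query language;
    the gap to the oracle's distribution is the bias being measured."""
    counts = Counter(lang for ranked in selections
                     for _, lang in ranked[:k])
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

# Illustrative data, not from the paper.
reranked = [[("d1", "en"), ("d2", "en"), ("d3", "de")],
            [("d4", "en"), ("d5", "fr"), ("d6", "en")]]
share = language_share(reranked, k=3)
```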
Where Pith is reading between the lines
- The same utility-alignment idea could be applied to other retrieval biases such as domain or cultural preferences.
- Feedback from generation quality may become a standard signal for training cross-lingual retrievers.
- Low-resource languages may show larger gains if the bias is stronger there.
Load-bearing premise
That the estimated oracle evidence analysis yields a reliable upper bound, and that realigning reranker scores to generative utility will not introduce new unintended biases or performance drops in specific languages.
What would settle it
Running the full LAURA pipeline on a new collection of languages and generation models. Finding either no reduction in language bias or no gain in end-to-end mRAG accuracy compared with the original reranker would undermine the claim.
read the original abstract
Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose Language-Agnostic Utility-driven Reranker Alignment (LAURA), which aligns multilingual evidence ranking with downstream generative utility. Experiments across diverse languages and generation models show that LAURA effectively mitigates language bias and consistently improves mRAG performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that multilingual RAG systems exhibit language bias in reranking that favors English and the query's native language. It quantifies a substantial gap to an estimated oracle upper bound via oracle evidence analysis, identifies a distributional mismatch in which systems suppress answer-critical documents spread across languages, and proposes LAURA (Language-Agnostic Utility-driven Reranker Alignment) to align reranker scores with downstream generative utility. Experiments across languages and models reportedly show bias mitigation and consistent mRAG gains.
Significance. If the oracle bound is tight and LAURA's gains hold without new language-specific degradations, the work would provide a practical method to reduce language bias in mRAG, improving equity and performance for non-English and low-resource languages in retrieval-augmented generation.
major comments (2)
- [Abstract / Oracle Evidence Analysis] The estimated oracle evidence analysis (described in the abstract as quantifying the performance gap) is load-bearing for the headline claim of a 'substantial performance gap'; its construction must be specified in detail (e.g., how optimal cross-lingual evidence is selected and whether it assumes idealized fusion) to confirm the bound is not inflated by unrealistic assumptions.
- [Experiments] The claim that LAURA 'consistently improves mRAG performance' across diverse languages rests on the assumption that utility-driven alignment does not create compensating drops in low-resource languages; experiments must report per-language breakdowns, statistical tests, and controls to rule out new biases introduced by the generative-utility proxy.
minor comments (2)
- [Abstract] The abstract introduces the LAURA acronym and method but provides no concrete details on baselines, reranker models, or generation models used; adding a brief experimental summary would improve readability.
- [Introduction / Analysis] Ensure all terms such as 'answer-critical' documents and 'distributional mismatch' are defined on first use with a reference to the relevant analysis section.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to improve clarity and rigor.
read point-by-point responses
-
Referee: [Abstract / Oracle Evidence Analysis] The estimated oracle evidence analysis (described in the abstract as quantifying the performance gap) is load-bearing for the headline claim of a 'substantial performance gap'; its construction must be specified in detail (e.g., how optimal cross-lingual evidence is selected and whether it assumes idealized fusion) to confirm the bound is not inflated by unrealistic assumptions.
Authors: We agree that additional detail on the oracle construction is warranted to support the performance gap claim. In the revised manuscript, we will expand the description in Section 3.2 (and add a clarifying paragraph in the abstract if space permits) to specify the exact procedure: the oracle enumerates feasible combinations of documents from the multilingual retrieval pool, selects the subset that maximizes downstream answer accuracy (exact match/F1) when fed to the same generator and fusion method used by the evaluated mRAG systems, and reports the resulting upper-bound performance. No idealized cross-lingual fusion or perfect retrieval is assumed beyond what is achievable with the given evidence pool and standard concatenation. This makes the bound a realistic, tight estimate rather than an inflated theoretical ceiling. revision: yes
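The procedure the rebuttal describes can be sketched as a brute-force subset search, assuming a hypothetical `answer_score` wrapper around the fixed generator and its evaluation metric (exact match or F1); exhaustive enumeration is only feasible for small pools and subset sizes:

```python
from itertools import combinations

def oracle_upper_bound(queries, answer_score, max_subset=3):
    """Estimated oracle bound: for each query, feed every small subset of
    the retrieved multilingual pool to the *same* generator and keep the
    best downstream score. `queries` is a list of (query, document_pool)
    pairs; `answer_score(query, docs) -> float` stands in for the paper's
    generation + evaluation pipeline."""
    best_scores = []
    for query, pool in queries:
        best = 0.0
        for r in range(1, max_subset + 1):
            for docs in combinations(pool, r):
                best = max(best, answer_score(query, docs))
        best_scores.append(best)
    return sum(best_scores) / len(best_scores)
```

Because the search is restricted to the actually retrieved pool and the same fusion path, the resulting bound reflects achievable, not idealized, performance.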
-
Referee: [Experiments] The claim that LAURA 'consistently improves mRAG performance' across diverse languages rests on the assumption that utility-driven alignment does not create compensating drops in low-resource languages; experiments must report per-language breakdowns, statistical tests, and controls to rule out new biases introduced by the generative-utility proxy.
Authors: We appreciate the referee's emphasis on verifying the absence of new biases. The current manuscript already provides per-language breakdowns in the appendix tables, which show gains (or no degradation) across both high- and low-resource languages. In the revision we will move the key per-language results to the main paper, add paired statistical significance tests (t-tests with p-values across multiple random seeds), and include control experiments that compare LAURA against both language-specific rerankers and a language-agnostic baseline. These additions will explicitly demonstrate that the generative-utility alignment does not introduce compensating drops or new language biases. revision: partial
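The proposed paired test is standard; a minimal stdlib sketch over illustrative per-language accuracies (the numbers are invented, and the real evaluation would pair scores per random seed as the rebuttal proposes):

```python
import math
from statistics import mean, stdev

def paired_t(baseline, treated):
    """Paired t-statistic over matched (baseline, treated) accuracy pairs,
    e.g. one pair per language. A quick check that gains are not driven
    by a handful of languages."""
    diffs = [t - b for b, t in zip(baseline, treated)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical per-language accuracies, baseline reranker vs. LAURA.
baseline = [0.41, 0.38, 0.29, 0.33, 0.25]
laura    = [0.45, 0.42, 0.33, 0.36, 0.30]
t_stat = paired_t(baseline, laura)
```

A large positive `t_stat` with a matching p-value (from a t-distribution with n-1 degrees of freedom) would support the consistency claim; per-language signs should also be inspected directly for compensating drops.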
Circularity Check
No circularity: empirical alignment method with independent experimental validation
full rationale
The paper presents LAURA as a utility-driven reranker alignment technique for mRAG, supported by experiments across languages and models that demonstrate bias mitigation and performance gains. No equations, derivations, or self-referential constructions appear in the abstract or described method. The estimated oracle analysis quantifies gaps but is not shown to reduce to a fitted parameter or self-definition by construction. Any self-citations (if present in full text) are not load-bearing for the central claim, which rests on external empirical results rather than internal redefinition. This is a standard empirical contribution without the enumerated circular patterns.