A survey on multilingual large language models: corpora, alignment, and bias

Xu, Yuemei, Hu, Ling, Zhao, Jiayi, Qiu, Zihan, Xu, Kexin, Ye, Yuqi · DOI 10.1007/s11704-024-40579-4

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

MORPHOGEN is a new multilingual benchmark for testing LLMs on gender-aware morphological generation via rewriting first-person sentences to the opposite gender in French, Arabic, and Hindi.

Are Multilingual Models Actually Improving? Isolating True Cross-Lingual Transfer

cs.CL · 2026-06-20 · unverdicted · novelty 6.0

HAT Score analysis of 20 models on 3 benchmarks finds transfer functional in small models, slower-than-expected gains with scale, and clear progress over time.

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

cs.CL · 2026-06-13 · unverdicted · novelty 6.0

XBCP benchmark shows deep research agents and multilingual retrievers lose accuracy, recall, calibration, and citation reliability when evidence is in non-English languages, even with gold evidence provided.

citing papers explorer

Showing 3 of 3 citing papers.

MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation
cs.CL · 2026-04-20 · unverdicted · none · ref 37
Are Multilingual Models Actually Improving? Isolating True Cross-Lingual Transfer
cs.CL · 2026-06-20 · unverdicted · none · ref 55
Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus
cs.CL · 2026-06-13 · unverdicted · none · ref 15
