A survey on multilingual large language models: corpora, alignment, and bias
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
HAT Score analysis of 20 models on 3 benchmarks finds transfer functional in small models, slower-than-expected gains with scale, and clear progress over time.
XBCP benchmark shows deep research agents and multilingual retrievers lose accuracy, recall, calibration, and citation reliability when evidence is in non-English languages, even with gold evidence provided.
Citing papers explorer
- MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation
  MORPHOGEN is a new multilingual benchmark for testing LLMs on gender-aware morphological generation via rewriting first-person sentences to the opposite gender in French, Arabic, and Hindi.
- Are Multilingual Models Actually Improving? Isolating True Cross-Lingual Transfer
  HAT Score analysis of 20 models on 3 benchmarks finds transfer functional in small models, slower-than-expected gains with scale, and clear progress over time.
- Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus
  XBCP benchmark shows deep research agents and multilingual retrievers lose accuracy, recall, calibration, and citation reliability when evidence is in non-English languages, even with gold evidence provided.