SEATauBench is the first agent benchmark for SEA languages, finding that performance holds for language-only changes but degrades sharply with full domain localization.
SEA - HELM : S outheast A sian Holistic Evaluation of Language Models
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 5years
2026 5verdicts
UNVERDICTED 5representative citing papers
SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.
MÖVE presents a new German-language benchmark evaluating 39 LLMs on performance and governance criteria using ten public-administration datasets.
A Bayesian framework decomposes mLLM variance, showing language features explain 79-92% of language identity variance and that model identity vs. benchmark-model interactions dominate differently for understanding versus reasoning tasks.
DuDi is a dual-signal distillation method with cross-lingual verbalizer that improves multilingual SLM performance on SEA languages and outperforms baselines on SEA-HELM.
citing papers explorer
-
SEATauBench: Adapting Tool-Agent-User Evaluation Into Low-Resource Southeast Asian Languages
SEATauBench is the first agent benchmark for SEA languages, finding that performance holds for language-only changes but degrades sharply with full domain localization.
-
SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia
SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.
-
M\"OVE: A Holistic LLM Benchmark for the German Public Sector
MÖVE presents a new German-language benchmark evaluating 39 LLMs on performance and governance criteria using ten public-administration datasets.
-
DEPART: DEcomposing PARiTy across Multilingual LLMs
A Bayesian framework decomposes mLLM variance, showing language features explain 79-92% of language identity variance and that model identity vs. benchmark-model interactions dominate differently for understanding versus reasoning tasks.
-
DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer
DuDi is a dual-signal distillation method with cross-lingual verbalizer that improves multilingual SLM performance on SEA languages and outperforms baselines on SEA-HELM.