SEA-HELM: Southeast Asian Holistic Evaluation of Language Models

Susanto, Yosephine; Hulagadri, Adithya Venkatadri; Montalan, Jann Railey; Ngui, Jian Gang; Yong, Xianbin; Leong, Wei Qi · 2025 · DOI 10.18653/v1/2025.findings-acl.636

5 Pith papers cite this work.

representative citing papers

SEATauBench: Adapting Tool-Agent-User Evaluation Into Low-Resource Southeast Asian Languages

cs.CL · 2026-06-27 · unverdicted · novelty 7.0

SEATauBench is the first agent benchmark for SEA languages, finding that performance holds for language-only changes but degrades sharply with full domain localization.

SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.

MÖVE: A Holistic LLM Benchmark for the German Public Sector

cs.CL · 2026-06-11 · unverdicted · novelty 6.0

MÖVE presents a new German-language benchmark evaluating 39 LLMs on performance and governance criteria using ten public-administration datasets.

DEPART: DEcomposing PARiTy across Multilingual LLMs

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

A Bayesian framework decomposes mLLM variance, showing that language features explain 79-92% of language-identity variance and that the relative dominance of model identity versus benchmark-model interactions differs between understanding and reasoning tasks.

DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

cs.CL · 2026-06-03 · unverdicted · novelty 5.0

DuDi is a dual-signal distillation method with a cross-lingual verbalizer that improves multilingual SLM performance on SEA languages and outperforms baselines on SEA-HELM.

citing papers explorer

SEATauBench: Adapting Tool-Agent-User Evaluation Into Low-Resource Southeast Asian Languages cs.CL · 2026-06-27 · unverdicted · none · ref 50
SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia cs.CL · 2026-06-02 · unverdicted · none · ref 43
MÖVE: A Holistic LLM Benchmark for the German Public Sector cs.CL · 2026-06-11 · unverdicted · none · ref 34
DEPART: DEcomposing PARiTy across Multilingual LLMs cs.CL · 2026-05-27 · unverdicted · none · ref 37
DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer cs.CL · 2026-06-03 · unverdicted · none · ref 18
