A new dataset design for LLM cultural alignment assessment yields test sets with greater power to discriminate models specialized for a given culture from those that are not.
Progressing beyond Art Masterpieces or Touristic Clichés: how to assess your LLMs for cultural alignment?
abstract
Although the cultural (mis)alignment of Large Language Models (LLMs) has attracted increasing attention -- often framed in terms of cultural bias -- until recently there has been limited work on the design and development of datasets for cultural assessment. Here, we review existing approaches to such datasets and identify their main limitations. To address these issues, we propose design guidelines for annotators and report on the construction of a dataset built according to these principles. We further present a series of contrastive experiments conducted with this dataset. The results demonstrate that our design yields test sets with greater discriminative power, effectively distinguishing between models specialized for a given culture and those that are not, ceteris paribus.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)