Communicate to Play: Pragmatic Reasoning for Efficient Cross-Cultural Communication
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
citing papers explorer
- LLMs Infer Cultural Context but Fail to Apply It When Responding
  LLMs infer cultural context from cues but fail to apply it for adapted responses unless prompted sequentially, shown via the CAPRI dataset on units, time, and quantity expressions.
- Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
  Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.