Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora
Pith reviewed 2026-05-22 05:32 UTC · model grok-4.3
The pith
Machine translation preserves subtle moral semantics in social media texts across languages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Despite shortcomings in handling slang, vulgarity, and culturally-loaded expressions, direct translation preserves subtle moral cues well enough to be harvested by cross-lingual machine learning -- with mean cosine similarity of 0.86 and AUC gaps of 0.01--0.02 across all foundations closing further under fine-tuning of language models.
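To make the headline similarity figure concrete, here is a minimal sketch of how a mean cosine similarity over aligned source/translation embeddings could be computed. The vectors are toy values, not LaBSE outputs, and the function names `cosine` and `mean_pairwise_cosine` are hypothetical; this summary does not describe the paper's actual embedding procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(source_vecs, translated_vecs):
    """Average cosine similarity over aligned (source, translation) pairs."""
    if len(source_vecs) != len(translated_vecs):
        raise ValueError("embedding lists must be aligned one-to-one")
    sims = [cosine(s, t) for s, t in zip(source_vecs, translated_vecs)]
    return sum(sims) / len(sims)

# Toy stand-ins for sentence embeddings (assumption: real vectors would
# come from a multilingual encoder such as LaBSE).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
translated = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
score = mean_pairwise_cosine(source, translated)
```

A corpus-level mean like this can hide a low-similarity tail, so the per-pair distribution is worth inspecting alongside it.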
What carries the argument
A four-method validation pipeline: LaBSE cross-lingual embedding similarity, Centered Kernel Alignment (CKA), LLM-as-judge evaluation, and deep learning classifier parity tests.
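As an illustration of the CKA component, the sketch below computes linear CKA between two representations of the same examples in pure Python. It assumes the linear-kernel variant; this summary does not say which kernel the paper uses, and the helper names are hypothetical.

```python
import math

def center_columns(m):
    """Subtract each column's mean so every feature is zero-centered."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]

def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-aligned matrices a and b."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            entry = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += entry * entry
    return total

def linear_cka(x, y):
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    return numerator / denominator

# Toy representations: four examples, two features each.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [2.0, 0.0]]
# Rotate each row by 90 degrees and scale by 2; linear CKA is invariant to both.
rotated_scaled = [[-2.0 * b, 2.0 * a] for a, b in x]
same = linear_cka(x, rotated_scaled)
```

Because linear CKA ignores rotations and uniform scaling, it compares the geometry of two representation spaces rather than individual coordinates, which is why it complements a per-pair cosine check.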
If this is right
- Moral values classification extends to Polish and related Slavic languages using translated English data.
- Fine-tuning language models reduces the remaining performance gaps between languages.
- Machine translation offers a practical, low-cost route to moral research in languages without native annotated corpora.
- The same translation-plus-validation approach supports generalization to other under-resourced languages.
Where Pith is reading between the lines
- The pipeline could transfer to other value-laden domains such as political or ethical discourse analysis.
- Small native-language validation sets could be added to translated data to handle rare cultural moral expressions.
- The method invites tests on more distant language pairs to check how far the preservation holds.
- Real-time monitoring of moral language across multilingual social media becomes more feasible without new annotations in every language.
Load-bearing premise
That embedding similarity, kernel alignment, judge scores, and classifier parity together sufficiently confirm preservation of subtle moral meanings even for culturally specific or low-frequency expressions.
What would settle it
Native Polish speakers rating paired original English and translated Polish posts as carrying substantially different moral foundations, or classifiers trained on translated data showing large performance drops on held-out native Polish text.
Original abstract
Moral language is subtle and culturally variable, making it difficult to translate faithfully across languages. Idiomatic expressions, slang, and cultural references introduce hard-to-avoid translation artifacts. Yet automated moral values classification depends on language-specific annotated corpora that exist almost exclusively in English. We investigate whether LLM-based translation can bridge this gap, taking Polish as a test case. Using $\sim$50k morally-annotated social media posts from a diverse range of topics, we apply a principled four-method validation pipeline: LaBSE cross-lingual embedding similarity, Centered Kernel Alignment (CKA), LLM-as-judge evaluation, and deep learning classifier parity tests. We show that despite shortcomings in handling slang, vulgarity, and culturally-loaded expressions, direct translation preserves subtle moral cues well enough to be harvested by cross-lingual machine learning -- with mean cosine similarity of 0.86 and AUC gaps of 0.01--0.02 across all foundations closing further under fine-tuning of language models. These results demonstrate that machine translation is a practical and cost-effective path to moral values research in languages currently under-resourced in this domain. We demonstrate this for Polish as a representative Slavic language, with expected generalisation to related languages.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLM-based direct translation of social media posts, with Polish as the test case, preserves subtle moral semantics sufficiently for cross-lingual machine learning, despite shortcomings with slang, vulgarity, and cultural expressions. Using ~50k morally annotated posts and a four-method validation pipeline (LaBSE cosine similarity, CKA, LLM-as-judge, and classifier parity), it reports a mean cosine similarity of 0.86 and AUC gaps of 0.01-0.02 across foundations that narrow further with fine-tuning, positioning translation as a practical route for under-resourced languages such as Polish.
Significance. If the central results hold, the work supplies a concrete, low-cost method for extending Moral Foundations research beyond English by leveraging existing annotations and models. The provision of specific quantitative benchmarks (0.86 similarity, small AUC gaps) together with four complementary checks is a positive feature that supports potential generalization to related Slavic languages.
major comments (2)
- [Methods] Methods section: the manuscript provides no details on sampling of the ~50k posts, the specific translation model or LLM used, or the exact exclusion rules applied to posts containing slang or vulgarity. These omissions are load-bearing because they prevent assessment of whether post-hoc decisions inflate the reported LaBSE similarity and AUC parity.
- [Validation pipeline] Validation pipeline and results: the four proxies (LaBSE, CKA, LLM-as-judge, classifier AUC) could remain high even if translation alters nuanced moral framing in low-frequency or culturally specific Polish expressions, since embeddings and classifiers may capture topical overlap or surface patterns rather than original moral semantics. A targeted test isolating culturally loaded items is needed to support the claim that subtle cues survive.
minor comments (2)
- [Abstract] Abstract: state explicitly whether translation is Polish-to-English, English-to-Polish, or bidirectional, and list the specific Moral Foundations examined.
- [Results] Results: report statistical significance, confidence intervals, or p-values for the AUC gaps and similarity scores to allow readers to judge the practical importance of the 0.01-0.02 differences.
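One way the requested uncertainty estimate could be produced is a paired percentile bootstrap on the AUC gap, sketched below. The labels and scores are invented toy values, not results from the paper, and the function names are hypothetical.

```python
import random

def auc(labels, scores):
    """ROC AUC as the chance a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_gap(labels, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC(a) - AUC(b) on paired examples."""
    rng = random.Random(seed)
    n = len(labels)
    gaps = []
    while len(gaps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample lacks one class
        gap = auc(ys, [scores_a[i] for i in idx]) - auc(ys, [scores_b[i] for i in idx])
        gaps.append(gap)
    gaps.sort()
    lo = gaps[int((alpha / 2) * n_boot)]
    hi = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy data: the same eight posts scored in the original and translated language.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores_original = [0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.5]
scores_translated = [0.9, 0.2, 0.8, 0.3, 0.7, 0.6, 0.5, 0.4]
lo, hi = bootstrap_auc_gap(labels, scores_original, scores_translated)
```

Resampling the same indices for both score lists keeps the comparison paired, which matters when both classifiers are evaluated on translations of the same posts.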
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important areas for improving the clarity and robustness of our work. We address each major comment below and outline the revisions we will make to the manuscript.
- Referee: [Methods] Methods section: the manuscript provides no details on sampling of the ~50k posts, the specific translation model or LLM used, or the exact exclusion rules applied to posts containing slang or vulgarity. These omissions are load-bearing because they prevent assessment of whether post-hoc decisions inflate the reported LaBSE similarity and AUC parity.
  Authors: We agree that greater methodological transparency is essential for reproducibility and for allowing readers to evaluate potential selection effects. In the revised manuscript we will add a dedicated subsection to the Methods that specifies: (1) the sampling procedure and source of the ~50k morally annotated posts, including any topic diversity criteria; (2) the exact translation model and LLM version used for direct translation; and (3) the precise exclusion or handling rules applied to posts containing slang, vulgarity, or culturally specific expressions. These details will demonstrate that decisions were made prior to analysis rather than post hoc.
  Revision: yes
- Referee: [Validation pipeline] Validation pipeline and results: the four proxies (LaBSE, CKA, LLM-as-judge, classifier AUC) could remain high even if translation alters nuanced moral framing in low-frequency or culturally specific Polish expressions, since embeddings and classifiers may capture topical overlap or surface patterns rather than original moral semantics. A targeted test isolating culturally loaded items is needed to support the claim that subtle cues survive.
  Authors: We acknowledge that aggregate metrics across the full corpus could mask translation issues specific to low-frequency or culturally loaded expressions. Although the LLM-as-judge component was intended to probe semantic fidelity beyond surface patterns, we agree that an explicit targeted analysis would provide stronger support for our claims. In the revision we will add a new analysis subsection that isolates a subset of posts containing Polish idioms, slang, and culturally specific references, reporting LaBSE similarity and LLM-as-judge scores for this subset alone to directly test preservation of moral semantics in these challenging cases.
  Revision: yes
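The targeted subset analysis promised in the rebuttal could take a shape like the following sketch, which compares mean similarity for flagged posts against the remainder. The `culturally_loaded` flag, the record layout, and all numbers are hypothetical illustrations, not data from the paper.

```python
from statistics import mean

def similarity_by_subset(records):
    """Mean similarity for flagged (culturally loaded) posts versus the rest."""
    flagged = [r["similarity"] for r in records if r["culturally_loaded"]]
    rest = [r["similarity"] for r in records if not r["culturally_loaded"]]
    if not flagged or not rest:
        raise ValueError("both subsets must be non-empty")
    return {
        "flagged": mean(flagged),
        "rest": mean(rest),
        "gap": mean(rest) - mean(flagged),
    }

# Invented toy records; a real run would attach per-post similarity scores
# and a manually or automatically assigned flag.
records = [
    {"similarity": 0.90, "culturally_loaded": False},
    {"similarity": 0.88, "culturally_loaded": False},
    {"similarity": 0.70, "culturally_loaded": True},
    {"similarity": 0.74, "culturally_loaded": True},
]
report = similarity_by_subset(records)
```

A large positive `gap` would indicate that the corpus-wide average overstates fidelity on exactly the items the referee is concerned about.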
Circularity Check
No significant circularity in the derivation chain
The paper presents an empirical study using a four-method validation pipeline consisting of LaBSE cross-lingual embedding similarity, Centered Kernel Alignment (CKA), LLM-as-judge evaluation, and deep learning classifier parity tests on held-out data. The reported metrics (mean cosine similarity of 0.86 and AUC gaps of 0.01-0.02) are computed directly from these external benchmarks and standard classifiers rather than being defined in terms of the authors' own choices or prior results. No equations, self-citations, or fitted parameters are shown that reduce the central claim to a self-referential construction. The derivation is self-contained against independent external benchmarks, with no load-bearing steps that qualify as self-definitional, fitted-input predictions, or ansatz smuggling.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Moral foundations categories transfer across languages in a way that can be measured by embedding similarity and classifier parity.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: "mean cosine similarity of 0.86 and AUC gaps of 0.01–0.02 across all foundations"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.