C-Mining automatically mines high-fidelity Culture Points from raw multilingual text by treating cross-lingual geometric isolation in embeddings as a quantifiable signal for cultural specificity, then uses them to synthesize better instruction data.
InProceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023, Ying Ding, Jie Tang, Juan F
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CL 4roles
dataset 1polarities
use dataset 1representative citing papers
MEMOed framework attributes LLM generations about cultures to pretraining memorization and finds frequency-based biases across 110 cultures for food and clothing.
SemEval-2026 Task 7 presents a benchmark and two evaluation tracks for assessing LLMs on everyday knowledge in diverse languages and cultures without allowing training on the test data.
citing papers explorer
-
C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment
C-Mining automatically mines high-fidelity Culture Points from raw multilingual text by treating cross-lingual geometric isolation in embeddings as a quantifiable signal for cultural specificity, then uses them to synthesize better instruction data.
-
Attributing Culture-Conditioned Generations to Pretraining Corpora
MEMOed framework attributes LLM generations about cultures to pretraining memorization and finds frequency-based biases across 110 cultures for food and clothing.
-
SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
SemEval-2026 Task 7 presents a benchmark and two evaluation tracks for assessing LLMs on everyday knowledge in diverse languages and cultures without allowing training on the test data.
- Cross-Lingual Consensus: Aligning Multilingual Cultural Knowledge via Multilingual Self-Consistency