Gaperon: A peppered English–French generative language model suite
4 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (4)
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity
  XL-SafetyBench is a new cross-cultural benchmark showing that frontier LLMs decouple jailbreak robustness from cultural sensitivity, while local models trade off attack success against neutral-safe rates in a near-linear pattern that indicates generation failure rather than alignment.
- The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
  Using a 1PL IRT model on real cultural questions across 13 locales, the study identifies a local-language knowledge-access advantage masked by lower proficiency in raw accuracy.
- Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
  Unlearning one backdoor in LLMs generalizes to suppress other backdoors across three model families, with a new metric to measure activation shifts.
- CARTE: A Benchmark for Mapping Language Model Knowledge Across France
  CARTE is a new benchmark for fine-grained regional knowledge in France that shows LLMs exhibit performance gaps across regions and scales, pointing to uneven pretraining coverage.