RigoChat 2: An adapted language model to Spanish using a bounded dataset and reduced hardware
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
citing papers explorer
- XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity
  XL-SafetyBench is a new cross-cultural benchmark showing that frontier LLMs decouple jailbreak robustness from cultural sensitivity, while local models trade off attack success against neutral-safe rates in a near-linear pattern, indicating generation failure rather than alignment.
- HULAT2 at MER-TRANS 2026: Governed Multi-Agent Simplification for Spanish Easy-to-Read Generation
  HULAT2 submitted three runs to the Spanish MER-TRANS 2026 track; a LangGraph multi-agent workflow with internal quality signals achieved the best SARI score (44.05) among them, outperforming a linear regeneration baseline.