StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
HONEST : Measuring hurtful sentence completion in language models
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
English-only safety alignment fails to transfer cross-lingually, while multilingual DPO training on the new RefusEU dataset improves safety across 12 European languages without degrading Global MMLU performance.
citing papers explorer
-
StarCoder 2 and The Stack v2: The Next Generation
StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
-
Ethical and social risks of harm from Language Models
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
-
Multilingual Refusal Alignment for Safer Large Language Models
English-only safety alignment fails to transfer cross-lingually, while multilingual DPO training on the new RefusEU dataset improves safety across 12 European languages without degrading Global MMLU performance.