Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Safetyprompts: a systematic review of open datasets for evaluating, improving large language model safety , author=

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Measuring Distribution Shift in User Prompts and Its Effects on LLM Performance

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

The LENS framework applied to 192 real-world settings shows moderate natural prompt distribution shifts cause 73% average performance loss in deployed LLMs, especially across user groups and regions.

Multilingual Refusal Alignment for Safer Large Language Models

cs.CL · 2026-04-24 · conditional · novelty 5.0

English-only safety alignment fails to transfer cross-lingually, while multilingual DPO training on the new RefusEU dataset improves safety across 12 European languages without degrading Global MMLU performance.

citing papers explorer

Showing 2 of 2 citing papers.

Measuring Distribution Shift in User Prompts and Its Effects on LLM Performance cs.CL · 2026-04-19 · unverdicted · none · ref 78
The LENS framework applied to 192 real-world settings shows moderate natural prompt distribution shifts cause 73% average performance loss in deployed LLMs, especially across user groups and regions.
Multilingual Refusal Alignment for Safer Large Language Models cs.CL · 2026-04-24 · conditional · none · ref 35
English-only safety alignment fails to transfer cross-lingually, while multilingual DPO training on the new RefusEU dataset improves safety across 12 European languages without degrading Global MMLU performance.

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

fields

years

verdicts

representative citing papers

citing papers explorer