AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian
Pith reviewed 2026-06-29 17:48 UTC · model grok-4.3
The pith
The first safety evaluation dataset for LLMs in Albanian contains 2,951 prompts across 11 categories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, containing 2,951 prompts spanning 11 safety categories, including self-harm, violence, racist content, child exploitation, and radicalization, with an average of 268 prompts per category. Each prompt is provided in Albanian with an English reference translation and a detailed category label. This resource addresses a significant gap in safety evaluation infrastructure for low-resource languages and provides an essential benchmark for developing safer, more inclusive LLMs.
What carries the argument
The AlbanianLLMSafety dataset, which supplies categorized prompts in Albanian with English translations to evaluate LLM safety responses.
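To make the described record layout concrete, here is a minimal sketch of how one entry could be represented and how per-category counts could be tallied. The class name `SafetyPrompt`, the field names `prompt_sq`, `prompt_en`, and `category`, and the helper `count_by_category` are assumptions made for illustration; the abstract does not state the authors' actual schema or file format. The rows hold placeholder strings, not dataset content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyPrompt:
    """One dataset row. Field names are hypothetical, not the authors' schema."""
    prompt_sq: str  # prompt text in Albanian (ISO 639-1 code "sq")
    prompt_en: str  # English reference translation
    category: str   # safety category label


def count_by_category(rows):
    """Return a Counter mapping each category label to its number of prompts."""
    return Counter(row.category for row in rows)


# Placeholder rows only; no real dataset content is reproduced here.
rows = [
    SafetyPrompt("<Albanian text 1>", "<English text 1>", "self-harm"),
    SafetyPrompt("<Albanian text 2>", "<English text 2>", "violence"),
    SafetyPrompt("<Albanian text 3>", "<English text 3>", "violence"),
]
counts = count_by_category(rows)
print(counts)
```

Run against the full dataset, a tally of this kind would show how far individual categories deviate from the reported average of 268 prompts.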
If this is right
- Enables systematic safety evaluation of LLMs when used with Albanian inputs.
- Supports red-teaming and guardrail development targeted at Albanian-speaking communities.
- Provides a benchmark for fine-tuning models to reduce unsafe outputs in low-resource settings.
- Allows direct comparison of model safety performance across languages.
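The evaluation uses listed above could be sketched as a per-category refusal-rate computation that runs once on the Albanian prompts and once on the English reference translations. Everything here is an assumption for illustration: the dict keys, the function `refusal_rate_by_category`, the stand-in model `toy_generate`, and the keyword-based `toy_is_refusal` judge are hypothetical, and the paper does not prescribe an evaluation protocol or a refusal metric.

```python
from collections import defaultdict


def refusal_rate_by_category(rows, generate, is_refusal, field="prompt_sq"):
    """Fraction of prompts per category whose model reply is judged a refusal.

    rows:       iterable of dicts with keys "prompt_sq", "prompt_en", "category"
                (hypothetical schema)
    generate:   callable mapping a prompt string to a model reply string
    is_refusal: callable mapping a reply string to True/False
    field:      which prompt column to send to the model
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for row in rows:
        reply = generate(row[field])
        totals[row["category"]] += 1
        if is_refusal(reply):
            refusals[row["category"]] += 1
    return {cat: refusals[cat] / totals[cat] for cat in totals}


# Stand-ins for illustration only: a real run would call an actual model
# and a more careful refusal judge (human or classifier).
def toy_generate(prompt):
    return "I can't help with that." if "refuse" in prompt else "Sure, here is..."


def toy_is_refusal(reply):
    return reply.lower().startswith("i can't")


# Placeholder rows; the second row simulates a model that refuses in English
# but complies in Albanian.
rows = [
    {"prompt_sq": "<sq: refuse>", "prompt_en": "<en: refuse>", "category": "violence"},
    {"prompt_sq": "<sq: comply>", "prompt_en": "<en: refuse>", "category": "violence"},
    {"prompt_sq": "<sq: refuse>", "prompt_en": "<en: refuse>", "category": "self-harm"},
]
rates_sq = refusal_rate_by_category(rows, toy_generate, toy_is_refusal, field="prompt_sq")
rates_en = refusal_rate_by_category(rows, toy_generate, toy_is_refusal, field="prompt_en")
print(rates_sq)
print(rates_en)
```

A gap between the two result sets for the same category would be one signal that a model handles safety risks differently by language, although translation quality could also contribute to such a gap.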
Where Pith is reading between the lines
- The same prompt-collection method could be applied to create safety datasets for other low-resource languages.
- Using the dataset alongside English safety sets might reveal whether multilingual models treat safety risks differently by language.
- Developers could extend the resource with new prompts based on actual user reports from Albanian speakers.
Load-bearing premise
The selected prompts and category labels accurately reflect safety risks relevant to Albanian speakers and are free of unstated cultural or translation biases that would undermine evaluation utility.
What would settle it
A review by Albanian native speakers that finds a substantial share of the prompts fail to match real safety concerns in Albanian communities would undermine the dataset's claimed utility.
Original abstract
Safety evaluation of Large Language Models (LLMs) has largely focused on high-resource languages, leaving low-resource languages critically underserved. We present AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, a linguistically distinct low-resource language with approximately 7.5 million speakers across Albania, Kosovo, North Macedonia, and the diaspora. The dataset contains 2,951 prompts spanning 11 safety categories, including self-harm, violence, racist content, child exploitation, and radicalization, with an average of 268 prompts per category. Each prompt is provided in Albanian with an English reference translation and a detailed category label. This resource addresses a significant gap in safety evaluation infrastruc-ture for low-resource languages and provides an essential benchmark for developing safer, more inclusive LLMs. The dataset will be provided upon request to support safety evaluation, fine-tuning, red-teaming, and guardrail development for Albanian-speaking communities.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, consisting of 2,951 prompts across 11 safety categories such as self-harm, violence, racist content, child exploitation, and radicalization. Each prompt is provided in Albanian with an English reference translation and a detailed category label. The dataset addresses a gap in safety evaluation infrastructure for a low-resource language with approximately 7.5 million speakers.
Significance. If the prompts are shown to be appropriately sourced and validated, the dataset would provide a valuable first benchmark for Albanian LLM safety evaluation, enabling red-teaming, fine-tuning, and guardrail development for an underserved language community. The work's strength is its focus on extending safety resources beyond high-resource languages.
major comments (2)
- [Abstract] Abstract: The central claim that the 2,951 prompts 'accurately reflect safety risks relevant to Albanian speakers' cannot be evaluated because the manuscript supplies no account of prompt sourcing, authorship, native-speaker validation for cultural fit, or how the 11 category boundaries were defined and assigned.
- [Abstract] Abstract: The statement that the dataset is 'publicly available' is contradicted by the later claim that it 'will be provided upon request,' which directly affects the accessibility and reproducibility of the claimed contribution.
minor comments (2)
- [Abstract] Abstract: Typo in 'infrastruc-ture' (hyphenated across line break).
- [Abstract] Abstract: The reported average of 268 prompts per category is approximate (2,951 / 11 = 268.27); clarify whether this is exact or rounded.
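The arithmetic behind the last minor comment can be checked directly. This is a small sketch using only the two figures given in the abstract; the variable names are illustrative.

```python
total_prompts = 2951
num_categories = 11

# Mean prompts per category, and the integer split of the total.
mean = total_prompts / num_categories
quotient, remainder = divmod(total_prompts, num_categories)

print(f"mean per category: {mean:.2f}")
print(f"floor: {quotient}, remainder: {remainder}")
```

Because the remainder is nonzero, the 11 categories cannot all contain exactly 268 prompts, so the reported average is necessarily rounded.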
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and commit to revising the paper to strengthen the description of the dataset creation process and to resolve the inconsistency in availability statements.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that the 2,951 prompts 'accurately reflect safety risks relevant to Albanian speakers' cannot be evaluated because the manuscript supplies no account of prompt sourcing, authorship, native-speaker validation for cultural fit, or how the 11 category boundaries were defined and assigned.
  Authors: We acknowledge that the current version of the manuscript does not include a dedicated account of prompt sourcing, authorship, native-speaker validation, or the process for defining and assigning the 11 categories. In the revised manuscript we will add a new Methods section that details: (1) the sources from which prompts were drawn or adapted, (2) the involvement of native Albanian speakers in review for cultural fit, and (3) the iterative process used to establish category boundaries and assign labels. This addition will allow readers to evaluate the relevance claim directly.
  Revision: yes
- Referee: [Abstract] Abstract: The statement that the dataset is 'publicly available' is contradicted by the later claim that it 'will be provided upon request,' which directly affects the accessibility and reproducibility of the claimed contribution.
  Authors: We agree that the abstract contains an internal contradiction on dataset availability. In the revision we will replace the conflicting phrasing with a clear statement that the dataset will be released publicly on a permanent repository (e.g., Hugging Face Datasets) upon acceptance of the paper, thereby ensuring open access and reproducibility.
  Revision: yes
Circularity Check
No circularity: dataset resource paper with no derivations or fitted claims
Full rationale
The manuscript presents AlbanianLLMSafety as a new data resource containing 2,951 prompts across 11 categories. No equations, parameter fits, predictions, or derivation chains appear anywhere in the text. The contribution is the dataset release itself rather than a computed result derived from prior quantities. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The paper is therefore self-contained against external benchmarks with a circularity score of 0.