AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian

Isra Fejzullaj; Kholoud K. Aldous; Wajdi Zaghouani

arxiv: 2605.26954 · v1 · pith:6J5U23KQnew · submitted 2026-05-26 · 💻 cs.CL

Pith reviewed 2026-06-29 17:48 UTC · model grok-4.3

classification 💻 cs.CL

keywords Albanian language · LLM safety · safety evaluation dataset · low-resource languages · AI safety · red-teaming · guardrails

The pith

The first safety evaluation dataset for LLMs in Albanian contains 2,951 prompts across 11 categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper creates AlbanianLLMSafety to test large language model safety in Albanian, a low-resource language spoken by about 7.5 million people. Most prior safety work has focused on high-resource languages, leaving Albanian without tools to check for issues like self-harm prompts or violent content. The dataset supplies prompts in Albanian plus English translations and category labels for eleven risk areas. A sympathetic reader would care because the resource lets developers measure and reduce harmful outputs for Albanian users. Without such benchmarks, models risk producing unsafe responses in languages that lack evaluation data.

Core claim

We present AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, containing 2,951 prompts spanning 11 safety categories, including self-harm, violence, racist content, child exploitation, and radicalization, with an average of 268 prompts per category. Each prompt is provided in Albanian with an English reference translation and a detailed category label. This resource addresses a significant gap in safety evaluation infrastructure for low-resource languages and provides an essential benchmark for developing safer, more inclusive LLMs.

What carries the argument

The AlbanianLLMSafety dataset, which supplies categorized prompts in Albanian with English translations to evaluate LLM safety responses.
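As a minimal sketch of what one record might look like: the abstract names three pieces per item (Albanian prompt, English reference translation, category label) but gives no file format or schema, so the field names `prompt_sq`, `prompt_en`, and `category` below are hypothetical, and the sample rows are placeholders rather than dataset content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyPrompt:
    """One dataset item; all field names are hypothetical."""
    prompt_sq: str  # prompt text in Albanian
    prompt_en: str  # English reference translation
    category: str   # one of the 11 safety category labels


def count_by_category(items):
    """Return a Counter mapping category label to number of prompts."""
    return Counter(item.category for item in items)


# Placeholder records, not real dataset content.
sample = [
    SafetyPrompt("<shqip 1>", "<english 1>", "self-harm"),
    SafetyPrompt("<shqip 2>", "<english 2>", "violence"),
    SafetyPrompt("<shqip 3>", "<english 3>", "violence"),
]
counts = count_by_category(sample)
```

Run on the full dataset, a per-category count of this kind would show how far individual categories sit from the reported average of 268.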

If this is right

Enables systematic safety evaluation of LLMs when used with Albanian inputs.
Supports red-teaming and guardrail development targeted at Albanian-speaking communities.
Provides a benchmark for fine-tuning models to reduce unsafe outputs in low-resource settings.
Allows direct comparison of model safety performance across languages.
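One way such an evaluation could be organized is sketched below. The paper does not describe an evaluation protocol, so everything here is an assumption: `is_refusal` is a hypothetical keyword heuristic standing in for a human or model judge, and `toy_model` is a stub in place of a real LLM call.

```python
from collections import defaultdict


def is_refusal(response: str) -> bool:
    """Hypothetical stand-in judge: flags a refusal by simple keyword match."""
    markers = ("nuk mund", "i cannot", "i can't")
    text = response.lower()
    return any(m in text for m in markers)


def refusal_rate_by_category(records, respond):
    """records: iterable of (prompt, category); respond: callable prompt -> response."""
    totals = defaultdict(int)
    refused = defaultdict(int)
    for prompt, category in records:
        totals[category] += 1
        if is_refusal(respond(prompt)):
            refused[category] += 1
    return {c: refused[c] / totals[c] for c in totals}


def toy_model(prompt):
    """Stub model: refuses only prompts containing the word 'harm'."""
    return "I cannot help with that." if "harm" in prompt else "Sure, here you go."


# Placeholder prompts, not real dataset content.
records = [("harm a", "self-harm"), ("harm b", "self-harm"),
           ("harm c", "violence"), ("other", "violence")]
rates = refusal_rate_by_category(records, toy_model)
```

Running the same loop once on the Albanian prompts and once on their English reference translations would give the per-language comparison mentioned in the last bullet. A keyword judge is only a placeholder; a real study would need a validated refusal classifier or human annotation.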

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same prompt-collection method could be applied to create safety datasets for other low-resource languages.
Using the dataset alongside English safety sets might reveal whether multilingual models treat safety risks differently by language.
Developers could extend the resource with new prompts based on actual user reports from Albanian speakers.

Load-bearing premise

The selected prompts and category labels accurately reflect safety risks relevant to Albanian speakers and are free of unstated cultural or translation biases that would undermine evaluation utility.

What would settle it

A review by Albanian native speakers that finds a substantial share of the prompts fail to match real safety concerns in Albanian communities would undermine the dataset's claimed utility.

Original abstract

Safety evaluation of Large Language Models (LLMs) has largely focused on high-resource languages, leaving low-resource languages critically underserved. We present AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, a linguistically distinct low-resource language with approximately 7.5 million speakers across Albania, Kosovo, North Macedonia, and the diaspora. The dataset contains 2,951 prompts spanning 11 safety categories, including self-harm, violence, racist content, child exploitation, and radicalization, with an average of 268 prompts per category. Each prompt is provided in Albanian with an English reference translation and a detailed category label. This resource addresses a significant gap in safety evaluation infrastruc-ture for low-resource languages and provides an essential benchmark for developing safer, more inclusive LLMs. The dataset will be provided upon request to support safety evaluation, fine-tuning, red-teaming, and guardrail development for Albanian-speaking communities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

First Albanian safety dataset but no details on how the prompts were built or checked.

The paper's main contribution is releasing AlbanianLLMSafety, a set of 2,951 prompts across 11 safety categories for a low-resource language that has seen almost no prior work. That fills a clear gap and gives people working on Albanian LLMs something concrete to test against.

What the paper does is straightforward: it states the dataset exists, lists the categories, and notes that each item comes with an Albanian prompt plus an English reference. The abstract positions it as the first such resource, which appears accurate based on the literature it cites.

The soft spot is the complete absence of any account of how the prompts were created, sourced, translated, or reviewed. There is no mention of native-speaker validation, cultural fit checks, inter-annotator agreement, or even basic quality controls. Without that, it is impossible to tell whether the prompts actually capture risks relevant to Albanian speakers or whether they carry translation artifacts or mismatched category boundaries. That is not a minor omission for a safety benchmark.

The work is aimed at researchers doing multilingual safety evaluation or building guardrails for low-resource languages. A reader who needs an Albanian test set will find the resource useful once the construction details are supplied. The paper is coherent on its own terms and shows honest engagement with the gap it targets, so it deserves a serious referee who can ask for the missing methodology section rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper introduces AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, consisting of 2,951 prompts across 11 safety categories such as self-harm, violence, racist content, child exploitation, and radicalization. Each prompt is provided in Albanian with an English reference translation and a detailed category label, addressing the gap in safety evaluation infrastructure for this low-resource language with approximately 7.5 million speakers.

Significance. If the prompts are shown to be appropriately sourced and validated, the dataset would provide a valuable first benchmark for Albanian LLM safety evaluation, enabling red-teaming, fine-tuning, and guardrail development for an underserved language community. The work's strength is its focus on extending safety resources beyond high-resource languages.

major comments (2)

[Abstract] The implicit premise that the 2,951 prompts accurately reflect safety risks relevant to Albanian speakers cannot be evaluated because the manuscript supplies no account of prompt sourcing, authorship, native-speaker validation for cultural fit, or how the 11 category boundaries were defined and assigned.
[Abstract] The statement that the dataset is 'publicly available' is contradicted by the later claim that it 'will be provided upon request,' which directly affects the accessibility and reproducibility of the claimed contribution.

minor comments (2)

[Abstract] Typo in 'infrastruc-ture' (hyphenated across line break).
[Abstract] The reported average of 268 prompts per category is necessarily rounded, since 2,951 is not divisible by 11 (2,951 / 11 ≈ 268.27); state that the figure is rounded.
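The arithmetic behind the last comment can be checked directly. The short sketch below uses only the two figures stated in the abstract (2,951 prompts, 11 categories); the variable names are illustrative.

```python
total_prompts = 2951
num_categories = 11

# Integer division shows whether an even split is possible.
quotient, remainder = divmod(total_prompts, num_categories)

# Exact mean as a float, for comparison with the reported 268.
mean = total_prompts / num_categories

print(quotient, remainder)  # 268 3
print(round(mean, 2))       # 268.27
```

The non-zero remainder means the categories cannot all hold exactly 268 prompts, so the reported average is a rounded figure.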

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and commit to revising the paper to strengthen the description of the dataset creation process and to resolve the inconsistency in availability statements.

Referee: [Abstract] The implicit premise that the 2,951 prompts accurately reflect safety risks relevant to Albanian speakers cannot be evaluated because the manuscript supplies no account of prompt sourcing, authorship, native-speaker validation for cultural fit, or how the 11 category boundaries were defined and assigned.

Authors: We acknowledge that the current version of the manuscript does not include a dedicated account of prompt sourcing, authorship, native-speaker validation, or the process for defining and assigning the 11 categories. In the revised manuscript we will add a new Methods section that details: (1) the sources from which prompts were drawn or adapted, (2) the involvement of native Albanian speakers in review for cultural fit, and (3) the iterative process used to establish category boundaries and assign labels. This addition will allow readers to evaluate the relevance claim directly.

revision: yes

Referee: [Abstract] The statement that the dataset is 'publicly available' is contradicted by the later claim that it 'will be provided upon request,' which directly affects the accessibility and reproducibility of the claimed contribution.

Authors: We agree that the abstract contains an internal contradiction on dataset availability. In the revision we will replace the conflicting phrasing with a clear statement that the dataset will be released publicly on a permanent repository (e.g., Hugging Face Datasets) upon acceptance of the paper, thereby ensuring open access and reproducibility.

revision: yes

Circularity Check

0 steps flagged

No circularity: dataset resource paper with no derivations or fitted claims

The manuscript presents AlbanianLLMSafety as a new data resource containing 2,951 prompts across 11 categories. No equations, parameter fits, predictions, or derivation chains appear anywhere in the text. The contribution is the dataset release itself rather than a computed result derived from prior quantities. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The paper is therefore self-contained against external benchmarks with a circularity score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset release paper with no mathematical content; contains no free parameters, axioms, or invented entities.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Digital morphology of the Albanian language. Journal of Responsible Technology, page 100151. Çağrı Çöltekin. 2020. A corpus of Turkish offensive language on social media. In Proceedings of the twelfth language resources and evaluation conference, pages 6174–6184. Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, ...

[2]

Safety assessment of Chinese large language models. arXiv preprint arXiv:2304.10436

Red teaming language models with language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3419–3448. Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, and Minlie Huang. 2023. Safety assessment of Chinese large language models. arXiv preprint arXiv:2304.10436. Bertie Vidgen and Leon Derczyns...

