pith. sign in

arxiv: 2605.19190 · v1 · pith:BARCPOS3new · submitted 2026-05-18 · 💻 cs.CY · cs.AI· cs.HC

Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South

Pith reviewed 2026-05-20 07:02 UTC · model grok-4.3

classification 💻 cs.CY cs.AIcs.HC
keywords text-to-image safetyred teamingGlobal Southparticipatory methodscultural pluralismadversarial promptsdatasetnormative dissonance
0
0 comments X

The pith

Text-to-image safety frameworks calibrated to Western defaults miss culturally specific harms revealed by Global South red teaming.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Text-to-image models are deployed globally but their safety is mostly set to Western norms, leaving gaps for other regions. The paper conducts community-centered red teaming studies in secondary urban centers in Ghana, Nigeria, and parts of India through workshops with local universities. This produces the PLACES dataset of over 26,000 failure examples that show more socio-cultural diversity, unique adversarial patterns from local nuances, and new harms involving clashes with religious norms and customs. If true, this means safety work needs to shift from large-scale generic data to deeply participatory and localized collection methods to cover the full range of real-world uses.

Core claim

We present PLACES, a dataset of over 26,000 examples of text-to-image model failures collected through localized community engagement and training workshops in partnership with universities in Ghana, Nigeria, Karnataka and Punjab. The prompts display wide-ranging diversity in socio-cultural and linguistic attributes compared to existing data, unique adversarial patterns enabled by local cultural and linguistic nuances, and distinct regional clusters such as around religion in India. The work also identifies novel harms that demonstrate normative dissonance, including violating religious norms, ignoring local customs, and ominous symbolism, thereby exposing structural contextual gaps in the 1

What carries the argument

The PLACES dataset gathered through participatory workshops that contextualize local norms in the Global South.

If this is right

  • Current T2I safety frameworks contain structural gaps that fail to account for local cultural and religious norms.
  • Adversarial patterns in prompts can be enabled specifically by cultural and linguistic nuances from the Global South.
  • Distinct thematic clusters emerge in different regions, such as religion-related prompts in India.
  • Expanding safety requires participatory methodologies for data collection rather than relying solely on scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could be extended to other AI systems to identify region-specific safety issues beyond text-to-image generation.
  • Integrating such localized datasets into model training or evaluation pipelines might reduce the occurrence of culturally insensitive outputs.
  • Future studies could explore how to balance these localized findings with the need for consistent global safety standards.

Load-bearing premise

The community engagement workshops and selected secondary urban centers produce representative, unbiased samples of local norms and that the observed harms are genuinely novel rather than simply untested by prior Western-centric red-teaming efforts.

What would settle it

Testing the PLACES prompts on T2I models equipped with current safety mechanisms to determine if they still produce the reported failures or if the novel harms are already mitigated, or comparing the diversity and uniqueness metrics directly against a large Western-centric red-teaming dataset.

Figures

Figures reproduced from arXiv: 2605.19190 by Aishwarya Verma, Alicia Parrish, Charu Kalia, Charvi Rastogi, Evgeniia Razumovskaia, Ibrahim Said Ahmad, Jessica Quaye, Lora Aroyo, Madhurima Maji, Minjae Lee, Minsuk Kahng, Mukul Bhutani, Priyanka Suresh, Shamsuddeen Hassan Muhammad, Vijay Janapa Reddi, Yaaseen Mahomed.

Figure 1
Figure 1. Figure 1: Visualization of prompts in red teaming dataset, PLACES, by projecting their embedding representations into two [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Distribution of prompt characteristics by attack mode, harm type, and target identity. We compare AdvNib (Non [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Rate of occurrence of cultural references under top-14 most frequently occurring categories from the ethnographic [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Number of unique languages used and linguistic composition of prompts in each challenge. [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Correlation between language use and attack mode, harm type, and target attack identity frequencies. It presents [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Red teaming interface provided to participants. This screen capture shows the features for prompt entry, image [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: After selecting a harmful image, participants had to answer the four questions shown in the image to complete the [PITH_FULL_IMAGE:figures/full_fig_p024_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: UMAP visualization of prompts showing all the challenges in PLACES and Adversarial Nibbler. [PITH_FULL_IMAGE:figures/full_fig_p025_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Normalised co-occurrence matrix between human annotated harm categories and ShieldGemma 2 [PITH_FULL_IMAGE:figures/full_fig_p025_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: For each challenge, we report prompt statistics by linguistic composition: a) Monolingual English – prompt only in [PITH_FULL_IMAGE:figures/full_fig_p029_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Percentage of prompts classified as harmful by ShieldGemma [ [PITH_FULL_IMAGE:figures/full_fig_p031_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Percentage of prompts classified as toxic by Perspective API, comparing Codemixed and English inputs across four [PITH_FULL_IMAGE:figures/full_fig_p031_12.png] view at source ↗
read the original abstract

Despite the global deployment of text-to-image (T2I) models, their safety frameworks are largely calibrated to a Western-centric default, creating significant vulnerabilities for the rest of the world. To embrace cultural pluralism and bring historically under-represented perspectives in T2I safety, we conduct localised community-centered red teaming studies in the Global South. Our two-fold approach prioritizes localization and participation, by focusing on secondary urban centers in these regions, and conducting community engagement and training workshops to contextualize local norms. As a result, we present PLACES, a dataset comprising over 26,000 examples of T2I model failures collected in partnership with universities in Ghana, Nigeria, and two regions of India (Karnataka and Punjab). Analysis of prompts collected reveals a wide-ranging diversity in socio-cultural and linguistic attributes, when compared to existing geography-agnostic crowdsourced red-teaming data. We observe unique adversarial patterns enabled by local cultural and linguistic nuances, and distinct clusters within region around specific themes, such as religion in India. Moreover, we uncover structural contextual gaps in existing safety frameworks by identifying novel harms showing normative dissonance (e.g., violating religious norms, ignoring local customs, and ominous symbolism). This work argues that expanding T2I safety requires moving beyond mere scale to incorporate deeply localised, participatory methodologies for data collection and contextualization. Content warning: This paper includes examples containing potentially harmful or offensive content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents PLACES, a dataset of over 26,000 text-to-image model failure examples gathered via participatory workshops in secondary urban centers in Ghana, Nigeria, Karnataka, and Punjab, in partnership with local universities. It claims that analysis of these prompts demonstrates wide socio-cultural and linguistic diversity relative to existing geography-agnostic red-teaming data, unique adversarial patterns arising from local cultural and linguistic nuances, distinct regional thematic clusters (e.g., religion in India), and novel harms involving normative dissonance such as violations of religious norms or local customs. The work concludes that T2I safety requires localized, participatory methodologies beyond mere scale.

Significance. If the core claims hold after addressing methodological gaps, the paper would make a valuable contribution by supplying a large-scale, regionally grounded dataset and empirical evidence that Western-centric safety frameworks miss culturally specific harms. The partnerships with universities in the Global South and the emphasis on community engagement are clear strengths that could support reproducible follow-on work and falsifiable tests of cultural sensitivity in AI safety.

major comments (3)
  1. [§3 and §4] §3 (Localized Red Teaming Methodology) and §4 (Data Collection): The central claim that the collected prompts reveal 'unique adversarial patterns enabled by local cultural and linguistic nuances' and 'novel harms showing normative dissonance' rests on the assumption that workshop participants in the selected secondary urban centers produce representative samples of regional norms. No selection criteria, demographic breakdowns (age, gender, education, rural/urban origin), or comparison against census baselines are provided, leaving open the possibility that observed patterns reflect convenience sampling rather than broader local norms.
  2. [§5] §5 (Analysis): The diversity and novelty claims are supported only by qualitative observations and thematic clustering; the manuscript reports no quantitative metrics (e.g., embedding distances, statistical tests, or overlap measures) comparing PLACES against existing red-teaming datasets. This absence makes it difficult to verify that the reported socio-cultural attributes and harms are genuinely novel rather than simply untested by prior efforts.
  3. [§4.2] §4.2 (Harm Labeling): No inter-rater reliability statistics or labeling protocol details are given for identifying harms and normative dissonance. Because the identification of 'novel harms' is load-bearing for the argument that existing safety frameworks have structural contextual gaps, the lack of reliability measures weakens the evidential basis for those conclusions.
minor comments (2)
  1. [Abstract and §1] The abstract and §1 could more explicitly state the exact number of workshops, total participants, and prompt-generation protocol to allow readers to assess scale and reproducibility.
  2. [Figures] Figure captions (e.g., those showing thematic clusters) would benefit from additional detail on the clustering algorithm and distance metric used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas to strengthen the methodological transparency and evidential support in our manuscript on the PLACES dataset. We address each major comment point by point below, outlining planned revisions.

read point-by-point responses
  1. Referee: [§3 and §4] §3 (Localized Red Teaming Methodology) and §4 (Data Collection): The central claim that the collected prompts reveal 'unique adversarial patterns enabled by local cultural and linguistic nuances' and 'novel harms showing normative dissonance' rests on the assumption that workshop participants in the selected secondary urban centers produce representative samples of regional norms. No selection criteria, demographic breakdowns (age, gender, education, rural/urban origin), or comparison against census baselines are provided, leaving open the possibility that observed patterns reflect convenience sampling rather than broader local norms.

    Authors: We agree that additional details on participant selection and demographics would enhance transparency and allow readers to better assess the scope of our findings. In the revised manuscript, we will expand §3 and §4 to include explicit selection criteria for the secondary urban centers and workshop participants, recruitment processes through local university partnerships, and available demographic breakdowns (age, gender, education, and urban/rural origin where recorded). We will also add comparisons to relevant regional census data to contextualize the sample. Our participatory methodology intentionally prioritizes community engagement and local expertise over strict statistical representativeness, as the primary aim is to surface culturally specific harms that geography-agnostic approaches miss; however, we will clarify this distinction and its implications for generalizability. revision: yes

  2. Referee: [§5] §5 (Analysis): The diversity and novelty claims are supported only by qualitative observations and thematic clustering; the manuscript reports no quantitative metrics (e.g., embedding distances, statistical tests, or overlap measures) comparing PLACES against existing red-teaming datasets. This absence makes it difficult to verify that the reported socio-cultural attributes and harms are genuinely novel rather than simply untested by prior efforts.

    Authors: We concur that quantitative metrics would provide stronger empirical grounding for the diversity and novelty claims. In the revised §5, we will add quantitative comparisons, including embedding-based distances (using models such as Sentence-BERT to measure semantic similarity between PLACES prompts and existing red-teaming datasets), thematic overlap measures (e.g., Jaccard index on clustered themes), and statistical tests (e.g., chi-squared tests for differences in theme distributions such as religion across regions). These additions will complement the existing qualitative analysis and thematic clustering to more rigorously demonstrate socio-cultural attributes and novel harms. revision: yes

  3. Referee: [§4.2] §4.2 (Harm Labeling): No inter-rater reliability statistics or labeling protocol details are given for identifying harms and normative dissonance. Because the identification of 'novel harms' is load-bearing for the argument that existing safety frameworks have structural contextual gaps, the lack of reliability measures weakens the evidential basis for those conclusions.

    Authors: We recognize the importance of transparency in the harm labeling process, particularly for claims about novel harms and normative dissonance. In the revised manuscript, we will provide a detailed account of the labeling protocol in §4.2, including how local collaborators and researchers identified harms through participatory review. We will also report inter-rater reliability statistics (such as percentage agreement or Cohen's/Fleiss' kappa) for the categorization of harms where multiple annotators were involved. This will strengthen the evidential basis for our conclusions on contextual gaps in existing safety frameworks. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical data-collection study

full rationale

The paper is an empirical study that collects and analyzes a dataset of T2I model failures through community workshops in Ghana, Nigeria, Karnataka, and Punjab. No equations, fitted parameters, predictions, or derivations are present that could reduce to self-referential inputs by construction. Claims of diversity, unique adversarial patterns, and novel harms rest directly on the observed workshop outputs and prompt analysis rather than any self-definitional loop, fitted-input renaming, or load-bearing self-citation chain. The methodology is self-contained against external benchmarks of participatory data collection and does not invoke uniqueness theorems or ansatzes that collapse the central results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is empirical rather than theoretical; it rests on domain assumptions about cultural representation rather than free parameters or new entities.

axioms (1)
  • domain assumption Workshops in secondary urban centers with local university partners can surface representative local norms and harms for T2I safety evaluation.
    This premise underpins the localization and participation strategy described in the abstract.

pith-pipeline@v0.9.0 · 5876 in / 1183 out tokens · 54968 ms · 2026-05-20T07:02:56.232357+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    We present PLACES, a dataset comprising over 26,000 examples of T2I model failures collected in partnership with universities in Ghana, Nigeria, and two regions of India... Analysis reveals wide-ranging diversity in socio-cultural and linguistic attributes, unique adversarial patterns enabled by local cultural and linguistic nuances, and novel harms showing normative dissonance such as violating religious norms.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    We observe unique adversarial patterns enabled by local cultural and linguistic nuances, and distinct clusters within region around specific themes, such as religion in India.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

102 extracted references · 102 canonical work pages · 4 internal anchors

  1. [1]

    Ife Adebara and Muhammad Abdul-Mageed. 2022. Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, ...

  2. [2]

    Chimamanda Ngozi Adichie. 2009. The Danger of a Single Story. TEDGlobal. https://www.ted.com/talks/chimamanda_ngozi_adichie_ the_danger_of_a_single_story Video, 18:46. Accessed 1 Dec 2025

  3. [3]

    Muhammad Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Singh, Alham Aji, Jacki O’Neill, Ashutosh Modi, and Monojit Choudhury. 2024. Towards Measuring and Modeling “Culture” in LLMs: A Survey. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 15763–15784. Participatory Localized Red Teaming for T2I Safe...

  4. [4]

    Hedderich, David Ifeoluwa Adelani, and Dietrich Klakow

    Jesujoba Oluwadara Alabi, Michael A. Hedderich, David Ifeoluwa Adelani, and Dietrich Klakow. 2025. Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead.ArXivabs/2505.21315 (2025). https://api.semanticscholar.org/CorpusID:278911377

  5. [5]

    American Folklore Society. n.d.. American Folklore Society Ethnographic Thesaurus. Library of Congress Linked Data Service. https://id.loc.gov/vocabulary/ethnographicTerms.html Accessed: 2025-10-01

  6. [6]

    Anderson, Brian C

    Melissa S. Anderson, Brian C. Martinson, and Raymond De Vries. 2007. Normative Dissonance in Science: Results from a National Survey of U.S. Scientists.Journal of Empirical Research on Human Research Ethics2, 4 (2007), 3–14. doi:10.1525/jer.2007.2.4.3 PMID: 19385804

  7. [7]

    Urvashi Aneja, Aarushi Gupta, and Aditya Vashistha. 2025. Beyond Semantics: Examining Gender Bias in LLMs Deployed within Low-resource Contexts in India. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New York, NY, USA, 2784–2795. doi:10.1145/3715275.3732180

  8. [8]

    Rehema Baguma, Hajarah Namuwaya, Joyce Nakatumba-Nabende, and Qazi Mamunur Rashid. 2023. Examining potential harms of large language models (llms) in africa. InInternational Conference on Safe, Secure, Ethical, Responsible Technologies and Emerging Applications. Springer, 3–19

  9. [9]

    Somnath Banerjee, Pratyush Chatterjee, Shanu Kumar, Sayan Layek, Parag Agrawal, Rima Hazra, and Animesh Mukherjee. 2025. Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations.arXiv preprint arXiv:2505.14469(2025)

  10. [10]

    Solon Barocas, Kate Crawford, Aaron Shapiro, and Hanna Wallach. 2017. The Problem with Bias: Allocative versus Representational Harms in Machine Learning. InProceedings of the 9th Annual Conference of the Special Interest Group for Computing, Information and Society (SIGCIS). Philadelphia

  11. [11]

    2023 , url =

    Abhipsa Basu, R. Venkatesh Babu, and Danish Pruthi. 2023. Inspecting the Geographical Representativeness of Images from Text-to- Image Models . In2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, Los Alamitos, CA, USA, 5113–5124. doi:10.1109/ICCV51070.2023.00474

  12. [12]

    Stevie Bergman, Nahema Marchal, John Mellor, Shakir Mohamed, Iason Gabriel, and William Isaac. 2024. STELA: a community-centred approach to norm elicitation for AI alignment.Scientific Reports14 (03 2024). doi:10.1038/s41598-024-56648-4

  13. [13]

    Rishabh Bhardwaj and Soujanya Poria. 2023. Red-teaming large language models using chain of utterances for safety-alignment.arXiv preprint arXiv:2308.09662(2023)

  14. [14]

    Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. 2023. Demographic stereotypes in text-to-image generation. (2023)

  15. [15]

    Abeba Birhane, Sree Krishna Kalluri, Dallas Card, William Agnew, Ravit Dotan, and Michelle Bao. 2022. The values encoded in machine learning research. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 173–184

  16. [16]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology.Qualitative research in psychology3, 2 (2006), 77–101. doi:10.1191/1478088706qp063oa

  17. [17]

    Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T Freeman, Michael Rubinstein, et al. 2023. Muse: Text-to-image generation via masked generative transformers.arXiv preprint arXiv:2301.00704 (2023)

  18. [18]

    Yu Ying Chiu, Liwei Jiang, Bill Yuchen Lin, Chan Young Park, Shuyue Stella Li, Sahithya Ravi, Mehar Bhatia, Maria Antoniak, Yulia Tsvetkov, Vered Shwartz, et al. 2024. CulturalBench: A Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming.arXiv preprint arXiv:2410.02677(2024)

  19. [20]

    Kate Crawford. 2017. The trouble with bias. InConference on Neural Information Processing Systems, invited speaker

  20. [21]

    Yue Deng, Wenxuan Zhang, Sinno Jialin Pan, and Lidong Bing. 2024. Multilingual Jailbreak Challenges in Large Language Models. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=vESNKdEMGp

  21. [22]

    Andrés Domínguez Hernández, Diana Mosquera, and Francisco Gallegos. 2025. Lessons From the Margins: Contextual- izing, Reimagining, and Hacking Generative AI in the Global South.Harvard Data Science Review7, 4 (oct 31 2025). https://hdsr.mitpress.mit.edu/pub/lrqoe1ny

  22. [23]

    Deng, Zachary C

    Michael Feffer, Anusha Sinha, Wesley H. Deng, Zachary C. Lipton, and Hoda Heidari. 2025.Red-Teaming for Generative AI: Silver Bullet or Security Theater?AAAI Press, 421–437

  23. [24]

    Juliana Freire, Grace Fan, Benjamin Feuer, Christos Koutras, Yurong Liu, Eduardo Pena, Aécio Santos, Cláudio T Silva, and Eden Wu. [n. d.]. Large Language Models for Data Discovery and Integration: Challenges and Opportunities.IEEE Data Engineering Bulletin ([n. d.]). https://par.nsf.gov/biblio/10656713

  24. [25]

    Iason Gabriel. 2020. Artificial intelligence, values, and alignment.Minds and Machines30, 3 (2020), 411–437. FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Charvi Rastogi et al

  25. [26]

    Isabel O Gallegos, Ryan A Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K Ahmed. 2024. Bias and fairness in large language models: A survey.Computational Linguistics50, 3 (2024), 1097–1179

  26. [27]

    Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. 2022. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned.arXiv preprint arXiv:2209.07858(2022)

  27. [28]

    Sourojit Ghosh, Sanjana Gautam, Pranav Narayanan Venkit, and Avijit Ghosh. 2025. Documenting Patterns of Exoticism of Marginalized Populations Within Text-to-Image Generators.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society8, 2 (Oct. 2025), 1107–1119. doi:10.1609/aies.v8i2.36614

  28. [29]

    Sourojit Ghosh, Pranav Narayanan Venkit, Sanjana Gautam, Shomir Wilson, and Aylin Caliskan. 2024. Do generative AI models output harm while representing non-Western cultures: Evidence from a community-centered approach. InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 7. 476–489

  29. [30]

    Tarleton Gillespie, Ryland Shaw, Mary L Gray, and Jina Suh. 2024. AI red-teaming is a sociotechnical challenge: on values, labor, and harms.arXiv preprint arXiv:2412.09751(2024)

  30. [31]

    Pamir Gogoi, Neha Joshi, Ayushi Pandey, Vivek Seshadri, Deepthi Sudharsan, Kalika Bali, Saransh Kumar Gupta, Lipika Dey, and Partha Pratim Das. 2025. What’s Not on the Plate? Rethinking Food Computing through Indigenous Indian Datasets. InProceedings of the 1st International Workshop on Multi-Modal Food Computing(Ireland)(MMFood ’25). Association for Comp...

  31. [32]

    Maiya Goloburda, Nurkhan Laiyk, Diana Turmakhan, Yuxia Wang, Mukhammed Togmanov, Jonibek Mansurov, Askhat Sametov, Nurdaulet Mukhituly, Minghan Wang, Daniil Orel, Zain Muhammad Mujahid, Fajri Koto, Timothy Baldwin, and Preslav Nakov. 2025. Qor´gau: Evaluating Safety in Kazakh-Russian Bilingual Contexts. InFindings of the Association for Computational Ling...

  32. [33]

    Google. 2024. Embeddings-001 | Gemini API. https://ai.google.dev/gemini-api/docs/embeddings. Accessed: Dec 01, 2025

  33. [34]

    2025.Gemini 3 Pro Model Card

    Google. 2025.Gemini 3 Pro Model Card. Technical Report. Google DeepMind. https://storage.googleapis.com/deepmind-media/Model- Cards/Gemini-3-Pro-Model-Card.pdf Accessed: January 12, 2026

  34. [35]

    Google DeepMind. 2025. Gemini 2.5 Flash. https://ai.google.dev/models/gemini. Accessed: Dec 01, 2025

  35. [36]

    Google DeepMind. 2025. Gemini 3 Flash: Frontier Intelligence Built for Speed. https://blog.google/products-and-platforms/products/ gemini/gemini-3-flash/. Accessed: Dec 20, 2025

  36. [37]

    Rishav Hada, Safiya Husain, Varun Gumma, Harshita Diddee, Aditya Yadavalli, Agrima Seth, Nidhi Kulkarni, Ujwal Gadiraju, Aditya Vashistha, Vivek Seshadri, and Kalika Bali. 2024. Akal badi ya bias: An exploratory study of gender bias in hindi language technology. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1926–1939

  37. [38]

    Bell, Candace Ross, Adina Williams, Michal Drozdzal, and Adriana Romero Soriano

    Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, and Adriana Romero Soriano. 2024. Towards Geographic Inclusion in the Evaluation of Text-to-Image Models. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency(Rio de Janeiro, Brazil)(FAccT ’24). Association for Computing Machinery, New York, NY, ...

  38. [39]

    Siobhan Mackenzie Hall, Samantha Dalal, Raesetje Sefala, Foutse Yuehgoh, Aisha Alaagib, Imane Hamzaoui, Shu Ishida, Jabez Magomere, Lauren Crais, Aya Salama, and Tejumade Afonja. 2025. The Human Labour of Data Work: Capturing Cultural Diversity through World Wide Dishes.Proc. ACM Hum.-Comput. Interact.9, 7, Article CSCW492 (Oct. 2025), 43 pages. doi:10.11...

  39. [40]

    Susan Hao, Renee Shelby, Yuchi Liu, Hansa Srinivasan, Mukul Bhutani, Burcu Karagol Ayan, Ryan Poplin, Shivani Poddar, and Sarah Laszlo. 2024. Harm amplification in text-to-image models.arXiv preprint arXiv:2402.01787(2024)

  40. [41]

    Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, and Patrick Schramowski. 2024. Llavaguard: An open vlm-based framework for safeguarding vision datasets and models.arXiv preprint arXiv:2406.05113(2024)

  41. [42]

    Yujia Hu, Ming Shan Hee, Preslav Nakov, and Roy Ka-Wei Lee. 2025. Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 12194–12212

  42. [43]

    Akshita Jha, Vinodkumar Prabhakaran, Remi Denton, Sarah Laszlo, Shachi Dave, Rida Qadri, Chandan Reddy, and Sunipa Dev. 2024. ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vi...

  43. [44]

    Haotian Jin, Haihui Fan, Jinchao Zhang, Yang Li, Bo Li, and Junhao Zhou. 2025. Dangerous Language Habits! Exploiting Code-Mixing for Backdoor Attacks on NLP Models. InProceedings of the 34th ACM International Conference on Information and Knowledge Management. 1220–1230

  44. [45]

    Nari Johnson, Hamna ., Deepthi Sudharsan, Theo Holroyd, Samantha Dalal, Siobhan Mackenzie Hall, Jennifer Wortman Vaughan, Daniela Massiceti, and Cecily Morrison. 2025. Position: To Make Text-to-Image Models that Work for Marginalized Communities, We Need New Measurement Practices for the Long Tail. (July 2025). https://www.microsoft.com/en-us/research/pub...

  45. [46]

    Kyuheon Jung, Noah Lee, and Sungchul Choi. 2025. KoDi: A Korean Diffusion Model for Bilingual Text-to-Image Generation and Cultural Fidelity.IEEE Access13 (2025), 200290–200307. doi:10.1109/ACCESS.2025.3633798

  46. [47]

    FACE: Feasible and actionable counterfactual explanations

    Amba Kak. 2020. "The Global South is everywhere, but also always somewhere": National Policy Narratives and AI Justice. InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society(New York, NY, USA)(AIES ’20). Association for Computing Machinery, New York, NY, USA, 307–312. doi:10.1145/3375627.3375859

  47. [48]

    Nithish Kannen, Arif Ahmad, Marco Andreetto, Vinodkumar Prabhakaran, Utsav Prabhu, Adji Bousso Dieng, Pushpak Bhattacharyya, and Shachi Dave. 2024. Beyond Aesthetics: Cultural Competence in Text-to-Image Models. (2024). arXiv:2407.06863 [cs.CV] https://arxiv.org/abs/2407.06863

  48. [49]

    Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, et al. 2021. Dynabench: Rethinking benchmarking in NLP. InProceedings of the 2021 conference of the North American chapter of the Association for Computational Linguistics: human language technologies. 4110–4124

  49. [50]

    Hannah Kirk, Abeba Birhane, Bertie Vidgen, and Leon Derczynski. 2022. Handling and Presenting Harmful Text in NLP Research. In Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 497–510. https://aclanthology.org/2022.findings-emnlp.35

  50. [51]

    Hannah R Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, et al. 2024. The prism alignment dataset: What participatory, representative and individualised human feedback reveals about the subjective and multicultural alignment of large language models.Advances in Neur...

  51. [52]

    Bingshuai Liu, Longyue Wang, Chenyang Lyu, Yong Zhang, Jinsong Su, Shuming Shi, and Zhaopeng Tu. 2023. On the cultural gap in text-to-image generation.arXiv preprint arXiv:2307.02971(2023)

  52. [53]

    Zilong Liu, Krzysztof Janowicz, Ivan Majic, Meilin Shi, Alexandra Fortacz, Mina Karimi, Gengchen Mai, and Kitty Currier. 2025. Operationalizing Geographic Diversity for the Evaluation of AI -Generated Content.Transactions in GIS29 (05 2025). doi:10.1111/tgis. 70057

  53. [54]

    Alexandra Sasha Luccioni, Christopher Akiki, Margaret Mitchell, and Yacine Jernite. 2023. Stable Bias: Analyzing Societal Representa- tions in Diffusion Models. InAdvances in Neural Information Processing Systems, Vol. 36. 56338–56351

  54. [55]

    Jabez Magomere, Shu Ishida, Tejumade Afonja, Aya Salama, Daniel Kochin, Yuehgoh Foutse, Imane Hamzaoui, Raesetje Sefala, Aisha Alaagib, Samantha Dalal, Beatrice Marchegiani, Elizaveta Semenova, Lauren Crais, and Siobhan Mackenzie Hall. 2025. The World Wide recipe: A community-centred framework for fine-grained data collection and regional bias operational...

  55. [56]

    Sina Malakouti, Boqing Gong, and Adriana Kovashka. 2025. Culture in Action: Evaluating Text-to-Image Models through Social Activities.arXiv preprint arXiv:2511.05681(2025)

  56. [57]

    Nathan Matias and Megan Price

    J. Nathan Matias and Megan Price. 2025. How public involvement can improve the science of AI.Proceedings of the National Academy of Sciences122, 48 (2025), e2421111122. arXiv:https://www.pnas.org/doi/pdf/10.1073/pnas.2421111122 doi:10.1073/pnas.2421111122

  57. [58]

    Leland McInnes, John Healy, and Steve Astels. 2017. hdbscan: Hierarchical density based clustering.Journal of Open Source Software2, 11 (2017), 205. doi:10.21105/joss.00205

  58. [59]

    Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426(2018)

  59. [60]

    Shakir Mohamed, Marie-Therese Png, and William Isaac. 2020. Decolonial AI: Decolonial theory as sociotechnical foresight in artificial intelligence.Philosophy & Technology33, 4 (2020), 659–684

  60. [61]

    Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, David Ifeoluwa Adelani, Ibrahim Sa’id Ahmad, Saminu Mo- hammad Aliyu, Paul Röttger, Abigail Oppong, Andiswa Bukula, Chiamaka Ijeoma Chukwuneke, et al. 2025. Afrihate: A multilingual collection of hate speech and abusive language datasets for african languages. InProceedings of the 2025 Confe...

  61. [62]

    Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa’id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Alipio Jorge, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwada...

  62. [63]

    Shravan Nayak, Kanishk Jain, Rabiul Awal, Siva Reddy, Sjoerd Van Steenkiste, Lisa Anne Hendricks, Karolina Stanczak, and Aishwarya Agrawal. 2024. Benchmarking Vision Language Models for Cultural Understanding. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.)....

  63. [64]

    Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, John Dang, Samuel Cahyawijaya, Shivalika Singh, Seraphina Goldfarb-Tarrant, Viraat Aryabumi, et al. 2025. The Multilingual Divide and Its Impact on Global AI Safety.arXiv preprint arXiv:2505.21344(2025)

  64. [65]

    Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving

  65. [66]

    InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

    Red teaming language models with language models. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 3419–3448

  66. [67]

    Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, and Mohit Iyyer. 2024. TopicGPT: A Prompt-based Topic Modeling Framework. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). A...

  67. [68]

    Mahika Phutane and Aditya Vashistha. 2025. Disability Across Cultures: A Human-Centered Audit of Ableism in Western and Indic LLMs. InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 8. 2000–2014

  68. [69]

    1978.Syntactic structure and social function of code-switching

    Shana Poplack. 1978.Syntactic structure and social function of code-switching. Vol. 2. Centro de Estudios Puertorriqueños,[City University of New York]

  69. [70]

    thick evaluations

    Rida Qadri, Mark Diaz, Ding Wang, and Michael Madaio. 2025. The case for" thick evaluations" of cultural representation in ai.arXiv preprint arXiv:2503.19075(2025)

  70. [71]

    Rida Qadri, Michael Madaio, and Mary L. Gray. 2025. Confusing the Map for the Territory.Commun. ACM68, 10 (Sept. 2025), 32–34. doi:10.1145/3731655

  71. [72]

    Bennett, and Remi Denton

    Rida Qadri, Renee Shelby, Cynthia L. Bennett, and Remi Denton. 2023. AI’s Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency(Chicago, IL, USA)(FAccT ’23). Association for Computing Machinery, New York, NY, USA, 506–517. doi:10.1...

  72. [73]

    Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin Van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, and Lora Aroyo. 2024. Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation. InProceedings of th...

  73. [74]

    Jessica Quaye, Charvi Rastogi, Alicia Parrish, Oana Inel, Minsuk Kahng, Lora Aroyo, and Vijay Janapa Reddi. 2025. From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models.arXiv preprint arXiv:2507.17922(2025)

  74. [75]

    Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, and Preethi Lahoti. 2023. Aart: Ai-assisted red-teaming with diverse data generation for new llm-powered applications. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track. 380–395

  75. [76]

    Charvi Rastogi, Tian Huey Teh, Pushkar Mishra, Roma Patel, Ding Wang, Mark Díaz, Alicia Parrish, Aida Mostafazadeh Davani, Zoe Ashwood, Michela Paganini, et al. 2025. Whose view of safety? a deep dive dataset for pluralistic alignment of text-to-image models. arXiv preprint arXiv:2507.13383(2025)

  76. [77]

    Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori, and Saleema Amershi. 2023. Supporting human-ai collaboration in auditing llms with llms. InProceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. 913–926

  77. [78]

    Aniket Rege, Zinnia Nie, Mahesh Ramesh, Unmesh Raskar, Zhuoran Yu, Aditya Kusupati, Yong Jae Lee, and Ramya Korlakai Vinayak

  78. [79]

    InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

    CuRe: Cultural Gaps in the Long -Tail of Text-to-Image Models. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). https://aniketrege.github.io/cure

  79. [80]

    Bixuan Ren, EunJeong Cheon, and Jianghui Li. 2025. Organization Matters: A Qualitative Study of Organizational Dynamics in Red Teaming Practices For Generative AI.Proc. ACM Hum.-Comput. Interact.9, 7, Article CSCW460 (Oct. 2025), 26 pages. doi:10.1145/3757641

  80. [81]

    Marco Tulio Ribeiro and Scott Lundberg. 2022. Adaptive Testing and Debugging of NLP Models. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 3253–3267. doi:10.18653/v...

Showing first 80 references.