pith. machine review for the scientific record.

arxiv: 2604.21036 · v1 · submitted 2026-04-22 · 💻 cs.AI

Recognition: unknown

Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models

James Davis, Marzia Binta Nizam

Authors on Pith · no claims yet

Pith reviewed 2026-05-09 23:43 UTC · model grok-4.3

classification 💻 cs.AI
keywords text-to-image models · bias mitigation · prompt engineering · demographic fairness · skin tone representation · inference-time intervention · user-defined fairness · generative AI

The pith

Prompt engineering lets users set their own demographic targets for fair representation in AI image generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an inference-time method that lets users pick a fairness target, such as equal skin-tone representation or a distribution drawn from external data, and then builds multiple prompt variants in those exact proportions. These variants steer text-to-image models like Stable Diffusion toward outputs whose skin tones match the chosen target, without any model retraining or dataset changes. Experiments with 36 prompts across 30 occupations and 6 other contexts show that skin-tone results move in the direction the user specified, and align more closely when the target is expressed directly in skin-tone terms rather than abstract categories. The approach treats fairness as something the user defines, not a single fixed property of the model.

Core claim

Instead of assuming one universal definition of fairness, the method lets users choose among multiple specifications, from uniform distributions to LLM-generated ones that cite sources and give confidence levels. These specifications set the proportions in which demographic-specific prompt variants are generated, and the resulting skin-tone distributions in the model outputs shift consistently with the declared target, with lower deviation when the target is stated directly in skin-tone space.
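The review does not name the paper's deviation metric, so the sketch below is an assumption for illustration: it scores an observed skin-tone distribution against a declared target using total variation distance over the three coarse Fitzpatrick bins that appear in Figure 5.

    # Minimal sketch (assumed metric, not necessarily the authors'): how far an
    # observed skin-tone distribution sits from a declared fairness target,
    # using total variation distance over coarse Fitzpatrick bins.
    FITZPATRICK_BINS = ["I-II", "III-IV", "V-VI"]  # light / medium / dark

    def deviation_from_target(observed_counts: dict, target: dict) -> float:
        """Total variation distance between the empirical skin-tone distribution
        of generated images and the user-declared target distribution."""
        total = sum(observed_counts.get(b, 0) for b in FITZPATRICK_BINS)
        if total == 0:
            raise ValueError("no classified images")
        observed = {b: observed_counts.get(b, 0) / total for b in FITZPATRICK_BINS}
        return 0.5 * sum(abs(observed[b] - target.get(b, 0.0)) for b in FITZPATRICK_BINS)

    # Hypothetical counts for a baseline skewed toward lighter tones, against a uniform target.
    uniform_target = {b: 1 / 3 for b in FITZPATRICK_BINS}
    print(deviation_from_target({"I-II": 78, "III-IV": 15, "V-VI": 7}, uniform_target))  # ~0.45

A lower score after target-conditioned generation than at baseline is what the core claim predicts.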

What carries the argument

Target-based prompting: the construction of demographic-specific prompt variants in user-specified proportions that guide the generative model's output distribution toward the chosen fairness target.
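A minimal sketch of that construction follows, under stated assumptions: the headshot template echoes the prompt style in the paper's appendix, but the skin-tone descriptor wording and the largest-remainder rounding used to hit exact proportions are illustrative choices, not the authors' exact implementation.

    import math

    def build_prompt_batch(base_prompt: str, target: dict, n_images: int) -> list:
        """Allocate n_images across demographic-specific prompt variants so each
        variant appears in (approximately) its user-declared proportion.
        Largest-remainder rounding makes the counts sum exactly to n_images."""
        raw = {desc: p * n_images for desc, p in target.items()}
        counts = {desc: math.floor(x) for desc, x in raw.items()}
        leftover = n_images - sum(counts.values())
        # hand the remaining images to the variants with the largest fractional parts
        for desc in sorted(raw, key=lambda d: raw[d] - counts[d], reverse=True)[:leftover]:
            counts[desc] += 1
        batch = []
        for desc, k in counts.items():
            batch += [base_prompt.format(descriptor=desc)] * k
        return batch

    # A uniform three-way target over coarse skin-tone descriptors (illustrative wording).
    target = {"light-skinned": 1 / 3, "medium-skinned": 1 / 3, "dark-skinned": 1 / 3}
    prompts = build_prompt_batch("A full-color headshot of a {descriptor} doctor", target, 10)

Each string in the batch is then sent to the text-to-image model unchanged, which is what keeps the intervention at inference time.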

If this is right

  • Fairness specifications become explicit and choosable by the user instead of being fixed by the model developer.
  • Bias mitigation works at inference time with no changes to model weights or training data.
  • Evaluation measures how well outputs match the specific target chosen rather than assuming any single distribution is fair.
  • The same prompting structure applies across both occupational prompts and non-occupational contexts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar prompt-proportion techniques could be tested on other visual attributes such as age or gender presentation.
  • The method might be combined with retrieval of real-world demographic statistics to make targets more evidence-based.
  • Different base models could be compared to see how strongly their internal biases resist or amplify the prompt interventions.
  • User interfaces could expose the LLM source and confidence estimates so people can judge the reliability of a chosen target.

Load-bearing premise

The generative model will respond to the demographic proportions in the prompt variants by producing matching skin-tone distributions in its outputs rather than overriding them with its own biases.

What would settle it

Generating thousands of images with a prompt set that specifies 100 percent of one skin tone and finding that the actual skin-tone distribution remains close to the model's default bias instead of shifting toward the target.
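A sketch of that falsification run, assuming caller-supplied generate and classify_skin_tone hooks as hypothetical stand-ins for a text-to-image backend and an automated skin-tone classifier (neither name comes from the paper):

    from collections import Counter

    def audit_degenerate_target(generate, classify_skin_tone, base_prompt, descriptor, n_images=2000):
        """Declare a 100 percent single-tone target, generate many images, and report
        the observed skin-tone distribution. If it stays near the model's default bias
        instead of collapsing onto the requested tone, the load-bearing premise fails.
        `generate` and `classify_skin_tone` are caller-supplied hooks (hypothetical)."""
        prompt = base_prompt.format(descriptor=descriptor)
        observed = Counter(classify_skin_tone(generate(prompt)) for _ in range(n_images))
        total = sum(observed.values())
        return {tone: count / total for tone, count in observed.items()}

    # e.g. audit_degenerate_target(sd_generate, fitzpatrick_classifier,
    #                              "A full-color headshot of a {descriptor} CEO", "dark-skinned")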

Figures

Figures reproduced from arXiv: 2604.21036 by James Davis, Marzia Binta Nizam.

Figure 1: Visual pipeline of our demographically-aware image generation system. Given an input prompt (e.g., “A full-color headshot of a …)
Figure 2: Skin tone distribution shifts for three occupational prompts before and after target-conditioned generation using SD Realistic …
Figure 3: Skin tone distribution by occupational status group before and after target-conditioned generation using SD Realistic Vision …
Figure 4: Per-occupation Fitzpatrick skin tone distributions across high-, moderate-, and low-prestige occupational groups before …
Figure 5: Skin tone distribution by occupational status group (High, Mid, Low) and method, aggregated into three Fitzpatrick bins: Light …
Figure 6: Qualitative comparison across other methods for the high-status prompt …
Figure 7: Skin tone distributions for three non-occupational prompt groups: (a) emotional/abstract, (b) socioeconomic, and (c) criminal …
Figure 8: Visual comparison of model-generated and fallback-targeted image outputs for the non-occupational prompt …
Figure 9: Skin tone distribution (Light I–II, Medium III–IV, Dark V–VI) for high-status occupations across four declared target settings …
Figure 10: Visual comparison of model-generated and fallback-targeted image outputs for the non-occupational prompt “A full-color …
Figure 11: Visual comparison of model-generated and fallback-targeted image outputs for the non-occupational prompt “A full-color …
Figure 12: Target-conditioned prompting for gender presentation in occupational prompts. Baseline outputs (top) for “librarian” and …
Figure 13: Target-conditioned prompting for gender presentation in a non-occupational prompt. Baseline outputs for “a friendly person” …
Figure 14: Qualitative examples for SDXL Turbo across representative high-, moderate-, and low-status occupational prompts. Top rows: …
Figure 15: Qualitative comparison of occupational image generations before and after prompt adjustment for DALL-E 2. The top rows …
Figure 16: Monk Skin Tone distributions (MST 1–10) across high-, moderate-, and low-status occupational groups before (solid) and …
Figure 17: Fitzpatrick skin tone distributions (Types I–VI) across high-, moderate-, and low-status occupational groups before (solid) and …
Figure 18: Monk Skin Tone distributions (MST 1–10) across high-, moderate-, and low-status occupational groups before (solid) and …
Figure 19: Fitzpatrick skin tone distributions (Types I–VI) across high-, moderate-, and low-status occupational groups before (solid) …
Figure 20: Monk Skin Tone distributions (MST 1–10) across high-, moderate-, and low-status occupational groups before (solid) and …
Figure 21: Fitzpatrick skin tone distributions (Types I–VI) across high-, moderate-, and low-status occupational groups before (solid) …
Figure 22: Monk Skin Tone distributions (MST 1–10) across high-, moderate-, and low-status occupational groups before (solid) and …
Figure 23: Qualitative comparison across all methods for occupational prompts using SD Realistic Vision v5.1: …
read the original abstract

Text-to-image (T2I) models like Stable Diffusion and DALL-E have made generative AI widely accessible, yet recent studies reveal that these systems often replicate societal biases, particularly in how they depict demographic groups across professions. Prompts such as 'doctor' or 'CEO' frequently yield lighter-skinned outputs, while lower-status roles like 'janitor' show more diversity, reinforcing stereotypes. Existing mitigation methods typically require retraining or curated datasets, making them inaccessible to most users. We propose a lightweight, inference-time framework that mitigates representational bias through prompt-level intervention without modifying the underlying model. Instead of assuming a single definition of fairness, our approach allows users to select among multiple fairness specifications, ranging from simple choices such as a uniform distribution to more complex definitions informed by a large language model (LLM) that cites sources and provides confidence estimates. These distributions guide the construction of demographic-specific prompt variants in the corresponding proportions, and we evaluate alignment by auditing adherence to the declared target and measuring the resulting skin tone distribution rather than assuming uniformity as 'fairness'. Across 36 prompts spanning 30 occupations and 6 non-occupational contexts, our method shifts observed skin-tone outcomes in directions consistent with the declared target, and reduces deviation from targets when the target is defined directly in skin-tone space (fallback). This work demonstrates how fairness interventions can be made transparent, controllable, and usable at inference time, directly empowering users of generative AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an inference-time, prompt-based framework for controlling demographic (skin-tone) representation in text-to-image models. Users select among multiple fairness targets (uniform distribution or LLM-generated distributions citing sources), which dictate the proportions of demographic-specific prompt variants (e.g., 'a [skin-tone descriptor] [occupation]'). These variants are sampled to generate outputs, which are then audited for alignment with the declared target. Across 36 prompts (30 occupations + 6 non-occupational contexts), the authors claim that observed skin-tone distributions shift in directions consistent with the target and show reduced deviation from the target when the target is specified directly in skin-tone space.

Significance. If the empirical results are substantiated, the work offers a practical, transparent alternative to retraining-based debiasing methods by making fairness definitions user-controllable and auditable rather than assuming uniformity. It highlights prompt engineering as a lightweight intervention and provides external grounding via target alignment rather than self-referential metrics. The multi-specification approach (including LLM-informed targets with citations) is a notable strength for usability.

major comments (2)
  1. [Abstract] Abstract: the central empirical claim of directional shifts and reduced deviation from targets is presented without any details on skin-tone measurement/auditing methods, per-prompt sample sizes, variance across runs, statistical tests against targets, or baselines, making it impossible to assess whether the data support the alignment results.
  2. [Evaluation] The evaluation section (implied by the 36-prompt results): the assumption that constructing prompt variants in exact target proportions will produce matching output distributions is load-bearing, yet the manuscript provides no controls for prompt sensitivity, no cases of model override, and no quantitative comparison to the skeptic concern of non-linear diffusion-model responses to demographic descriptors.
minor comments (1)
  1. [Abstract] The parenthetical '(fallback)' in the abstract is undefined and should be clarified or removed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's feedback highlighting areas where the manuscript can be strengthened for clarity and rigor. We have made revisions to address both major comments as detailed below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central empirical claim of directional shifts and reduced deviation from targets is presented without any details on skin-tone measurement/auditing methods, per-prompt sample sizes, variance across runs, statistical tests against targets, or baselines, making it impossible to assess whether the data support the alignment results.

    Authors: We agree with the referee that the abstract lacks sufficient detail on the empirical methods to allow full assessment of the claims. We have revised the abstract to include information on the skin-tone measurement and auditing approach (using automated classification of generated images), per-prompt sample sizes, variance across runs, statistical tests employed, and comparison to baselines. These details are now summarized in the abstract while referring readers to the full Evaluation section for complete methodology. revision: yes

  2. Referee: [Evaluation] The evaluation section (implied by the 36-prompt results): the assumption that constructing prompt variants in exact target proportions will produce matching output distributions is load-bearing, yet the manuscript provides no controls for prompt sensitivity, no cases of model override, and no quantitative comparison to the skeptic concern of non-linear diffusion-model responses to demographic descriptors.

    Authors: We agree that the evaluation relies on the assumption that prompt proportions translate to output distributions, and that additional controls would strengthen the work. The current manuscript does not include explicit sensitivity analyses, override cases, or direct comparisons to non-linear models. We have therefore revised the Evaluation section to add: controls for prompt sensitivity through testing of alternative demographic descriptors; documented cases where the model partially overrides the target proportions; and a quantitative discussion comparing observed deviations to what would be expected under linear vs. non-linear response assumptions, using the data from the 36 prompts. This addresses the skeptic concern while maintaining the paper's focus on practical usability. revision: yes
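The statistical tests discussed above are not specified on this page; one plausible choice, sketched here as an assumption and requiring SciPy, is a chi-square goodness-of-fit test of observed skin-tone counts against the declared target proportions.

    from scipy.stats import chisquare

    def target_fit_test(observed_counts: list, target_probs: list):
        """Chi-square goodness-of-fit of observed bin counts against a declared target.
        A small p-value means the outputs deviate from the target beyond sampling
        noise; bins with very small expected counts should be merged first."""
        n = sum(observed_counts)
        expected = [p * n for p in target_probs]
        return chisquare(f_obs=observed_counts, f_exp=expected)

    # Example: 300 audited images against a uniform three-bin target.
    stat, p_value = target_fit_test([130, 95, 75], [1 / 3, 1 / 3, 1 / 3])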

Circularity Check

0 steps flagged

No significant circularity; evaluation is externally grounded

full rationale

The paper proposes constructing prompt variants from user- or LLM-specified target distributions and then audits generated outputs by directly measuring skin-tone distributions against those declared targets. The measurement of the generated images is made independently of how the prompts were constructed, so the check does not reduce to a self-referential fit, redefinition, or self-citation chain. No equations, parameter fitting, or load-bearing self-citations appear in the abstract or described method. The central result (directional shifts and reduced deviation) is presented as an empirical observation rather than a derivation that collapses to its own premises by construction. The evaluation is grounded in external image auditing rather than in the method's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that prompt-level demographic specifications can steer generative model outputs toward user targets without retraining; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Prompt modifications specifying demographics can control skin-tone distributions in generated images.
    This is the core premise enabling the inference-time intervention without model changes.

pith-pipeline@v0.9.0 · 5559 in / 1245 out tokens · 26152 ms · 2026-05-09T23:43:07.458289+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 24 canonical work pages · 5 internal anchors

  1. [1]

    Hritik Bansal, Da Yin, Masoud Monajatipoor, and Kai-Wei Chang. 2022. How well can text-to-image generative models understand ethical natural language interventions?. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 1358–1370

  2. [2]

Solon Barocas, Moritz Hardt, and Arvind Narayanan. 2017. Fairness and machine learning. fairmlbook.org

  3. [3]

    Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. 2023. Easily accessible text-to-image generation amplifies demographic stereotypes at large scale.arXiv preprint arXiv:2211.03759 (2023)

  4. [4]

Alain Chardon, Isabelle Cretois, and Colette Hourseau. 1991. Skin colour typology and suntanning pathways. International Journal of Cosmetic Science 13, 4 (1991), 191–208

  5. [5]

    Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, et al. 2023. Pixart Fast training of diffusion transformer for photorealistic text-to-image synthesis.arXiv preprint arXiv:2310.00426(2023)

  6. [6]

    Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa García, and Yuta Nakashima. 2024. Would deep generative models amplify bias in future models? arXiv preprint arXiv:2404.08242(2024)

  7. [7]

    Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, and Matthew Turk. 2023. TIBET: Identifying and evaluating biases in text-to-image generative models.arXiv preprint arXiv:2312.01261(2023)

  8. [8]

    Jaemin Cho, Abhaysinh Zala, and Mohit Bansal. 2022. DALL-EVAL: Probing the reasoning skills and social biases of text-to-image generation models.arXiv preprint arXiv:2202.04053(2022)

  9. [9]

    Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kotsia, and Stefanos Zafeiriou. 2020. Retinaface: Single-shot multi-level face localisation in the wild. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5203–5212

  10. [10]

    Moreno D’Inca, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, and Nicu Sebe. 2024. OpenBias: Open-set bias detection in text-to-image generative models.arXiv preprint arXiv:2404.07990(2024)

  11. [11]

    Thomas B Fitzpatrick. 1988. The validity and practicality of sun-reactive skin types I through VI.Archives of dermatology124, 6 (1988), 869–871

  12. [12]

Kathleen C. Fraser, Svetlana Kiritchenko, and Isar Nejadgholi. 2023. Diversity is Not a One-Way Street: Pilot Study on Ethical Interventions for Racial Bias in Text-to-Image Systems. In Proceedings of the 14th International Conference on Computational Creativity (ICCC). 288–292

  13. [13]

    Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, and Kristian Kersting. 2023. Fair diffusion: Instructing text-to-image generation models on fairness.arXiv preprint arXiv:2302.10893(2023)

  14. [14]

    Sourojit Ghosh and Aylin Caliskan. 2023. ’Person’== light-skinned, western man, and sexualization of women of color: Stereotypes in stable diffusion.arXiv preprint arXiv:2310.19981(2023)

  15. [15]

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets.Advances in neural information processing systems27 (2014)

  16. [16]

    Google. [n. d.]. Skin Tone by Google. https://skintone.google/. Accessed: 2026-03-23

  17. [17]

    Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, and Xiaojuan Qi. 2024. Debiasing text-to-image diffusion models. In Proceedings of the 1st ACM Multimedia Workshop on Multi-modal Misinformation Governance in the Era of Foundation Models. 29–36

  18. [18]

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems33 (2020), 6840–6851

  19. [19]

    Lin-Chun Huang, Ching Chieh Tsao, Fang-Yi Su, and Jung-Hsien Chiang. 2025. Debiasing Diffusion Model: Enhancing Fairness through Latent Representation Learning in Stable Diffusion Model.arXiv preprint arXiv:2503.12536(2025)

  20. [20]

    Kimmo Kärkkäinen and Jungseock Joo. 2021. FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. Proceedings of the IEEE/CVF winter conference on applications of computer vision(2021), 1548–1558

  21. [21]

    Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, and Sunil Aryal. 2024. The Pursuit of Fairness in Artificial Intelligence Models: A Survey.arXiv preprint arXiv:2403.17333(2024). doi:10.48550/arXiv.2403.17333

  22. [22]

    Eunji Kim, Siwon Kim, Minjun Park, Rahim Entezari, and Sungroh Yoon. 2025. Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion. InProceedings of the Computer Vision and Pattern Recognition Conference. 13361–13370

  23. [23]

    Sasha Luccioni, Christopher Akiki, Margaret Mitchell, and Yacine Jernite. 2023. Stable bias: Evaluating societal representations in diffusion models. arXiv preprint arXiv:2303.11408(2023)

  24. [24]

    Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2021. A survey on bias and fairness in machine learning. ACM computing surveys54, 6 (2021), 1–35

  25. [25]

    Michele Merler, Nalini Ratha, Rogerio S Feris, and John R Smith. 2019. Diversity in faces.arXiv preprint arXiv:1901.10436(2019)

  26. [26]

    Ranjita Naik and Besmira Nushi. 2023. Social biases through the text-to-image generation lens.arXiv preprint arXiv:2304.06034(2023)

  27. [27]

    William Peebles and Saining Xie. 2023. Scalable diffusion models with transformers.arXiv preprint arXiv:2212.09748(2023)

  28. [28]

    Shah Prerak. 2024. Addressing Bias in Text-to-Image Generation: A Review of Mitigation Methods. In2024 Third International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN). doi:10.1109/ICSTSN61422.2024.10671230

  29. [29]

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125(2022)

  30. [30]

    Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation.arXiv preprint arXiv:2102.12092(2021)

  31. [31]

    Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative adversarial text to image synthesis.arXiv preprint arXiv:1605.05396(2016)

  32. [32]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 10684–10695

  33. [33]

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. 2022. Photorealistic text-to-image diffusion models with deep language understanding.Advances in Neural Information Processing Systems35 (2022), 36479–36494

  34. [34]

Peter Saunders. 2004. What is Fair About a ’Fair Go’? Policy 20, 1 (2004), 3–10

  35. [35]

Nripsuta Ani Saxena, Karen Huang, Evan DeFilippis, Goran Radanovic, David C. Parkes, and Yang Liu. 2019. How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES ’19). doi:10.1145/3306618.3314248

  36. [36]

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2020. Score-based generative modeling through stochastic differential equations.arXiv preprint arXiv:2011.13456(2020)

  37. [37]

    Donald J Treiman. 2019. Occupational prestige in comparative perspective. InSocial Stratification, Class, Race, and Gender in Sociological Perspective, Second Edition. Routledge, 260–263

  38. [38]

    Sahil Verma and Julia Rubin. 2018. Fairness Definitions Explained. InProceedings of the International Workshop on Software Fairness (FairWare ’18). doi:10.1145/3194770.3194776

  39. [39]

    Yixin Wan, Arjun Subramonian, Anaelia Ovalle, Zongyu Lin, Ashima Suvarna, Christina Chance, Hritik Bansal, Rebecca Pattichis, and Kai-Wei Chang. 2024. Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation.arXiv preprint arXiv:2404.01030(2024)

  40. [40]

    Yankun Wu, Yuta Nakashima, and Noa García. 2023. Stable diffusion exposed: Gender bias from prompt to image.arXiv preprint arXiv:2308.03399 (2023)

  41. [41]

Cheng Zhang, Xuanbai Chen, Siqi Chai, Chen Henry Wu, Dmitry Lagun, Thabo Beeler, and Fernando De la Torre. 2023. ITI-GEN: Inclusive text-to-image generation. arXiv preprint arXiv:2309.05569 (2023)