Bias at the End of the Score
Pith reviewed 2026-05-10 15:33 UTC · model grok-4.3
The pith
Reward models used in text-to-image systems encode demographic biases that drive optimization toward sexualized female subjects, reinforced stereotypes, and reduced diversity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reward models are non-neutral value functions that encode demographic biases. When applied during dataset filtering, evaluation, parameter optimization, or post-generation filtering in text-to-image systems, these biases cause reward-guided optimization to disproportionately sexualize female image subjects, reinforce gender and racial stereotypes, and collapse demographic diversity.
What carries the argument
Reward models function as scoring functions that assign preference or quality values to generated images, thereby directing gradient updates or filtering decisions.
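A minimal sketch of this mechanism, assuming a differentiable stand-in generator and a toy scalar `reward_model` (both hypothetical placeholders, not the paper's actual components): gradient ascent on the reward score is exactly the channel through which any bias encoded in the scorer propagates into the generator.

```python
# Sketch: reward-guided optimization of generator parameters.
# `generate` and `reward_model` are toy stand-ins for illustration.
import torch

torch.manual_seed(0)

# Stand-in "generator": maps a latent to an output via learnable params.
theta = torch.randn(16, requires_grad=True)

def generate(latent: torch.Tensor) -> torch.Tensor:
    # Toy differentiable generator: output modulated by theta.
    return torch.tanh(latent * theta)

def reward_model(image: torch.Tensor) -> torch.Tensor:
    # Toy scalar score; a real RM (e.g. a preference model) would go here.
    # Whatever this function prefers is what gradient ascent amplifies.
    return image.mean()

optimizer = torch.optim.Adam([theta], lr=1e-2)
for step in range(100):
    latent = torch.randn(16)
    score = reward_model(generate(latent))
    loss = -score  # maximize reward = minimize negative reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```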
If this is right
- Reward-guided training will systematically increase the rate at which female subjects appear in sexualized poses or attire.
- Generated images will show stronger alignment with common gender and racial stereotypes than the input prompts alone would suggest.
- The variety of demographic attributes (age, race, body type, etc.) across batches of generated images will narrow; a minimal measurement sketch follows this list.
- Safety and quality filters that rely on these reward models will pass or reject images in ways that embed the same demographic skew.
- Evaluation metrics based on reward-model scores will report higher quality for outputs that match the encoded biases.
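One way to operationalize the diversity-narrowing prediction above is the normalized entropy of attribute labels across a batch. A minimal sketch, assuming attribute labels come from some classifier in practice (the hard-coded labels here are illustrative, not the paper's data):

```python
# Sketch: does demographic variety narrow after reward-guided optimization?
from collections import Counter
from math import log

def normalized_entropy(labels: list[str]) -> float:
    """Shannon entropy of the label distribution, normalized to [0, 1]."""
    counts = Counter(labels)
    n = len(labels)
    ent = -sum((c / n) * log(c / n) for c in counts.values())
    k = len(counts)
    return ent / log(k) if k > 1 else 0.0

before = ["20s", "40s", "60s", "20s", "40s", "60s"]  # pre-optimization batch
after = ["20s", "20s", "20s", "20s", "40s", "20s"]   # post-optimization batch
print(normalized_entropy(before), normalized_entropy(after))  # 1.0 vs ~0.65
```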
Where Pith is reading between the lines
- Safety filters built on the same reward models may suppress non-stereotypical or non-sexualized images of women while permitting others.
- The same audit approach could be applied to reward models used in text-only or multimodal language-model alignment to check for parallel effects.
- If reward models are retrained with explicit demographic balance constraints, downstream image generators could regain diversity without changing prompts or base models.
Load-bearing premise
The measured biases and their downstream effects on generated images come from the reward models themselves rather than from the text prompts, base image generators, or datasets chosen for the audit.
What would settle it
Re-running the same optimization and filtering experiments after replacing the audited reward models with versions trained on explicitly balanced demographic data and checking whether the sexualization, stereotype, and diversity-collapse effects disappear.
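A hypothetical protocol sketch of that settling experiment: hold prompts, base model, and seeds fixed, swap only the reward model, and compare bias metrics before and after. All function names are placeholders for illustration, not the paper's code.

```python
# Sketch: counterfactual RM swap under fixed prompts, model, and seeds.
def run_audit(reward_model, prompts, seeds, generate, measure_bias):
    """Generate with fixed inputs under one RM and return bias metrics."""
    images = [generate(p, seed=s, rm=reward_model)
              for p in prompts for s in seeds]
    return measure_bias(images)  # e.g. sexualization rate, diversity index

# Hypothetical usage, comparing the audited RM against one retrained on
# demographically balanced data:
# metrics_orig = run_audit(original_rm, prompts, seeds, generate, measure_bias)
# metrics_bal = run_audit(balanced_rm, prompts, seeds, generate, measure_bias)
# If the effects vanish under balanced_rm, the RM attribution holds.
```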
Original abstract
Reward models (RMs) are inherently non-neutral value functions designed and trained to encode specific objectives, such as human preferences or text-image alignment. RMs have become crucial components of text-to-image (T2I) generation systems where they are used at various stages for dataset filtering, as evaluation metrics, as a supervisory signal during optimization of parameters, and for post-generation safety and quality filtering of T2I outputs. While specific problems with the integration of RMs into the T2I pipeline have been studied (e.g. reward hacking or mode collapse), their robustness and fairness as scoring functions remain largely unknown. We conduct a large scale audit of RM robustness with respect to demographic biases during T2I model training and generation. We provide quantitative and qualitative evidence that while originally developed as quality measures, RMs encode demographic biases, which cause reward-guided optimization to disproportionately sexualize female image subjects, reinforce gender/racial stereotypes, and collapse demographic diversity. These findings highlight shortcomings in current reward models, challenge their reliability as quality metrics, and underscore the need for improved data collection and training procedures to enable more robust scoring.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript conducts a large-scale audit of reward models (RMs) used in text-to-image (T2I) generation pipelines. It claims that RMs, despite being developed as quality measures for human preferences or text-image alignment, encode demographic biases; these biases then drive reward-guided optimization to disproportionately sexualize female image subjects, reinforce gender and racial stereotypes, and collapse demographic diversity. The work supplies quantitative and qualitative evidence for these effects and concludes that current RMs are unreliable as quality metrics, calling for better data collection and training procedures.
Significance. If the causal attribution to RM biases can be isolated from prompt statistics, base-model priors, and dataset composition, the findings would be significant for the CV and generative-AI communities. They would directly challenge the widespread use of RMs for dataset filtering, optimization, and post-generation filtering, and would motivate concrete improvements in RM training. At present the evidence is asserted at a high level without the methodological detail needed to assess whether the claimed causal mechanism holds.
major comments (3)
- [Abstract] Abstract: the manuscript asserts 'quantitative and qualitative evidence' that RM biases cause disproportionate sexualization, stereotype reinforcement, and diversity collapse, yet supplies no information on the specific RMs audited, the T2I base models, the prompt sets, the generation parameters, the statistical tests, or any controls for confounders.
- [Methods / Experiments] Experimental design (throughout): the central causal claim requires isolation of RM value functions from prompt distributions and base-model inductive biases. No matched ablations are described that hold prompts and the underlying diffusion model fixed while toggling RM guidance, nor are controls with randomly initialized or demonstrably unbiased scorers reported.
- [Results] Results and discussion: without the above controls, the observed patterns could arise from the statistics of the text prompts or from the training data of the base T2I model rather than from biases internal to the RMs; the attribution therefore remains unestablished.
minor comments (1)
- [Abstract / Introduction] The abstract and introduction would benefit from a concise table or paragraph listing the exact RMs, T2I models, and prompt categories used in the audit.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address each major comment below with clarifications on our methodology and note the revisions we will incorporate to enhance transparency.
Point-by-point responses
Referee: [Abstract] Abstract: the manuscript asserts 'quantitative and qualitative evidence' that RM biases cause disproportionate sexualization, stereotype reinforcement, and diversity collapse, yet supplies no information on the specific RMs audited, the T2I base models, the prompt sets, the generation parameters, the statistical tests, or any controls for confounders.
Authors: We agree that the abstract would benefit from greater specificity. The full manuscript details the specific RMs audited (preference-tuned and alignment-based models), T2I base models (Stable Diffusion v1.5 and v2.1), prompt sets (real-world captions and synthetic demographic templates), generation parameters (guidance scales and sampling steps), and statistical tests (chi-square and diversity indices). In the revised version we will expand the abstract to briefly reference these elements while preserving conciseness. revision: yes
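A minimal sketch of the chi-square test the rebuttal mentions, assuming counts of, say, sexualized vs. non-sexualized outputs under unguided and RM-guided generation. The numbers below are illustrative, not the paper's data.

```python
# Sketch: chi-square test of independence on a 2x2 outcome table.
from scipy.stats import chi2_contingency

# rows: [unguided, RM-guided]; cols: [sexualized, not sexualized]
table = [[12, 188],
         [47, 153]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # small p suggests the RM shifts the rate
```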
Referee: [Methods / Experiments] Experimental design (throughout): the central causal claim requires isolation of RM value functions from prompt distributions and base-model inductive biases. No matched ablations are described that hold prompts and the underlying diffusion model fixed while toggling RM guidance, nor are controls with randomly initialized or demonstrably unbiased scorers reported.
Authors: Our design isolates RM effects by holding both prompt sets and base diffusion models fixed while varying only the RM used for guidance and optimization. We compare RM-guided outputs against unguided generation and across multiple distinct RMs on identical inputs, allowing attribution of differences in sexualization and stereotype rates to RM-specific value functions. Randomly initialized scorers were not included because they do not produce meaningful quality signals and would confound rather than clarify the comparison; we instead rely on cross-RM consistency. We will add a dedicated ablation subsection and figure to make these controls explicit. revision: partial
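A minimal sketch of the ablation design described in this response, with placeholder names rather than the paper's code: identical prompts and seeds, one base model, and a grid of scorers that includes a no-guidance control, so that differences relative to that control can be attributed to each RM.

```python
# Sketch: cross-RM ablation grid with an unguided control.
def ablation_grid(scorers, prompts, seeds, generate, measure_bias):
    """Return {scorer_name: bias_metrics} under fixed prompts and seeds."""
    results = {}
    for name, rm in scorers.items():
        images = [generate(p, seed=s, rm=rm) for p in prompts for s in seeds]
        results[name] = measure_bias(images)
    return results

# Hypothetical usage; "none" disables guidance and serves as the baseline:
# scorers = {"none": None, "rm_pref": rm_pref, "rm_align": rm_align}
# results = ablation_grid(scorers, prompts, seeds, generate, measure_bias)
# deltas = {k: v - results["none"] for k, v in results.items() if k != "none"}
```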
Referee: [Results] Results and discussion: without the above controls, the observed patterns could arise from the statistics of the text prompts or from the training data of the base T2I model rather than from biases internal to the RMs; the attribution therefore remains unestablished.
Authors: The differential outcomes we report—varying degrees of sexualization, stereotype reinforcement, and diversity collapse across RMs despite identical prompts and base models—support attribution to RM biases rather than prompt statistics or base-model priors alone. Qualitative inspection further shows that high-RM-score images align with the demographic preferences encoded in each RM. We will expand the discussion to explicitly address and rule out the listed alternative explanations using the fixed-prompt, fixed-model comparisons. revision: yes
Circularity Check
No circularity: empirical audit with no derivation chain reducing to inputs
full rationale
The paper presents a large-scale empirical audit of reward models in T2I systems, reporting quantitative and qualitative observations on demographic biases. No mathematical derivation, first-principles prediction, or equation chain is claimed. The central findings rest on direct measurement of generated outputs under RM scoring rather than any fitted parameter renamed as a prediction or self-citation that defines the target quantity. Self-citations, if present, are not load-bearing for the audit results. The work is self-contained against external benchmarks via its experimental protocol.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Reward models can be meaningfully audited for demographic bias using existing fairness evaluation techniques.