Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation

Baoteng Li; Chi Zhang; Hao Sun; Kongming Liang; Tianwei Cao; Xianghao Zang; Xiangyu Na; Xinran Wang; Zhanyu Ma; Zhixiang He

arxiv: 2605.17807 · v1 · pith:CTRU33MKnew · submitted 2026-05-18 · 💻 cs.CV · cs.AI

Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation

Baoteng Li , Xianghao Zang , Xinran Wang , Xiangyu Na , Zhixiang He , Hao Sun , Chi Zhang , Zhongjiang He

show 3 more authors

Tianwei Cao Kongming Liang Zhanyu Ma

This is my paper

Pith reviewed 2026-05-20 12:30 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords text-to-image generationcurriculum learningadaptive samplingreinforcement learninggroup relative policy optimizationreward varianceprompt difficulty

0 comments

The pith

Text-to-image models train more efficiently when prompts are sampled according to the variance of rewards across their generated images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Curriculum Group Policy Optimization to address inefficient uniform sampling in reinforcement learning for text-to-image models. It treats the variance of reward scores across multiple images produced from one prompt as a live signal of how learnable that prompt currently is. Prompts showing moderate inconsistency receive higher sampling rates because they are neither fully mastered nor impossibly hard. A separate calibration step keeps training balanced across prompt categories. The result is reported improvement on three standard generation benchmarks.

Core claim

We propose Curriculum Group Policy Optimization (CGPO), an adaptive curriculum training framework. During training, each prompt produces a group of images scored by a reward model. We use the variance of group rewards as an online proxy for prompt inconsistency. A higher variance suggests that the model has partially captured the prompt requirements but has not yet achieved stable mastery. Such prompts are more likely to provide useful learning signals, so we increase their sampling probabilities accordingly. Additionally, to address data imbalance in multi-category datasets, we design a category calibration method based on proportional fairness optimization, which balances training across c

What carries the argument

Variance of group rewards from images generated by the same prompt, serving as the proxy that dynamically raises sampling probability for prompts at the model's current frontier of capability.

If this is right

Training compute is redirected toward prompts that still produce inconsistent but improvable outputs rather than wasting steps on already-mastered or hopeless ones.
Category calibration prevents easier prompt types from dominating the training distribution in mixed datasets.
The same variance signal can be computed on top of existing Group Relative Policy Optimization without changing the underlying reward model or policy update rule.
Final generation quality rises on composition, attribute, and prompt-following benchmarks because learning signals are more consistently informative.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same group-variance proxy might transfer to other reinforcement-learning settings that already sample multiple outputs per input, such as text or code generation.
If the reward model is updated during training, the variance threshold may need to be adjusted periodically to keep the difficulty signal aligned with the improving policy.
Controlled experiments that insert prompts of known human-rated difficulty could test whether reward variance reliably tracks actual learning progress rather than reward-model artifacts.

Load-bearing premise

The variance of rewards within a group of images generated from the same prompt serves as a reliable online proxy for whether the prompt is at the right difficulty level for the model's current capability and will provide useful learning signals.

What would settle it

Training the identical base model with uniform prompt sampling versus the proposed variance-based sampling and finding equal or lower scores on GenEval, T2I-CompBench++, and DPG Bench would falsify the benefit of the adaptive schedule.

Figures

Figures reproduced from arXiv: 2605.17807 by Baoteng Li, Chi Zhang, Hao Sun, Kongming Liang, Tianwei Cao, Xianghao Zang, Xiangyu Na, Xinran Wang, Zhanyu Ma, Zhixiang He, Zhongjiang He.

**Figure 1.** Figure 1: Our methodology, depicted in Figure 1a, uses the variance of rewards within an image group as an online proxy for prompt inconsistency. Higher variance suggests that the model has partially captured the prompt requirements but has not yet achieved stable mastery, making such prompts more likely to provide useful marginal learning signals. During sampling, we accordingly increase their selection probabiliti… view at source ↗

**Figure 2.** Figure 2: Flowchart of Our CGPO Method. Our CGPO method operates through four sequential stages: 1) Probability Sampling: A batch of prompts that match the model’s current capability and remain actively learnable is sampled according to the current sampling probabilities. 2) Reward Calculation: Image groups are generated, and their rewards and advantages are computed for policy training. 3) Probability Computation: … view at source ↗

**Figure 3.** Figure 3: Training Efficiency Comparison. Performancetraining time curves. optimal image quality. 4.2. Comparative Experiments In this subsection, we compare the performance and convergence speed of our CGPO against other T2I methods. Both the training dataset and reward model used in our experiments are from GenEval. We evaluated our method’s performance across three benchmarks: GenEval, T2I-CompBench++, and DPG… view at source ↗

**Figure 4.** Figure 4: Qualitative Comparison on the GenEval Benchmark. Our method outperforms SD3.5-M and Flow-GRPO in key areas including Attribute Binding, Color, Spatial, and Counting [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Sampling Probability Difficulty Distribution. We perform difficulty stratification using a single category and then track the number of high-probability prompts (P > 0.7) in the probability list across training steps. The difficulty progressively increases from Level 1 to Level 3. fined three difficulty levels: Level 1 (3–4 objects), Level 2 (6–7 objects), and Level 3 (9–10 objects), each containing 160… view at source ↗

**Figure 6.** Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗

**Figure 7.** Figure 7: Additional Visualizations. Our method outperforms SD3.5-M and Flow-GRPO in key areas including Attribute Binding, Color, Spatial, and Counting [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: Additional Visualizations. Our method outperforms SD3.5-M and Flow-GRPO in key areas including Attribute Binding, Color, Spatial, and Counting [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

read the original abstract

Text-to-Image (T2I) generation has achieved remarkable progress in recent years. Meanwhile, reinforcement learning methods, particularly those based on Group Relative Policy Optimization (GRPO), have attracted widespread attention and been successfully applied to T2I tasks. However, the uniform sampling strategy commonly used during training often ignores the match between sample difficulty and the model's current learning capability, leading to low training efficiency. We argue that improving training efficiency requires continuously prioritizing prompts that match the model's evolving capability and remain actively learnable. To this end, we propose Curriculum Group Policy Optimization (CGPO), an adaptive curriculum training framework. During training, each prompt produces a group of images scored by a reward model. We use the variance of group rewards as an online proxy for prompt inconsistency. A higher variance suggests that the model has partially captured the prompt requirements but has not yet achieved stable mastery. Such prompts are more likely to provide useful learning signals, so we increase their sampling probabilities accordingly. Additionally, to address data imbalance in multi-category datasets, we design a category calibration method based on proportional fairness optimization, which balances training difficulty across categories. Experiments on GenEval, T2I-CompBench++, and DPG Bench demonstrate that our framework effectively improves generation performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Curriculum Group Policy Optimization (CGPO), an adaptive curriculum framework extending Group Relative Policy Optimization (GRPO) for text-to-image generation. It uses the variance of reward scores across groups of images generated from the same prompt as an online proxy for prompt difficulty, increasing sampling probability for high-variance prompts on the grounds that they indicate partial learning without stable mastery. A category calibration step based on proportional fairness optimization is added to balance training across categories in imbalanced datasets. The authors report that the framework improves generation performance on GenEval, T2I-CompBench++, and DPG Bench.

Significance. If the variance-based proxy reliably identifies prompts that yield greater learning progress, the method could meaningfully improve training efficiency for RL-based T2I models by focusing compute on appropriately challenging examples. The category calibration addresses a practical issue in multi-category training. These elements represent a targeted contribution to curriculum learning in generative model training, but the overall significance depends on stronger empirical grounding for the core proxy assumption.

major comments (3)

Abstract: The claim that experiments on GenEval, T2I-CompBench++, and DPG Bench demonstrate effective improvement lacks any quantitative details on effect sizes, chosen baselines, statistical significance, or ablations isolating the variance proxy from simpler heuristics such as uniform sampling or fixed-probability adjustments.
Methods (variance proxy description): The central assumption that higher intra-group reward variance reliably signals partial capture of prompt requirements (rather than reward-model noise, prompt ambiguity, or sampling stochasticity) is not supported by any correlation analysis or ablation showing larger per-prompt reward deltas for high-variance prompts over training steps.
Experiments: No ablation is presented that compares CGPO against plain GRPO with equivalent extra compute or against a random high-variance selection baseline, making it impossible to attribute reported benchmark gains specifically to the proposed adaptive sampling.

minor comments (2)

The scaling factor or threshold applied to the variance when computing sampling probabilities is listed among free parameters but receives no sensitivity analysis or default-value justification.
The fairness parameter in the proportional fairness optimization for category calibration is introduced without discussion of how its value was chosen or its impact on final results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important areas for improving clarity, empirical support, and attribution of results. We address each major comment below and outline the revisions we will make.

read point-by-point responses

Referee: Abstract: The claim that experiments on GenEval, T2I-CompBench++, and DPG Bench demonstrate effective improvement lacks any quantitative details on effect sizes, chosen baselines, statistical significance, or ablations isolating the variance proxy from simpler heuristics such as uniform sampling or fixed-probability adjustments.

Authors: We agree that the abstract would be strengthened by including quantitative details. In the revised manuscript, we will update the abstract to report specific effect sizes (e.g., relative gains on each benchmark), explicitly name the primary baselines, and reference the key ablations performed. Where multiple runs were conducted, we will note result consistency; if formal statistical significance tests were not performed, we will clarify this limitation while emphasizing the observed trends. revision: yes
Referee: Methods (variance proxy description): The central assumption that higher intra-group reward variance reliably signals partial capture of prompt requirements (rather than reward-model noise, prompt ambiguity, or sampling stochasticity) is not supported by any correlation analysis or ablation showing larger per-prompt reward deltas for high-variance prompts over training steps.

Authors: The manuscript presents the variance proxy through a conceptual argument that high intra-group variance reflects prompts for which the model has achieved partial but unstable mastery. We acknowledge that this would be more convincing with direct empirical support. In the revision, we will add a correlation-style analysis or ablation that tracks per-prompt reward improvements over training steps, stratified by variance level at the time of sampling. This will help distinguish the intended signal from potential confounds such as reward noise. revision: yes
Referee: Experiments: No ablation is presented that compares CGPO against plain GRPO with equivalent extra compute or against a random high-variance selection baseline, making it impossible to attribute reported benchmark gains specifically to the proposed adaptive sampling.

Authors: We agree that the current experimental design leaves room for stronger isolation of the adaptive sampling contribution. While the manuscript already compares CGPO to standard GRPO, we will add two targeted ablations in the revision: (1) a plain GRPO run with compute budget matched to CGPO, and (2) a random high-variance prompt selection baseline that does not use the curriculum scheduling. These additions will allow clearer attribution of gains to the variance-driven adaptive mechanism. revision: yes

Circularity Check

0 steps flagged

No circularity: heuristic proxy and calibration are independent of target benchmarks

full rationale

The paper's core proposal defines an online sampling rule that increases probability for prompts whose generated group exhibits high reward variance, treating this variance as a proxy for partial mastery. This variance is computed directly from the reward model outputs on the current batch and is not fitted or calibrated against the final GenEval, T2I-CompBench++, or DPG scores. The category calibration step invokes proportional fairness optimization with an explicit tunable fairness parameter; the paper does not report optimizing this parameter on the evaluation benchmarks. Because the claimed performance gains are shown via separate held-out experiments rather than by construction from the proxy itself, the derivation chain remains self-contained and does not reduce to self-definition, fitted-input renaming, or self-citation load-bearing.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The method rests on the assumption that reward variance correlates with learnability and that proportional fairness optimization can balance categories without introducing new biases; no new physical entities or ungrounded constants are introduced.

free parameters (2)

variance threshold or scaling factor for sampling probability
The mapping from observed variance to increased sampling probability likely requires at least one tunable hyperparameter whose value is not derived from first principles.
fairness parameter in proportional fairness optimization
Category calibration uses proportional fairness, which typically includes a tunable parameter controlling the strength of balancing.

axioms (2)

domain assumption The reward model provides a consistent and meaningful scalar score for image-prompt alignment.
The entire variance proxy depends on the reward model being a reliable judge; this is standard in RLHF-style T2I training but not proven in the paper.
ad hoc to paper Higher intra-group reward variance indicates partial learning rather than noise or prompt ambiguity.
This interpretation is central to the curriculum logic and is presented as an argument rather than derived from prior theory.

pith-pipeline@v0.9.0 · 5788 in / 1657 out tokens · 31394 ms · 2026-05-20T12:30:44.260700+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages

[1]

Flexidit: Your diffusion transformer can easily generate high-quality samples with less compute, 2025

Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, and Edgar Sch¨onfeld. Flexidit: Your diffusion transformer can easily generate high-quality samples with less compute, 2025. 3

work page 2025
[2]

Continuous, subject-specific attribute control in t2i models by identifying semantic directions, 2025

Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Melvin Sevi, Vincent Tao Hu, and Bj¨orn Om- mer. Continuous, subject-specific attribute control in t2i models by identifying semantic directions, 2025. 3

work page 2025
[3]

Curriculum learning

Yoshua Bengio, J ´erˆome Louradour, Ronan Collobert, and Ja- son Weston. Curriculum learning. InProceedings of the 26th Annual International Conference on Machine Learn- ing, page 41–48, New York, NY , USA, 2009. Association for Computing Machinery. 1, 3

work page 2009
[4]

Im- proving image generation with better captions

James Betker, Gabriel Goh, Li Jing, † TimBrooks, Jian- feng Wang, Linjie Li, † LongOuyang, † JuntangZhuang, † JoyceLee, † YufeiGuo, † WesamManassra, † PrafullaDhari- wal, † CaseyChu, † YunxinJiao, and Aditya Ramesh. Im- proving image generation with better captions. 6

work page
[5]

Make it count: Text-to-image gen- eration with an accurate number of objects, 2024

Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, and Gal Chechik. Make it count: Text-to-image gen- eration with an accurate number of objects, 2024. 3

work page 2024
[6]

Janus- pro: Unified multimodal understanding and generation with data and model scaling, 2025

Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, and Chong Ruan. Janus- pro: Unified multimodal understanding and generation with data and model scaling, 2025. 6

work page 2025
[7]

Brown, Miljan Martic, Shane Legg, and Dario Amodei

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learn- ing from human preferences, 2023. 1

work page 2023
[8]

Acquire and then adapt: Squeezing out text-to-image model for image restoration,

Junyuan Deng, Xinyi Wu, Yongxing Yang, Congchao Zhu, Song Wang, and Zhenyao Wu. Acquire and then adapt: Squeezing out text-to-image model for image restoration,

work page
[9]

Unic-adapter: Unified image-instruction adapter with multi-modal transformer for image generation, 2024

Lunhao Duan, Shanshan Zhao, Wenjun Yan, Yinglun Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ming- ming Gong, and Gui-Song Xia. Unic-adapter: Unified image-instruction adapter with multi-modal transformer for image generation, 2024. 3

work page 2024
[10]

Scaling rectified flow trans- formers for high-resolution image synthesis, 2024

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M ¨uller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yan- nik Marek, and Robin Rombach. Scaling rectified flow trans- formers for high-resolution image synthesis, 2024. 6, 7

work page 2024
[11]

Dpok: Reinforcement learning for fine-tuning text-to-image diffu- sion models, 2023

Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Moham- mad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffu- sion models, 2023. 1

work page 2023
[12]

Curran Associates Inc., Red Hook, NY , USA, 2019

Meng Fang, Tianyi Zhou, Yali Du, Lei Han, and Zhengyou Zhang.Curriculum-guided hindsight experience replay. Curran Associates Inc., Red Hook, NY , USA, 2019. 5

work page 2019
[13]

Towards understanding and quantifying uncertainty for text-to-image generation, 2024

Gianni Franchi, Dat Nguyen Trong, Nacim Belkhir, Guox- uan Xia, and Andrea Pilzer. Towards understanding and quantifying uncertainty for text-to-image generation, 2024. 3

work page 2024
[14]

Prompt curriculum learning for efficient llm post-training, 2025

Zhaolin Gao, Joongwon Kim, Wen Sun, Thorsten Joachims, Sid Wang, Richard Yuanzhe Pang, and Liang Tan. Prompt curriculum learning for efficient llm post-training, 2025. 3

work page 2025
[15]

Geneval: An object-focused framework for evaluating text- to-image alignment, 2023

Dhruba Ghosh, Hanna Hajishirzi, and Ludwig Schmidt. Geneval: An object-focused framework for evaluating text- to-image alignment, 2023. 6

work page 2023
[16]

Adaptive rejection sampling for gibbs sampling.Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2):337–348, 1992

Walter R Gilks and Pascal Wild. Adaptive rejection sampling for gibbs sampling.Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2):337–348, 1992. 4

work page 1992
[17]

Bellemare, Jacob Menick, Remi Munos, and Koray Kavukcuoglu

Alex Graves, Marc G. Bellemare, Jacob Menick, Remi Munos, and Koray Kavukcuoglu. Automated curriculum learning for neural networks, 2017. 1, 2, 3

work page 2017
[18]

Deep poisson gamma dynamical systems.Advances in Neu- ral Information Processing Systems, 31, 2018

Dandan Guo, Bo Chen, Hao Zhang, and Mingyuan Zhou. Deep poisson gamma dynamical systems.Advances in Neu- ral Information Processing Systems, 31, 2018. 4

work page 2018
[19]

Ella: Equip diffusion models with llm for en- hanced semantic alignment, 2024

Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu. Ella: Equip diffusion models with llm for en- hanced semantic alignment, 2024. 6

work page 2024
[20]

Asynchronous curriculum experience re- play: A deep reinforcement learning approach for uav au- tonomous motion control in unknown dynamic environ- ments, 2022

Zijian Hu, Xiaoguang Gao, Kaifang Wan, Qianglong Wang, and Yiwei Zhai. Asynchronous curriculum experience re- play: A deep reinforcement learning approach for uav au- tonomous motion control in unknown dynamic environ- ments, 2022. 5

work page 2022
[21]

T2i-compbench++: An enhanced and comprehensive benchmark for compositional text-to-image generation, 2025

Kaiyi Huang, Chengqi Duan, Kaiyue Sun, Enze Xie, Zhen- guo Li, and Xihui Liu. T2i-compbench++: An enhanced and comprehensive benchmark for compositional text-to-image generation, 2025. 6

work page 2025
[22]

Silent branding attack: Trigger-free data poisoning attack on text-to-image diffusion models,

Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, and Sung Ju Hwang. Silent branding attack: Trigger-free data poisoning attack on text-to-image diffusion models,

work page
[23]

Chatgen: Automatic text- to-image generation from freestyle chatting, 2024

Chengyou Jia, Changliang Xia, Zhuohang Dang, Weijia Wu, Hangwei Qian, and Minnan Luo. Chatgen: Automatic text- to-image generation from freestyle chatting, 2024. 3

work page 2024
[24]

Hauptmann

Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, and Alexander G. Hauptmann. Self-paced learning with diversity. InProceedings of the 28th Inter- national Conference on Neural Information Processing Sys- tems - Volume 2, page 2078–2086, Cambridge, MA, USA,

work page 2078
[25]

Charging and rate control for elastic traffic

Frank Kelly. Charging and rate control for elastic traffic. European transactions on telecommunications, (1):8, 1997. 2

work page 1997
[26]

Rate control for communication networks: shadow prices, proportional fairness and stability.Journal of the Opera- tional Research society, 49(3):237–252, 1998

Frank P Kelly, Aman K Maulloo, and David Kim Hong Tan. Rate control for communication networks: shadow prices, proportional fairness and stability.Journal of the Opera- tional Research society, 49(3):237–252, 1998. 5

work page 1998
[27]

Rethinking training for de-biasing text-to- image generation: Unlocking the potential of stable diffu- sion, 2025

Eunji Kim, Siwon Kim, Minjun Park, Rahim Entezari, and Sungroh Yoon. Rethinking training for de-biasing text-to- image generation: Unlocking the potential of stable diffu- sion, 2025. 3

work page 2025
[28]

Pick-a-pic: an open dataset of user preferences for text-to-image generation

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Ma- tiana, Joe Penna, and Omer Levy. Pick-a-pic: an open dataset of user preferences for text-to-image generation. InPro- ceedings of the 37th International Conference on Neural In- formation Processing Systems, Red Hook, NY , USA, 2023. Curran Associates Inc. 1

work page 2023
[29]

A probabilistic interpretation of self-paced learning with applications to re- inforcement learning, 2021

Pascal Klink, Hany Abdulsamad, Boris Belousov, Carlo D’Eramo, Jan Peters, and Joni Pajarinen. A probabilistic interpretation of self-paced learning with applications to re- inforcement learning, 2021. 3

work page 2021
[30]

Pawan Kumar, Benjamin Packer, and Daphne Koller

M. Pawan Kumar, Benjamin Packer, and Daphne Koller. Self-paced learning for latent variable models.Curran As- sociates Inc., 2010. 3

work page 2010
[31]

Flux.1 kontext: Flow matching for in-context image generation and editing in latent space,

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dock- horn, Jack English, Zion English, Patrick Esser, Sumith Ku- lal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Muller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context im...

work page
[32]

2d-curri-dpo: Two- dimensional curriculum learning for direct preference opti- mization, 2025

Mengyang Li and Zhong Zhang. 2d-curri-dpo: Two- dimensional curriculum learning for direct preference opti- mization, 2025. 3

work page 2025
[33]

Curriculum-rlaif: Cur- riculum alignment with reinforcement learning from ai feed- back, 2025

Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, and Di Wang. Curriculum-rlaif: Cur- riculum alignment with reinforcement learning from ai feed- back, 2025. 3

work page 2025
[34]

Improving generative adversarial networks via adversarial learning in latent space.Advances in neural information pro- cessing systems, 35:8868–8881, 2022

Yang Li, Yichuan Mo, Liangliang Shi, and Junchi Yan. Improving generative adversarial networks via adversarial learning in latent space.Advances in neural information pro- cessing systems, 35:8868–8881, 2022. 4

work page 2022
[35]

Dual diffusion for unified image generation and understanding, 2025

Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yu- val Kluger, Linjie Yang, and Peng Wang. Dual diffusion for unified image generation and understanding, 2025. 3

work page 2025
[36]

Rich hu- man feedback for text-to-image generation, 2024

Youwei Liang, Junfeng He, Gang Li, Peizhao Li, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvi- jotham, Katie Collins, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, and Vidhya Navalpakkam. Rich hu- man feedback for text-to-image generation, 2024. 1

work page 2024
[37]

Flow-grpo: Training flow matching models via on- line rl, 2025

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via on- line rl, 2025. 3, 6

work page 2025
[38]

Generating images from captions with attention, 2016

Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, and Rus- lan Salakhutdinov. Generating images from captions with attention, 2016. 1

work page 2016
[39]

Noise diffusion for enhancing semantic faithfulness in text-to-image synthesis,

Boming Miao, Chunxiao Li, Xiaoxiao Wang, Andi Zhang, Rui Sun, Zizhe Wang, and Yao Zhu. Noise diffusion for enhancing semantic faithfulness in text-to-image synthesis,

work page
[40]

Ai alignment and social choice: Funda- mental limitations and policy implications, 2023

Abhilash Mishra. Ai alignment and social choice: Funda- mental limitations and policy implications, 2023. 1

work page 2023
[41]

Taylor, and Peter Stone

Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, and Peter Stone. Curriculum learning for reinforcement learning domains: A framework and survey,

work page
[42]

On the two different aspects of the rep- resentative method: the method of stratified sampling and the method of purposive selection

Jerzy Neyman. On the two different aspects of the rep- resentative method: the method of stratified sampling and the method of purposive selection. InBreakthroughs in statistics: Methodology and distribution, pages 123–150. Springer, 1992. 4

work page 1992
[43]

Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, and et al

OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, and et al. Gpt-4o system card, 2024. 6

work page 2024
[44]

Venkatesh Babu

Rishubh Parihar, Vaibhav Agrawal, Sachidanand VS, and R. Venkatesh Babu. Compass control: Multi object orien- tation control for text-to-image generation, 2025. 1, 3

work page 2025
[45]

Curry-dpo: En- hancing alignment using curriculum learning & ranked pref- erences, 2024

Pulkit Pattnaik, Rishabh Maheshwary, Kelechi Ogueji, Vikas Yadav, and Sathwik Tejaswi Madhusudhan. Curry-dpo: En- hancing alignment using curriculum learning & ranked pref- erences, 2024. 3

work page 2024
[46]

Mitchell

Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neu- big, Barnabas Poczos, and Tom M. Mitchell. Competence- based curriculum learning for neural machine translation,

work page
[47]

Sdxl: Improving latent diffusion models for high-resolution image synthesis, 2023

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Muller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis, 2023. 6

work page 2023
[48]

Automatic curriculum learning for deep rl: A short survey, 2020

R ´emy Portelas, C´edric Colas, Lilian Weng, Katja Hofmann, and Pierre-Yves Oudeyer. Automatic curriculum learning for deep rl: A short survey, 2020. 3

work page 2020
[49]

Kaseb, Kent Gauen, Ryan Dailey, Sarah Agha- janzadeh, Yung-Hsiang Lu, Shu-Ching Chen, and Mei-Ling Shyu

Samira Pouyanfar, Yudong Tao, Anup Mohan, Haiman Tian, Ahmed S. Kaseb, Kent Gauen, Ryan Dailey, Sarah Agha- janzadeh, Yung-Hsiang Lu, Shu-Ching Chen, and Mei-Ling Shyu. Dynamic sampling in convolutional neural networks for imbalanced data classification. In2018 IEEE Confer- ence on Multimedia Information Processing and Retrieval (MIPR), pages 112–117, 2018. 2

work page 2018
[50]

Self-cross diffu- sion guidance for text-to-image synthesis of similar subjects,

Weimin Qiu, Jieke Wang, and Meng Tang. Self-cross diffu- sion guidance for text-to-image synthesis of similar subjects,

work page
[51]

Steps: Sequential probability tensor estimation for text-to-image hard prompt search

Yuning Qiu, Andong Wang, Chao Li, Haonan Huang, Guoxu Zhou, and Qibin Zhao. Steps: Sequential probability tensor estimation for text-to-image hard prompt search. InCVPR, pages 28640–28650, 2025. 3

work page 2025
[52]

Zero-shot text-to-image generation, 2021

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea V oss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation, 2021. 3

work page 2021
[53]

High-resolution image syn- thesis with latent diffusion models, 2022

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. High-resolution image syn- thesis with latent diffusion models, 2022. 3

work page 2022
[54]

Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding, 2022. 3

work page 2022
[55]

Probabilistic curriculum learning for goal-based reinforcement learning, 2025

Llewyn Salt and Marcus Gallagher. Probabilistic curriculum learning for goal-based reinforcement learning, 2025. 3

work page 2025
[56]

Proximal policy optimization algo- rithms, 2017

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms, 2017. 1, 3

work page 2017
[57]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y . K. Li, Y . Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024. 1, 3, 5

work page 2024
[58]

Patel, and Karthik Nandakumar

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M. Patel, and Karthik Nandakumar. Stereo: A two- stage framework for adversarially robust concept erasing from text-to-image diffusion models, 2025. 3

work page 2025
[59]

Metropolis-hastings generative adversarial net- works

Ryan Turner, Jane Hung, Eric Frank, Yunus Saatchi, and Ja- son Yosinski. Metropolis-hastings generative adversarial net- works. InInternational Conference on Machine Learning, pages 6345–6353. PMLR, 2019. 4

work page 2019
[60]

Minority-focused text-to- image generation via prompt optimization, 2025

Soobin Um and Jong Chul Ye. Minority-focused text-to- image generation via prompt optimization, 2025. 3

work page 2025
[61]

Various techniques used in con- nection with random digits.John von Neumann, Collected Works, 5:768–770, 1963

John V on Neumann et al. Various techniques used in con- nection with random digits.John von Neumann, Collected Works, 5:768–770, 1963. 4

work page 1963
[62]

Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, and Yogesh Balaji

Andrew Z. Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, and Yogesh Balaji. A comprehensive study of decoder-only llms for text-to-image generation, 2025. 3

work page 2025
[63]

Adapting text-to-image generation with feature difference instruction for generic image restoration

Chao Wang, Hehe Fan, Huichen Yang, Sarvnaz Karimi, Lina Yao, and Yi Yang. Adapting text-to-image generation with feature difference instruction for generic image restoration. In2025 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 23539–23550, 2025. 3

work page 2025
[64]

Scaling down text encoders of text-to-image diffusion mod- els, 2025

Lifu Wang, Daqing Liu, Xinchen Liu, and Xiaodong He. Scaling down text encoders of text-to-image diffusion mod- els, 2025. 1

work page 2025
[65]

Unified reward model for multimodal understanding and generation, 2025

Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, and Jiaqi Wang. Unified reward model for multimodal understanding and generation, 2025. 1

work page 2025
[66]

Designdiffusion: High- quality text-to-design image generation with diffusion mod- els, 2025

Zhendong Wang, Jianmin Bao, Shuyang Gu, Dong Chen, Wengang Zhou, and Houqiang Li. Designdiffusion: High- quality text-to-design image generation with diffusion mod- els, 2025. 1

work page 2025
[67]

Dump: Automated distribution-level curricu- lum learning for rl-based llm post-training, 2025

Zhenting Wang, Guofeng Cui, Yu-Jhe Li, Kun Wan, and Wentian Zhao. Dump: Automated distribution-level curricu- lum learning for rl-based llm post-training, 2025. 3

work page 2025
[68]

Sharpening a tool for teach- ing: the zone of proximal development.Teaching in Higher Education, 19(6):671–684, 2014

Rob Wass and Clinton Golding. Sharpening a tool for teach- ing: the zone of proximal development.Teaching in Higher Education, 19(6):671–684, 2014. 3

work page 2014
[69]

Sana 1.5: Efficient scaling of training-time and inference-time compute in linear diffusion transformer,

Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Chengyue Wu, Yujun Lin, Zhekai Zhang, Muyang Li, Junyu Chen, Han Cai, Bingchen Liu, Daquan Zhou, and Song Han. Sana 1.5: Efficient scaling of training-time and inference-time compute in linear diffusion transformer,

work page
[70]

Logic-rl: Unleashing llm reasoning with rule- based reinforcement learning, 2025

Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, and Chong Luo. Logic-rl: Unleashing llm reasoning with rule- based reinforcement learning, 2025. 3

work page 2025
[71]

Focus- n-fix: Region-aware fine-tuning for text-to-image genera- tion, 2025

Xiaoying Xing, Avinab Saha, Junfeng He, Susan Hao, Paul Vicol, Moonkyung Ryu, Gang Li, Sahil Singla, Sarah Young, Yinxiao Li, Feng Yang, and Deepak Ramachandran. Focus- n-fix: Region-aware fine-tuning for text-to-image genera- tion, 2025. 3

work page 2025
[72]

Imagere- ward: Learning and evaluating human preferences for text- to-image generation, 2023

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagere- ward: Learning and evaluating human preferences for text- to-image generation, 2023. 1

work page 2023
[73]

The perfect blend: Redefining rlhf with mixture of judges, 2024

Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankarara- man, Di Jin, Kaiyan Peng, Eric Han, Shaoliang Nie, Chen Zhu, Hejia Zhang, Wenxuan Zhou, Zhouhao Zeng, Yun He, Karishma Mandyam, Arya Talabzadeh, Madian Khabsa, Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, and Han Fang. The perfect blend: Redefining rlhf with mixture of judges, 2024. 1

work page 2024
[74]

Scaling autoregressive models for content-rich text-to-image generation, 2022

Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gun- jan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yin- fei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, and Yonghui Wu. Scaling autoregressive models for content-rich text-to-image generation, 2022. 3

work page 2022
[75]

Learning to sample effective and diverse prompts for text-to-image generation, 2025

Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park, and Ling Pan. Learning to sample effective and diverse prompts for text-to-image generation, 2025. 3 Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation Supplementary Material

work page 2025
[76]

To derive the analytical solution for the optimization problem in Eq

Derivation of the Category Calibration For- mula In this section, we present the detailed derivation process for our category calibration. To derive the analytical solution for the optimization problem in Eq. (6), we first expand the KL divergence term. The original problem is: max q cX i=1 log(qi)−λ·KL(v∥q), s.t.∀q i ≥0, cX i=1 qi = 1. (9) The KL diverge...

work page
[77]

The configuration uses a sampling timestepT= 10 during training andT= 40for evaluation, with an image group sizeG= 24, noise levela= 0.8, and image reso- lution 256

Further Details on the Experimental Setup Our CGPO framework builds upon the Flow-GRPO archi- tecture. The configuration uses a sampling timestepT= 10 during training andT= 40for evaluation, with an image group sizeG= 24, noise levela= 0.8, and image reso- lution 256. The KL ratio is set to 0.004 (0.04 for the fast variant). LoRA parameters are configured...

work page
[78]

increasing reward with decreasing variance

Extended Experimental Results 8.1. Multiple Rewards Experimental For a controlled comparison, the reproduced 8-GPU Flow- GRPO uses the same batch size, rollout configuration, re- ward model, and training steps as CGPO, with the train- ing framework being the only difference. Following the evaluation protocol of Flow-GRPO, we train three separate models us...

work page

[1] [1]

Flexidit: Your diffusion transformer can easily generate high-quality samples with less compute, 2025

Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, and Edgar Sch¨onfeld. Flexidit: Your diffusion transformer can easily generate high-quality samples with less compute, 2025. 3

work page 2025

[2] [2]

Continuous, subject-specific attribute control in t2i models by identifying semantic directions, 2025

Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Melvin Sevi, Vincent Tao Hu, and Bj¨orn Om- mer. Continuous, subject-specific attribute control in t2i models by identifying semantic directions, 2025. 3

work page 2025

[3] [3]

Curriculum learning

Yoshua Bengio, J ´erˆome Louradour, Ronan Collobert, and Ja- son Weston. Curriculum learning. InProceedings of the 26th Annual International Conference on Machine Learn- ing, page 41–48, New York, NY , USA, 2009. Association for Computing Machinery. 1, 3

work page 2009

[4] [4]

Im- proving image generation with better captions

James Betker, Gabriel Goh, Li Jing, † TimBrooks, Jian- feng Wang, Linjie Li, † LongOuyang, † JuntangZhuang, † JoyceLee, † YufeiGuo, † WesamManassra, † PrafullaDhari- wal, † CaseyChu, † YunxinJiao, and Aditya Ramesh. Im- proving image generation with better captions. 6

work page

[5] [5]

Make it count: Text-to-image gen- eration with an accurate number of objects, 2024

Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, and Gal Chechik. Make it count: Text-to-image gen- eration with an accurate number of objects, 2024. 3

work page 2024

[6] [6]

Janus- pro: Unified multimodal understanding and generation with data and model scaling, 2025

Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, and Chong Ruan. Janus- pro: Unified multimodal understanding and generation with data and model scaling, 2025. 6

work page 2025

[7] [7]

Brown, Miljan Martic, Shane Legg, and Dario Amodei

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learn- ing from human preferences, 2023. 1

work page 2023

[8] [8]

Acquire and then adapt: Squeezing out text-to-image model for image restoration,

Junyuan Deng, Xinyi Wu, Yongxing Yang, Congchao Zhu, Song Wang, and Zhenyao Wu. Acquire and then adapt: Squeezing out text-to-image model for image restoration,

work page

[9] [9]

Unic-adapter: Unified image-instruction adapter with multi-modal transformer for image generation, 2024

Lunhao Duan, Shanshan Zhao, Wenjun Yan, Yinglun Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ming- ming Gong, and Gui-Song Xia. Unic-adapter: Unified image-instruction adapter with multi-modal transformer for image generation, 2024. 3

work page 2024

[10] [10]

Scaling rectified flow trans- formers for high-resolution image synthesis, 2024

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M ¨uller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yan- nik Marek, and Robin Rombach. Scaling rectified flow trans- formers for high-resolution image synthesis, 2024. 6, 7

work page 2024

[11] [11]

Dpok: Reinforcement learning for fine-tuning text-to-image diffu- sion models, 2023

Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Moham- mad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffu- sion models, 2023. 1

work page 2023

[12] [12]

Curran Associates Inc., Red Hook, NY , USA, 2019

Meng Fang, Tianyi Zhou, Yali Du, Lei Han, and Zhengyou Zhang.Curriculum-guided hindsight experience replay. Curran Associates Inc., Red Hook, NY , USA, 2019. 5

work page 2019

[13] [13]

Towards understanding and quantifying uncertainty for text-to-image generation, 2024

Gianni Franchi, Dat Nguyen Trong, Nacim Belkhir, Guox- uan Xia, and Andrea Pilzer. Towards understanding and quantifying uncertainty for text-to-image generation, 2024. 3

work page 2024

[14] [14]

Prompt curriculum learning for efficient llm post-training, 2025

Zhaolin Gao, Joongwon Kim, Wen Sun, Thorsten Joachims, Sid Wang, Richard Yuanzhe Pang, and Liang Tan. Prompt curriculum learning for efficient llm post-training, 2025. 3

work page 2025

[15] [15]

Geneval: An object-focused framework for evaluating text- to-image alignment, 2023

Dhruba Ghosh, Hanna Hajishirzi, and Ludwig Schmidt. Geneval: An object-focused framework for evaluating text- to-image alignment, 2023. 6

work page 2023

[16] [16]

Adaptive rejection sampling for gibbs sampling.Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2):337–348, 1992

Walter R Gilks and Pascal Wild. Adaptive rejection sampling for gibbs sampling.Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2):337–348, 1992. 4

work page 1992

[17] [17]

Bellemare, Jacob Menick, Remi Munos, and Koray Kavukcuoglu

Alex Graves, Marc G. Bellemare, Jacob Menick, Remi Munos, and Koray Kavukcuoglu. Automated curriculum learning for neural networks, 2017. 1, 2, 3

work page 2017

[18] [18]

Deep poisson gamma dynamical systems.Advances in Neu- ral Information Processing Systems, 31, 2018

Dandan Guo, Bo Chen, Hao Zhang, and Mingyuan Zhou. Deep poisson gamma dynamical systems.Advances in Neu- ral Information Processing Systems, 31, 2018. 4

work page 2018

[19] [19]

Ella: Equip diffusion models with llm for en- hanced semantic alignment, 2024

Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu. Ella: Equip diffusion models with llm for en- hanced semantic alignment, 2024. 6

work page 2024

[20] [20]

Asynchronous curriculum experience re- play: A deep reinforcement learning approach for uav au- tonomous motion control in unknown dynamic environ- ments, 2022

Zijian Hu, Xiaoguang Gao, Kaifang Wan, Qianglong Wang, and Yiwei Zhai. Asynchronous curriculum experience re- play: A deep reinforcement learning approach for uav au- tonomous motion control in unknown dynamic environ- ments, 2022. 5

work page 2022

[21] [21]

T2i-compbench++: An enhanced and comprehensive benchmark for compositional text-to-image generation, 2025

Kaiyi Huang, Chengqi Duan, Kaiyue Sun, Enze Xie, Zhen- guo Li, and Xihui Liu. T2i-compbench++: An enhanced and comprehensive benchmark for compositional text-to-image generation, 2025. 6

work page 2025

[22] [22]

Silent branding attack: Trigger-free data poisoning attack on text-to-image diffusion models,

Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, and Sung Ju Hwang. Silent branding attack: Trigger-free data poisoning attack on text-to-image diffusion models,

work page

[23] [23]

Chatgen: Automatic text- to-image generation from freestyle chatting, 2024

Chengyou Jia, Changliang Xia, Zhuohang Dang, Weijia Wu, Hangwei Qian, and Minnan Luo. Chatgen: Automatic text- to-image generation from freestyle chatting, 2024. 3

work page 2024

[24] [24]

Hauptmann

Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, and Alexander G. Hauptmann. Self-paced learning with diversity. InProceedings of the 28th Inter- national Conference on Neural Information Processing Sys- tems - Volume 2, page 2078–2086, Cambridge, MA, USA,

work page 2078

[25] [25]

Charging and rate control for elastic traffic

Frank Kelly. Charging and rate control for elastic traffic. European transactions on telecommunications, (1):8, 1997. 2

work page 1997

[26] [26]

Rate control for communication networks: shadow prices, proportional fairness and stability.Journal of the Opera- tional Research society, 49(3):237–252, 1998

Frank P Kelly, Aman K Maulloo, and David Kim Hong Tan. Rate control for communication networks: shadow prices, proportional fairness and stability.Journal of the Opera- tional Research society, 49(3):237–252, 1998. 5

work page 1998

[27] [27]

Rethinking training for de-biasing text-to- image generation: Unlocking the potential of stable diffu- sion, 2025

Eunji Kim, Siwon Kim, Minjun Park, Rahim Entezari, and Sungroh Yoon. Rethinking training for de-biasing text-to- image generation: Unlocking the potential of stable diffu- sion, 2025. 3

work page 2025

[28] [28]

Pick-a-pic: an open dataset of user preferences for text-to-image generation

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Ma- tiana, Joe Penna, and Omer Levy. Pick-a-pic: an open dataset of user preferences for text-to-image generation. InPro- ceedings of the 37th International Conference on Neural In- formation Processing Systems, Red Hook, NY , USA, 2023. Curran Associates Inc. 1

work page 2023

[29] [29]

A probabilistic interpretation of self-paced learning with applications to re- inforcement learning, 2021

Pascal Klink, Hany Abdulsamad, Boris Belousov, Carlo D’Eramo, Jan Peters, and Joni Pajarinen. A probabilistic interpretation of self-paced learning with applications to re- inforcement learning, 2021. 3

work page 2021

[30] [30]

Pawan Kumar, Benjamin Packer, and Daphne Koller

M. Pawan Kumar, Benjamin Packer, and Daphne Koller. Self-paced learning for latent variable models.Curran As- sociates Inc., 2010. 3

work page 2010

[31] [31]

Flux.1 kontext: Flow matching for in-context image generation and editing in latent space,

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dock- horn, Jack English, Zion English, Patrick Esser, Sumith Ku- lal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Muller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context im...

work page

[32] [32]

2d-curri-dpo: Two- dimensional curriculum learning for direct preference opti- mization, 2025

Mengyang Li and Zhong Zhang. 2d-curri-dpo: Two- dimensional curriculum learning for direct preference opti- mization, 2025. 3

work page 2025

[33] [33]

Curriculum-rlaif: Cur- riculum alignment with reinforcement learning from ai feed- back, 2025

Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, and Di Wang. Curriculum-rlaif: Cur- riculum alignment with reinforcement learning from ai feed- back, 2025. 3

work page 2025

[34] [34]

Improving generative adversarial networks via adversarial learning in latent space.Advances in neural information pro- cessing systems, 35:8868–8881, 2022

Yang Li, Yichuan Mo, Liangliang Shi, and Junchi Yan. Improving generative adversarial networks via adversarial learning in latent space.Advances in neural information pro- cessing systems, 35:8868–8881, 2022. 4

work page 2022

[35] [35]

Dual diffusion for unified image generation and understanding, 2025

Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yu- val Kluger, Linjie Yang, and Peng Wang. Dual diffusion for unified image generation and understanding, 2025. 3

work page 2025

[36] [36]

Rich hu- man feedback for text-to-image generation, 2024

Youwei Liang, Junfeng He, Gang Li, Peizhao Li, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvi- jotham, Katie Collins, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, and Vidhya Navalpakkam. Rich hu- man feedback for text-to-image generation, 2024. 1

work page 2024

[37] [37]

Flow-grpo: Training flow matching models via on- line rl, 2025

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via on- line rl, 2025. 3, 6

work page 2025

[38] [38]

Generating images from captions with attention, 2016

Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, and Rus- lan Salakhutdinov. Generating images from captions with attention, 2016. 1

work page 2016

[39] [39]

Noise diffusion for enhancing semantic faithfulness in text-to-image synthesis,

Boming Miao, Chunxiao Li, Xiaoxiao Wang, Andi Zhang, Rui Sun, Zizhe Wang, and Yao Zhu. Noise diffusion for enhancing semantic faithfulness in text-to-image synthesis,

work page

[40] [40]

Ai alignment and social choice: Funda- mental limitations and policy implications, 2023

Abhilash Mishra. Ai alignment and social choice: Funda- mental limitations and policy implications, 2023. 1

work page 2023

[41] [41]

Taylor, and Peter Stone

Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, and Peter Stone. Curriculum learning for reinforcement learning domains: A framework and survey,

work page

[42] [42]

On the two different aspects of the rep- resentative method: the method of stratified sampling and the method of purposive selection

Jerzy Neyman. On the two different aspects of the rep- resentative method: the method of stratified sampling and the method of purposive selection. InBreakthroughs in statistics: Methodology and distribution, pages 123–150. Springer, 1992. 4

work page 1992

[43] [43]

Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, and et al

OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, and et al. Gpt-4o system card, 2024. 6

work page 2024

[44] [44]

Venkatesh Babu

Rishubh Parihar, Vaibhav Agrawal, Sachidanand VS, and R. Venkatesh Babu. Compass control: Multi object orien- tation control for text-to-image generation, 2025. 1, 3

work page 2025

[45] [45]

Curry-dpo: En- hancing alignment using curriculum learning & ranked pref- erences, 2024

Pulkit Pattnaik, Rishabh Maheshwary, Kelechi Ogueji, Vikas Yadav, and Sathwik Tejaswi Madhusudhan. Curry-dpo: En- hancing alignment using curriculum learning & ranked pref- erences, 2024. 3

work page 2024

[46] [46]

Mitchell

Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neu- big, Barnabas Poczos, and Tom M. Mitchell. Competence- based curriculum learning for neural machine translation,

work page

[47] [47]

Sdxl: Improving latent diffusion models for high-resolution image synthesis, 2023

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Muller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis, 2023. 6

work page 2023

[48] [48]

Automatic curriculum learning for deep rl: A short survey, 2020

R ´emy Portelas, C´edric Colas, Lilian Weng, Katja Hofmann, and Pierre-Yves Oudeyer. Automatic curriculum learning for deep rl: A short survey, 2020. 3

work page 2020

[49] [49]

Kaseb, Kent Gauen, Ryan Dailey, Sarah Agha- janzadeh, Yung-Hsiang Lu, Shu-Ching Chen, and Mei-Ling Shyu

Samira Pouyanfar, Yudong Tao, Anup Mohan, Haiman Tian, Ahmed S. Kaseb, Kent Gauen, Ryan Dailey, Sarah Agha- janzadeh, Yung-Hsiang Lu, Shu-Ching Chen, and Mei-Ling Shyu. Dynamic sampling in convolutional neural networks for imbalanced data classification. In2018 IEEE Confer- ence on Multimedia Information Processing and Retrieval (MIPR), pages 112–117, 2018. 2

work page 2018

[50] [50]

Self-cross diffu- sion guidance for text-to-image synthesis of similar subjects,

Weimin Qiu, Jieke Wang, and Meng Tang. Self-cross diffu- sion guidance for text-to-image synthesis of similar subjects,

work page

[51] [51]

Steps: Sequential probability tensor estimation for text-to-image hard prompt search

Yuning Qiu, Andong Wang, Chao Li, Haonan Huang, Guoxu Zhou, and Qibin Zhao. Steps: Sequential probability tensor estimation for text-to-image hard prompt search. InCVPR, pages 28640–28650, 2025. 3

work page 2025

[52] [52]

Zero-shot text-to-image generation, 2021

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea V oss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation, 2021. 3

work page 2021

[53] [53]

High-resolution image syn- thesis with latent diffusion models, 2022

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. High-resolution image syn- thesis with latent diffusion models, 2022. 3

work page 2022

[54] [54]

Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding, 2022. 3

work page 2022

[55] [55]

Probabilistic curriculum learning for goal-based reinforcement learning, 2025

Llewyn Salt and Marcus Gallagher. Probabilistic curriculum learning for goal-based reinforcement learning, 2025. 3

work page 2025

[56] [56]

Proximal policy optimization algo- rithms, 2017

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms, 2017. 1, 3

work page 2017

[57] [57]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y . K. Li, Y . Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024. 1, 3, 5

work page 2024

[58] [58]

Patel, and Karthik Nandakumar

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M. Patel, and Karthik Nandakumar. Stereo: A two- stage framework for adversarially robust concept erasing from text-to-image diffusion models, 2025. 3

work page 2025

[59] [59]

Metropolis-hastings generative adversarial net- works

Ryan Turner, Jane Hung, Eric Frank, Yunus Saatchi, and Ja- son Yosinski. Metropolis-hastings generative adversarial net- works. InInternational Conference on Machine Learning, pages 6345–6353. PMLR, 2019. 4

work page 2019

[60] [60]

Minority-focused text-to- image generation via prompt optimization, 2025

Soobin Um and Jong Chul Ye. Minority-focused text-to- image generation via prompt optimization, 2025. 3

work page 2025

[61] [61]

Various techniques used in con- nection with random digits.John von Neumann, Collected Works, 5:768–770, 1963

John V on Neumann et al. Various techniques used in con- nection with random digits.John von Neumann, Collected Works, 5:768–770, 1963. 4

work page 1963

[62] [62]

Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, and Yogesh Balaji

Andrew Z. Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, and Yogesh Balaji. A comprehensive study of decoder-only llms for text-to-image generation, 2025. 3

work page 2025

[63] [63]

Adapting text-to-image generation with feature difference instruction for generic image restoration

Chao Wang, Hehe Fan, Huichen Yang, Sarvnaz Karimi, Lina Yao, and Yi Yang. Adapting text-to-image generation with feature difference instruction for generic image restoration. In2025 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 23539–23550, 2025. 3

work page 2025

[64] [64]

Scaling down text encoders of text-to-image diffusion mod- els, 2025

Lifu Wang, Daqing Liu, Xinchen Liu, and Xiaodong He. Scaling down text encoders of text-to-image diffusion mod- els, 2025. 1

work page 2025

[65] [65]

Unified reward model for multimodal understanding and generation, 2025

Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, and Jiaqi Wang. Unified reward model for multimodal understanding and generation, 2025. 1

work page 2025

[66] [66]

Designdiffusion: High- quality text-to-design image generation with diffusion mod- els, 2025

Zhendong Wang, Jianmin Bao, Shuyang Gu, Dong Chen, Wengang Zhou, and Houqiang Li. Designdiffusion: High- quality text-to-design image generation with diffusion mod- els, 2025. 1

work page 2025

[67] [67]

Dump: Automated distribution-level curricu- lum learning for rl-based llm post-training, 2025

Zhenting Wang, Guofeng Cui, Yu-Jhe Li, Kun Wan, and Wentian Zhao. Dump: Automated distribution-level curricu- lum learning for rl-based llm post-training, 2025. 3

work page 2025

[68] [68]

Sharpening a tool for teach- ing: the zone of proximal development.Teaching in Higher Education, 19(6):671–684, 2014

Rob Wass and Clinton Golding. Sharpening a tool for teach- ing: the zone of proximal development.Teaching in Higher Education, 19(6):671–684, 2014. 3

work page 2014

[69] [69]

Sana 1.5: Efficient scaling of training-time and inference-time compute in linear diffusion transformer,

Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Chengyue Wu, Yujun Lin, Zhekai Zhang, Muyang Li, Junyu Chen, Han Cai, Bingchen Liu, Daquan Zhou, and Song Han. Sana 1.5: Efficient scaling of training-time and inference-time compute in linear diffusion transformer,

work page

[70] [70]

Logic-rl: Unleashing llm reasoning with rule- based reinforcement learning, 2025

Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, and Chong Luo. Logic-rl: Unleashing llm reasoning with rule- based reinforcement learning, 2025. 3

work page 2025

[71] [71]

Focus- n-fix: Region-aware fine-tuning for text-to-image genera- tion, 2025

Xiaoying Xing, Avinab Saha, Junfeng He, Susan Hao, Paul Vicol, Moonkyung Ryu, Gang Li, Sahil Singla, Sarah Young, Yinxiao Li, Feng Yang, and Deepak Ramachandran. Focus- n-fix: Region-aware fine-tuning for text-to-image genera- tion, 2025. 3

work page 2025

[72] [72]

Imagere- ward: Learning and evaluating human preferences for text- to-image generation, 2023

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagere- ward: Learning and evaluating human preferences for text- to-image generation, 2023. 1

work page 2023

[73] [73]

The perfect blend: Redefining rlhf with mixture of judges, 2024

Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankarara- man, Di Jin, Kaiyan Peng, Eric Han, Shaoliang Nie, Chen Zhu, Hejia Zhang, Wenxuan Zhou, Zhouhao Zeng, Yun He, Karishma Mandyam, Arya Talabzadeh, Madian Khabsa, Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, and Han Fang. The perfect blend: Redefining rlhf with mixture of judges, 2024. 1

work page 2024

[74] [74]

Scaling autoregressive models for content-rich text-to-image generation, 2022

Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gun- jan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yin- fei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, and Yonghui Wu. Scaling autoregressive models for content-rich text-to-image generation, 2022. 3

work page 2022

[75] [75]

Learning to sample effective and diverse prompts for text-to-image generation, 2025

Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park, and Ling Pan. Learning to sample effective and diverse prompts for text-to-image generation, 2025. 3 Curriculum Group Policy Optimization: Adaptive Sampling for Unleashing the Potential of Text-to-Image Generation Supplementary Material

work page 2025

[76] [76]

To derive the analytical solution for the optimization problem in Eq

Derivation of the Category Calibration For- mula In this section, we present the detailed derivation process for our category calibration. To derive the analytical solution for the optimization problem in Eq. (6), we first expand the KL divergence term. The original problem is: max q cX i=1 log(qi)−λ·KL(v∥q), s.t.∀q i ≥0, cX i=1 qi = 1. (9) The KL diverge...

work page

[77] [77]

The configuration uses a sampling timestepT= 10 during training andT= 40for evaluation, with an image group sizeG= 24, noise levela= 0.8, and image reso- lution 256

Further Details on the Experimental Setup Our CGPO framework builds upon the Flow-GRPO archi- tecture. The configuration uses a sampling timestepT= 10 during training andT= 40for evaluation, with an image group sizeG= 24, noise levela= 0.8, and image reso- lution 256. The KL ratio is set to 0.004 (0.04 for the fast variant). LoRA parameters are configured...

work page

[78] [78]

increasing reward with decreasing variance

Extended Experimental Results 8.1. Multiple Rewards Experimental For a controlled comparison, the reproduced 8-GPU Flow- GRPO uses the same batch size, rollout configuration, re- ward model, and training steps as CGPO, with the train- ing framework being the only difference. Following the evaluation protocol of Flow-GRPO, we train three separate models us...

work page