Embedding-perturbed Exploration Preference Optimization for Flow Models
Pith reviewed 2026-05-20 18:28 UTC · model grok-4.3
The pith
Embedding-level perturbations within sample groups sustain variance and keep the learning signal alive during preference optimization for flow models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Embedding-perturbed Exploration Preference Optimization (E²PO) adds structured perturbations at the embedding level inside sample groups. This produces a sustained intra-group variance that preserves the discriminative signal required for optimization. The framework therefore avoids the variance collapse that occurs in standard group-based methods and yields flow models whose outputs match human preferences more faithfully than existing baselines.
What carries the argument
Embedding-level perturbation inside sample groups: a controlled addition of structured noise at the embedding stage that maintains variance while leaving semantic content and preference ordering unchanged.
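The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the source only says the perturbation is "structured", so the isotropic Gaussian noise, the `perturb_group` helper, and the `rel_scale` parameter are all hypothetical choices made for this example.

```python
import math
import random

def perturb_group(embedding, group_size, rel_scale=0.05, seed=0):
    """Return `group_size` perturbed copies of `embedding`.

    Assumption (not from the paper): each copy adds Gaussian noise that is
    rescaled so its L2 norm equals rel_scale * ||embedding||.
    """
    rng = random.Random(seed)
    base_norm = math.sqrt(sum(x * x for x in embedding))
    group = []
    for _ in range(group_size):
        noise = [rng.gauss(0.0, 1.0) for _ in embedding]
        noise_norm = math.sqrt(sum(n * n for n in noise))
        scale = rel_scale * base_norm / noise_norm
        group.append([x + scale * n for x, n in zip(embedding, noise)])
    return group

# Hypothetical 4-dimensional conditioning embedding and a group of 4 samples.
embedding = [0.5, -1.0, 2.0, 0.25]
group = perturb_group(embedding, group_size=4)
```

Fixing the perturbation norm relative to the embedding norm keeps every group member at the same distance from the original, which is one simple way to make "small" perturbations comparable across prompts. Whether such noise preserves preference ordering is exactly the load-bearing premise the review questions.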
If this is right
- Optimization remains stable because a non-zero discriminative signal persists even late in training.
- Flow models reach higher human-preference alignment without requiring larger group sizes or repeated noise resampling.
- The risk of premature policy stagnation or reward hacking is reduced.
- The same perturbation principle offers a direct alternative to variance-increasing tricks that have shown diminishing returns.
Where Pith is reading between the lines
- The same embedding perturbation idea could be tested on diffusion or autoregressive models to check whether variance maintenance is architecture-specific.
- Smaller group sizes might become viable if the perturbation reliably supplies the missing signal, lowering per-step compute.
- Measuring output diversity on downstream tasks after training would reveal whether the added variance also improves sample variety.
Load-bearing premise
Perturbations added at the embedding level will increase useful variance without corrupting the semantic validity of the samples or the accuracy of their preference labels.
What would settle it
An experiment that applies the embedding perturbations and then finds either collapsed variance across groups or generated samples whose human preference rankings differ from the unperturbed versions would falsify the central claim.
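The two failure conditions named above can be expressed as a simple check. The sketch below is an assumption-laden illustration: the function names, the thresholds `min_var` and `min_agree`, and the example scores are hypothetical, and a real study would use human ratings or a proxy preference model to produce the scores.

```python
from itertools import combinations
from statistics import pvariance

def pairwise_rank_agreement(scores_a, scores_b):
    """Fraction of sample pairs that both score lists order the same way."""
    pairs = list(combinations(range(len(scores_a)), 2))
    agree = sum(
        1 for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) > 0
    )
    return agree / len(pairs)

def falsification_check(clean_scores, perturbed_scores,
                        min_var=1e-4, min_agree=0.9):
    """Flag collapsed variance or changed preference rankings.

    Thresholds are hypothetical placeholders, not values from the paper.
    """
    variance = pvariance(perturbed_scores)
    agreement = pairwise_rank_agreement(clean_scores, perturbed_scores)
    return {
        "variance": variance,
        "agreement": agreement,
        "claim_survives": variance > min_var and agreement >= min_agree,
    }

# Made-up preference scores for four samples, before and after perturbation.
clean = [0.1, 0.4, 0.6, 0.9]
result_ok = falsification_check(clean, [0.15, 0.35, 0.7, 0.85])
result_flipped = falsification_check(clean, [0.9, 0.4, 0.6, 0.1])
```

In the first case the ordering is unchanged and variance is non-zero, so the claim survives this check. In the second case most pairs flip, which is the kind of outcome that would count against the central claim.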
Original abstract
Recent advancements have established Reinforcement Learning (RL) as a pivotal paradigm for aligning generative models with human intent. However, group-based optimization frameworks (e.g., GRPO) face a critical limitation: the rapid decay of intra-group variance. As the distinctiveness among samples within a group diminishes, the variance approaches zero. This eliminates the very learning signal required for optimization, rendering the process unstable and forcing the policy into premature stagnation or reward hacking. Existing strategies, such as varying the initial noise or increasing group sizes, often fail to address this fundamental issue, resulting in training instability or diminishing returns. To overcome these challenges, we propose Embedding-perturbed Exploration Preference Optimization (E²PO), a novel framework that sustains optimization through embedding-level perturbation. Our method introduces structured, embedding-level perturbations within sample groups, guaranteeing a robust variance that preserves the discriminative signal throughout the training process. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art baselines, achieving a more faithful alignment with human preference.
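To see why vanishing intra-group variance removes the learning signal, consider the group-normalized advantage commonly used in GRPO-style methods, where each reward is centered on the group mean and divided by the group standard deviation. The sketch below is illustrative only; the `group_advantages` helper, the `eps` stabilizer, and the reward values are assumptions for this example rather than details from the paper.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group with distinct rewards yields positive and negative advantages.
diverse = group_advantages([0.2, 0.5, 0.8, 0.9])

# A collapsed group (identical rewards) yields all-zero advantages,
# so the policy gradient built from them carries no signal.
collapsed = group_advantages([0.7, 0.7, 0.7, 0.7])
```

When every sample in a group earns the same reward, each centered reward is zero and so is each advantage, regardless of how the normalization is stabilized. This is the stagnation scenario the abstract describes.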
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Embedding-perturbed Exploration Preference Optimization (E²PO) for flow models. It identifies rapid decay of intra-group variance as a core limitation in group-based RL frameworks such as GRPO, which eliminates the learning signal and leads to instability or reward hacking. The method introduces structured perturbations at the embedding level within sample groups to sustain variance while preserving the discriminative signal and semantic validity. Experiments are claimed to show significant outperformance over state-of-the-art baselines with more faithful human-preference alignment.
Significance. If the embedding perturbations can be shown to increase useful variance without corrupting preference labels, the approach would offer a targeted engineering fix for a known instability in preference optimization of generative models. This could improve training stability for flow-based architectures without requiring larger groups or noise variation, provided the invariance property holds.
major comments (1)
- [Abstract and §3] Abstract and §3 (Method): The central claim that 'structured, embedding-level perturbations ... guaranteeing a robust variance that preserves the discriminative signal' requires that the perturbation operator leaves both semantic content and the correctness of human preference labels unchanged. No derivation, bound, or invariance argument is supplied showing that the perturbation commutes with the preference oracle or that embedding shifts do not cross decision boundaries corresponding to preference flips. This is load-bearing for the claim, as label corruption would turn the optimization objective into a misaligned surrogate.
minor comments (1)
- [Abstract] The abstract supplies no equations, implementation details, or quantitative metrics, which hinders immediate technical assessment even though this is acceptable for an abstract.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review of our manuscript. The major comment raises an important point about the need for justification that our embedding perturbations preserve semantic content and preference labels. We address this below and outline the changes we will make in revision.
Point-by-point responses
- Referee: [Abstract and §3] Abstract and §3 (Method): The central claim that 'structured, embedding-level perturbations ... guaranteeing a robust variance that preserves the discriminative signal' requires that the perturbation operator leaves both semantic content and the correctness of human preference labels unchanged. No derivation, bound, or invariance argument is supplied showing that the perturbation commutes with the preference oracle or that embedding shifts do not cross decision boundaries corresponding to preference flips. This is load-bearing for the claim, as label corruption would turn the optimization objective into a misaligned surrogate.
  Authors: We agree that establishing preservation of semantic content and preference labels is central to the validity of E²PO. The current manuscript motivates the perturbations as small and structured within the embedding space of a fixed pre-trained encoder, with the claim supported by downstream empirical results showing improved alignment and stability. However, we acknowledge the absence of an explicit invariance argument or bound. In the revised manuscript we will add a new subsection in §3 that (i) provides a heuristic argument based on the local Lipschitz continuity of the embedding map and the small magnitude of the perturbations, (ii) reports an empirical label-consistency study in which human raters or a proxy preference model re-evaluate perturbed versus unperturbed pairs, and (iii) discusses the operating regime in which decision-boundary crossings are unlikely. These additions will make the load-bearing assumption explicit and testable.
  Revision: yes
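One way the heuristic argument in item (i) of the authors' response could go is sketched below. It is not taken from the manuscript: the preference score $s$, the Lipschitz constant $L$, the neighbourhood $U$, and the perturbation bound $\varepsilon$ are all assumed symbols introduced for illustration.

```latex
\begin{align*}
&\text{Assume } |s(e) - s(e')| \le L\,\|e - e'\| \quad \text{for all } e, e' \in U, \\
&\text{and } \|\delta_a\| \le \varepsilon,\ \|\delta_b\| \le \varepsilon
  \text{ with } e_a,\ e_b,\ e_a + \delta_a,\ e_b + \delta_b \in U. \\
&\text{Then } s(e_a + \delta_a) \ge s(e_a) - L\varepsilon
  \quad\text{and}\quad s(e_b + \delta_b) \le s(e_b) + L\varepsilon, \\
&\text{so } s(e_a + \delta_a) - s(e_b + \delta_b) \ge s(e_a) - s(e_b) - 2L\varepsilon. \\
&\text{Hence the ordering } s(e_a) > s(e_b) \text{ is preserved whenever }
  s(e_a) - s(e_b) > 2L\varepsilon.
\end{align*}
```

Under these assumptions, only pairs whose score gap is below $2L\varepsilon$ can flip, which makes the "operating regime" in item (iii) concrete: the perturbation magnitude would need to stay small relative to the typical score gap divided by $2L$.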
Circularity Check
No significant circularity; method presented as independent engineering intervention without reductive derivations
full rationale
The abstract and available text introduce E²PO as a novel framework that adds structured embedding-level perturbations to sustain intra-group variance in group-based RL optimization for flow models. No equations, derivations, or parameter-fitting steps are shown that reduce a claimed prediction or result back to the inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core claims. The variance-preservation benefit is asserted directly from the perturbation design rather than derived from fitted quantities or prior self-referential results. This is a standard non-circular engineering proposal; the derivation chain (if any exists in the full manuscript) does not exhibit the enumerated circular patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- perturbation magnitude
axioms (1)
- domain assumption: Structured embedding perturbations increase intra-group variance while leaving preference signals intact.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "Our method introduces structured, embedding-level perturbations within sample groups, guaranteeing a robust variance that preserves the discriminative signal throughout the training process."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "we propose Embedding-perturbed Exploration Preference Optimization (E²PO)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation
  EasyVFX decouples VFX generation via frequency-aware Mixture-of-Experts and test-time training to achieve realistic effects with limited resources.