DanceOPD: On-Policy Generative Field Distillation
Pith reviewed 2026-06-26 04:53 UTC · model grok-4.3
The pith
Routing each sample to one expert velocity field and training a flow-matching student on its own low-noise states lets one model compose text-to-image, editing, and guidance without conflicts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DanceOPD is an on-policy generative field distillation framework for flow-matching models. Each capability source is defined as a velocity field over the shared flow state space. The method routes each sample to one capability field, queries one low-noise student-induced state, and trains with a velocity MSE objective so the student learns to compose the expert fields from its own rollout states. The same formulation also absorbs operator-defined fields such as classifier-free guidance.
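The three steps named in the core claim (route, query one student-induced state, regress with velocity MSE) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear student, the toy expert fields, the uniform routing, the time convention (t = 1 is pure noise, t = 0 is data), the Euler rollout, the fixed query time, and the absence of gradients through the rollout are all hypothetical choices made here for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NUM_FIELDS, BATCH = 4, 3, 64

# Hypothetical frozen experts: each is a velocity field v_k(x, t) on the shared state space.
centers = rng.normal(size=(NUM_FIELDS, DIM))
experts = [lambda x, t, mu=mu: x - mu for mu in centers]

def features(x, t, k):
    """Student input: state, time, and a one-hot capability code (toy conditioning)."""
    onehot = np.eye(NUM_FIELDS)[k]
    return np.concatenate([x, t[:, None], onehot], axis=1)

def student_velocity(A, x, t, k):
    """Toy linear student v_theta; A plays the role of the parameters theta."""
    return features(x, t, k) @ A.T

def rollout_to_low_noise(A, k, t_query, steps=8):
    """Euler-integrate the student's own field from noise (t=1) down to t_query."""
    x = rng.normal(size=(len(k), DIM))
    ts = np.linspace(1.0, t_query, steps + 1)
    for t_hi, t_lo in zip(ts[:-1], ts[1:]):
        t = np.full(len(k), t_hi)
        x = x - (t_hi - t_lo) * student_velocity(A, x, t, k)
    return x  # treated as a constant below (no gradient through the rollout)

def train_step(A, lr=0.2, t_query=0.2):
    k = rng.integers(0, NUM_FIELDS, size=BATCH)   # route each sample to exactly one field
    x = rollout_to_low_noise(A, k, t_query)       # one student-induced low-noise state per sample
    t = np.full(BATCH, t_query)
    target = np.stack([experts[ki](xi, t_query) for ki, xi in zip(k, x)])
    feat = features(x, t, k)
    err = feat @ A.T - target                     # student velocity minus routed expert velocity
    loss = np.mean(err ** 2)                      # velocity MSE
    grad = 2.0 * err.T @ feat / err.size          # exact gradient of the MSE for a linear student
    return A - lr * grad, loss

A = np.zeros((DIM, DIM + 1 + NUM_FIELDS))
losses = []
for _ in range(300):
    A, loss = train_step(A)
    losses.append(loss)
```

The point of the sketch is only the data flow: the states on which the experts are queried come from the current student, so the training distribution moves as the student changes.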
What carries the argument
On-policy routing of each sample to a single capability velocity field followed by MSE training on the student's own low-noise induced states.
If this is right
- The distilled model strengthens target capabilities while preserving anchor generation quality.
- The same procedure absorbs realism fields and classifier-free guidance without architectural changes.
- Multi-capability composition improves across T2I, local editing, and global editing benchmarks.
- A simple velocity MSE objective suffices; no additional regularization terms are introduced.
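To make "absorbing classifier-free guidance" concrete, the sketch below treats guidance as an operator that turns one conditional velocity model into a new field, which could then serve as a distillation target like any other capability field. The formula is the standard CFG combination applied to velocities; whether the paper uses exactly this parameterization is an assumption, and `cfg_field`, `toy_velocity`, and the argument names are hypothetical.

```python
import numpy as np

def cfg_field(velocity_fn, guidance_scale):
    """Wrap a conditional velocity model into a guided field (classifier-free guidance)."""
    def guided(x, t, cond, null_cond):
        v_cond = velocity_fn(x, t, cond)         # prediction with the real condition
        v_uncond = velocity_fn(x, t, null_cond)  # prediction with the empty condition
        return v_uncond + guidance_scale * (v_cond - v_uncond)
    return guided

# Toy stand-in for a velocity model; a real model would be a neural network.
def toy_velocity(x, t, cond):
    return x - cond

x = np.ones(3)
cond = np.array([1.0, 2.0, 3.0])
null_cond = np.zeros(3)

unguided = cfg_field(toy_velocity, 1.0)(x, 0.5, cond, null_cond)  # scale 1 recovers v_cond
guided = cfg_field(toy_velocity, 3.0)(x, 0.5, cond, null_cond)
```

A student trained to match `guided` directly would need one forward pass per step at inference instead of two, which is the usual motivation for distilling guidance into the model.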
Where Pith is reading between the lines
- The single-model composition could replace ensembles of specialized generators in production pipelines.
- The on-policy querying pattern might extend to sequential or video generation tasks that also suffer capability conflicts.
- Adding new capability fields after initial training may require only additional routing without retraining the entire student.
- The method's reliance on flow-matching state space suggests it could transfer to other continuous generative paradigms that use velocity or score fields.
Load-bearing premise
Routing every sample to exactly one capability field and querying only one low-noise student state is sufficient for the student to compose the expert fields without creating new interference or requiring extra regularization.
What would settle it
A controlled experiment in which the distilled single model exhibits either lower T2I quality than the anchor model or increased interference between local and global editing compared with separately trained experts.
Original abstract
Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. Consequently, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each sample to one capability field, queries one low-noise student-induced state, and trains with a simple velocity MSE objective. With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states to compose expert capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance. Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption show that our approach improves multi-capability composition, strengthening target capabilities while preserving anchor generation quality. We believe this work establishes a practical route for generative field distillation in flow-matching models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DanceOPD, an on-policy generative field distillation framework for flow-matching models. It routes each sample to exactly one capability-specific velocity field (T2I, local editing, global editing), queries a single low-noise state induced by the current student, and optimizes a velocity MSE objective so the student can compose the expert fields. The formulation is also shown to absorb operator-defined fields such as classifier-free guidance. Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption are reported to demonstrate improved multi-capability composition while preserving anchor generation quality.
Significance. If the empirical results hold under the stated on-policy querying regime, the work supplies a lightweight mechanism for distilling and composing velocity fields in flow-matching models without extra regularization terms. The explicit use of student-induced low-noise states is a concrete strength that directly targets distribution shift in distillation; the ability to absorb CFG as an operator field is also noteworthy and falsifiable.
minor comments (2)
- [Abstract / §1] The abstract and introduction would benefit from a short table or bullet list that explicitly contrasts the proposed routing/querying scheme against prior distillation baselines (e.g., standard off-policy MSE or multi-field averaging).
- [Method] Notation for the capability fields and the student-induced state should be introduced with a single equation block early in the method section to avoid repeated prose definitions.
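As an illustration of what the two comments above ask for, the block below writes the on-policy objective next to a standard off-policy baseline in a single equation block. It is a sketch only: every symbol is an assumption introduced here and not the paper's notation, including the time convention (t = 1 is pure noise), the stop-gradient on the rollout, and the linear interpolation used for the off-policy state.

```latex
% All symbols are illustrative assumptions, not the paper's notation.
% v_\theta: student field; v^{(k)}: k-th capability field; \pi: routing distribution;
% c: conditioning input; \tau: low-noise query time;
% \Phi_\theta^{1 \to \tau}: the student's own ODE flow from noise to time \tau;
% sg: stop-gradient (assumed).
\begin{aligned}
\hat{x}_\tau &= \operatorname{sg}\!\left[\Phi_\theta^{1 \to \tau}(\epsilon;\, c)\right],
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\mathcal{L}_{\text{on-policy}}(\theta) &=
  \mathbb{E}_{k \sim \pi,\; c,\; \epsilon}
  \left\| v_\theta(\hat{x}_\tau, \tau, c) - v^{(k)}(\hat{x}_\tau, \tau, c) \right\|_2^2, \\
\mathcal{L}_{\text{off-policy}}(\theta) &=
  \mathbb{E}_{k \sim \pi,\; (x_0, c),\; \epsilon}
  \left\| v_\theta(x_\tau, \tau, c) - v^{(k)}(x_\tau, \tau, c) \right\|_2^2,
  \qquad x_\tau = (1 - \tau)\, x_0 + \tau\, \epsilon .
\end{aligned}
```

The two losses differ only in where the query state comes from: the student's own rollout in the first case, a noised dataset sample in the second.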
Simulated Author's Rebuttal
We thank the referee for their accurate summary of DanceOPD, for highlighting the on-policy student-induced state querying and CFG absorption as concrete strengths, and for recommending minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity; the derivation is a self-contained empirical method.
Full rationale
The paper presents DanceOPD as an on-policy distillation framework that routes samples to single capability velocity fields, queries student-induced low-noise states, and optimizes a standard velocity MSE objective. The central claim of improved multi-capability composition is framed as an empirical outcome of experiments on T2I, editing, realism-field absorption, and CFG absorption. No load-bearing derivation step reduces by construction to its own inputs, to a self-definition, or to a self-citation chain: the routing and querying are explicitly stated mechanisms rather than fitted parameters renamed as predictions, and the objective does not embed circularity. The approach is evaluated against external benchmarks, and no uniqueness theorems or ansatzes are imported via self-citation.
Axiom & Free-Parameter Ledger
invented entities (1)
- capability field: no independent evidence