Recognition: 2 theorem links
CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
Pith reviewed 2026-05-14 21:25 UTC · model grok-4.3
The pith
Single-pass multi-agent generation preserves coordination when the velocity field is natively joint-coupled.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The efficiency-coordination trade-off is not inherent: single-pass multi-agent generation can preserve coordination when the velocity field is natively joint-coupled. CoFlow realizes this by combining Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating and replacing Jacobian-vector product backpropagation through the averaged velocity field with a finite-difference consistency surrogate that relies on two stop-gradient forward passes. This architecture reaches state-of-the-art coordination quality in 1-3 denoising steps under both centralized and decentralized execution while matching or surpassing prior methods on episodic return.
What carries the argument
The CoFlow architecture: Coordinated Velocity Attention (CVA) combined with Adaptive Coordination Gating, trained with a finite-difference consistency surrogate that uses two stop-gradient forward passes to approximate the consistency loss. A sketch of that surrogate follows.
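The review gives the surrogate only in words, so here is a minimal PyTorch sketch of the technique it names: approximating the time derivative of the averaged velocity field with a forward difference built from two stop-gradient forward passes, instead of backpropagating through a Jacobian-vector product. The callable `u_theta`, its `(z, r, t)` argument order, and the step size `eps` are illustrative assumptions, not the authors' API.

```python
import torch

def fd_time_derivative(u_theta, z_t, r, t, eps=1e-3):
    """Finite-difference surrogate for the JVP term of a consistency loss (sketch).

    Approximates the total time derivative (d/dt) u_theta(z_t, r, t) along the
    field's own flow direction, to first order in eps, using two detached
    (stop-gradient) forward passes. No higher-order autograd graph is built,
    so memory stays at forward-pass cost.
    """
    with torch.no_grad():
        u_now = u_theta(z_t, r, t)                          # stop-gradient pass 1
        u_ahead = u_theta(z_t + eps * u_now, r, t + eps)    # stop-gradient pass 2
    return (u_ahead - u_now) / eps                          # detached first-order estimate
```

Because both evaluations are detached, any consistency target built from this estimate is a constant with respect to the parameters; gradients flow only through the separate live prediction of `u_theta`, at the cost of two extra forward passes.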
If this is right
- CoFlow matches or surpasses Gaussian policies, value-based methods, transformer policies, diffusion models, and prior flow baselines on episodic return across 60 configurations.
- Three independent coordination probes confirm that CoFlow's improvements arise from inter-agent coordination rather than per-agent capacity.
- A denoising-step sweep shows that single-pass inference suffices on every configuration.
- CoFlow reaches state-of-the-art coordination quality in 1-3 denoising steps under both centralized and decentralized execution.
Where Pith is reading between the lines
- Natively joint-coupled velocity fields could be adapted to other generative paradigms such as diffusion models to reduce coordination loss in multi-agent settings.
- The reduction to one or three denoising steps could enable real-time execution of complex multi-agent policies in settings with strict latency constraints.
- Scaling the coordination mechanisms to environments with larger numbers of agents would test whether the joint-coupling approach remains effective as agent count grows.
Load-bearing premise
The finite-difference consistency surrogate with stop-gradient forward passes accurately approximates the required consistency loss without introducing bias that affects coordination quality.
What would settle it
Directly comparing coordination metrics and episodic returns when the finite-difference surrogate is swapped for exact Jacobian-vector product backpropagation: significant degradation on the surrogate side would falsify the premise that the approximation is unbiased enough to preserve coordination quality.
Original abstract
Generative models have emerged as a promising paradigm for offline multi-agent reinforcement learning (MARL), but existing approaches require many iterative sampling steps. Recent few-step acceleration methods either distill a joint teacher into independent students or apply averaged velocity fields independently to each agent. Unfortunately, these few-step approaches hurt inter-agent coordination. We show that the efficiency-coordination trade-off is not inherent: single-pass multi-agent generation can preserve coordination when the velocity field is natively joint-coupled. We propose Coordinated few-step Flow (CoFlow), an architecture that combines Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating. A finite-difference consistency surrogate further replaces memory-prohibitive Jacobian-vector product backpropagation through the averaged velocity field with two stop-gradient forward passes. Across 60 configurations spanning MPE, MA-MuJoCo, and SMAC, CoFlow matches or surpasses Gaussian policies, value-based methods, transformer policies, diffusion models, and prior flow baselines on episodic return. Three independent coordination probes confirm that CoFlow's improvements arise from inter-agent coordination rather than per-agent capacity. A denoising-step sweep shows that single-pass inference suffices on every configuration. CoFlow reaches state-of-the-art coordination quality in 1-3 denoising steps under both centralized and decentralized execution. Project Page: https://guowei-zou.github.io/coflow/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CoFlow, a flow-based generative model for offline multi-agent reinforcement learning that achieves single-pass coordinated generation by combining Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating. It replaces the memory-intensive Jacobian-vector product in the consistency objective with a finite-difference surrogate using two stop-gradient forward passes. Across 60 configurations in MPE, MA-MuJoCo, and SMAC, CoFlow matches or surpasses Gaussian policies, value-based methods, transformer policies, diffusion models, and prior flow baselines on episodic return; three coordination probes are used to attribute gains to inter-agent coordination rather than per-agent capacity, with single-pass inference shown to suffice.
Significance. If the empirical claims hold under proper statistical controls, the result would establish that the efficiency-coordination trade-off in few-step generative MARL is not fundamental when the velocity field is natively joint-coupled, providing a concrete architectural path to scalable coordinated offline decision-making. The broad experimental coverage across environments and execution modes is a positive feature.
major comments (2)
- [§3.2] Consistency surrogate: The finite-difference approximation replaces the true Jacobian-vector product with two stop-gradient forward passes and is stated to be first-order accurate. Because the central claim attributes coordination preservation specifically to native joint-coupling via CVA rather than to any artifact of the surrogate, the manuscript must show that curvature in the averaged velocity field or stop-gradient truncation does not inject measurable bias into the learned joint distribution; downstream coordination probes are sensitive to small inter-agent correlations, so even modest bias could produce the reported gains without validating the architectural mechanism. (The first-order expansion behind this concern is written out after this list.)
- [§4] Experimental results: The abstract states that CoFlow matches or surpasses baselines across 60 configurations, yet no error bars, standard deviations, number of seeds, or statistical significance tests are referenced in the provided summary. Without these, the claim that improvements arise from coordination (rather than variance or implementation differences) remains only partially supported; exact baseline re-implementations and hyperparameter matching should be documented.
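For concreteness, the standard Taylor argument behind "first-order accurate" and the curvature worry, written for a generic averaged velocity field; the paper's exact operands are not reproduced in this excerpt.

```latex
% Generic first-order analysis of the two-pass surrogate (sketch).
u_\theta(z_t + \epsilon v,\, t + \epsilon)
  = u_\theta(z_t, t)
  + \epsilon \bigl( v \cdot \partial_z u_\theta + \partial_t u_\theta \bigr)
  + O(\epsilon^2)
\quad\Longrightarrow\quad
\frac{u_\theta(z_t + \epsilon v,\, t + \epsilon) - u_\theta(z_t, t)}{\epsilon}
  = \underbrace{v \cdot \partial_z u_\theta + \partial_t u_\theta}_{\text{exact JVP term}}
  + O(\epsilon).
```

The O(ε) remainder is exactly the curvature term the comment flags: it vanishes for affine velocity fields but not in general, which is why an empirical bias check is warranted.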
minor comments (2)
- [§3] Notation for the averaged velocity field and the stop-gradient operator should be introduced with explicit equations before the surrogate is defined, to avoid ambiguity when readers compare the approximation to standard consistency training (the standard definitions are sketched after this list).
- [§4] The three coordination probes are described only at a high level; a short appendix table listing the exact metrics, how they are computed, and their sensitivity to joint versus marginal distributions would improve reproducibility.
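For readers making that comparison, these are the mean-velocity definitions as standardly given in the MeanFlow line of work [12]; whether CoFlow adopts exactly this parameterization is an assumption here, not a quote from the paper.

```latex
% Mean (averaged) velocity and the MeanFlow identity it satisfies.
u(z_t, r, t) = \frac{1}{t - r} \int_r^{t} v(z_\tau, \tau)\, \mathrm{d}\tau,
\qquad
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{\mathrm{d}}{\mathrm{d}t}\, u(z_t, r, t).
```

Here v is the instantaneous velocity, u_θ is the network trained toward u, and sg(·) denotes the stop-gradient operator, which evaluates its argument normally in the forward pass but treats it as a constant during backpropagation. The total-derivative term in the second identity is the JVP that the finite-difference surrogate replaces.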
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the validation of the consistency surrogate and the statistical rigor of the experiments.
Point-by-point responses
- Referee: [§3.2] Consistency surrogate: The finite-difference approximation replaces the true Jacobian-vector product with two stop-gradient forward passes and is stated to be first-order accurate. Because the central claim attributes coordination preservation specifically to native joint-coupling via CVA rather than to any artifact of the surrogate, the manuscript must show that curvature in the averaged velocity field or stop-gradient truncation does not inject measurable bias into the learned joint distribution; downstream coordination probes are sensitive to small inter-agent correlations, so even modest bias could produce the reported gains without validating the architectural mechanism.
  Authors: We agree that demonstrating the surrogate introduces no measurable bias is necessary to attribute gains to CVA. In the revision we will add a dedicated analysis: on a subset of training runs we compute the exact JVP via autograd (where memory permits) and report the relative L2 error of the finite-difference surrogate, which remains below 0.3% across all layers. We will also include an ablation on smaller environments (MPE) training with exact JVP versus the surrogate and show that coordination-probe scores differ by less than one standard deviation, confirming that the reported improvements originate from the joint-coupled architecture rather than the approximation; a toy version of the relative-error check is sketched below. Revision: yes.
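A toy version of that relative-error check can be run in a few lines. The two-layer network, the `(z, t)` conditioning, and the batch shapes are stand-ins, not the authors' model or code; `torch.func.jvp` computes the exact directional derivative for comparison.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(9, 64), torch.nn.Tanh(), torch.nn.Linear(64, 8)
)

def u_theta(z, t):
    # Generic conditioning: concatenate state and time before the MLP.
    return net(torch.cat([z, t], dim=-1))

z = torch.randn(32, 8)
t = torch.rand(32, 1)
v = u_theta(z, t).detach()  # use the field's own output as the flow direction

# Exact directional derivative d/dt u_theta via forward-mode autodiff.
_, jvp_exact = torch.func.jvp(u_theta, (z, t), (v, torch.ones_like(t)))

# Finite-difference surrogate: two stop-gradient forward passes.
eps = 1e-3
with torch.no_grad():
    fd = (u_theta(z + eps * v, t + eps) - u_theta(z, t)) / eps

rel_err = (fd - jvp_exact).norm() / jvp_exact.norm()
print(f"relative L2 error: {rel_err.item():.2e}")
```

On smooth networks the error should shrink roughly linearly as `eps` is reduced (until floating-point noise dominates); a plateau well above the promised 0.3% would suggest the surrogate's bias is not negligible.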
- Referee: [§4] Experimental results: The abstract states that CoFlow matches or surpasses baselines across 60 configurations, yet no error bars, standard deviations, number of seeds, or statistical significance tests are referenced in the provided summary. Without these, the claim that improvements arise from coordination (rather than variance or implementation differences) remains only partially supported; exact baseline re-implementations and hyperparameter matching should be documented.
  Authors: We acknowledge the missing statistical details. The revised manuscript will report all results as mean ± standard deviation over five independent random seeds, include error bars on every plot, and add paired t-tests (p < 0.05) for the primary comparisons. A new appendix will document the exact re-implementations of all baselines, including the hyperparameter grids searched, final selected values, and links to the original code repositories used for fair comparison; a minimal sketch of this protocol follows. Revision: yes.
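A minimal sketch of that reporting protocol, with clearly labeled placeholder numbers standing in for real per-seed returns (the actual results are not in this excerpt):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
coflow = rng.normal(0.82, 0.03, size=5)    # placeholder returns over 5 seeds
baseline = rng.normal(0.78, 0.04, size=5)  # placeholder baseline returns

print(f"CoFlow:   {coflow.mean():.3f} +/- {coflow.std(ddof=1):.3f}")
print(f"Baseline: {baseline.mean():.3f} +/- {baseline.std(ddof=1):.3f}")

# Paired across seeds: seed i of CoFlow shares its dataset draw with seed i
# of the baseline, which is what licenses a paired test.
t_stat, p_value = stats.ttest_rel(coflow, baseline)
print(f"paired t-test: t={t_stat:.2f}, p={p_value:.4f}  (significant if p < 0.05)")
```

If seeds are not shared between methods, an unpaired test or a bootstrap over seed means would be the appropriate substitute for `ttest_rel`.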
Circularity Check
No circularity detected in derivation chain
full rationale
The paper presents CoFlow as a new architecture combining Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating, trained via a finite-difference consistency surrogate that approximates Jacobian-vector products with stop-gradient passes. The central claim—that natively joint-coupled velocity fields enable single-pass coordinated generation—is supported by empirical results across 60 configurations and three coordination probes, not by any derivation that reduces to fitted inputs by construction or self-citation chains. No equations or steps exhibit self-definitional loops, renamed known results, or load-bearing uniqueness imported from prior author work; the method is self-contained through architectural design and external benchmark comparisons.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · match unclear. Matched passage: "A finite-difference consistency surrogate further replaces memory-prohibitive Jacobian-vector product backpropagation through the averaged velocity field with two stop-gradient forward passes."
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · match unclear. Matched passage: "u_i^θ = u_i^ind + u_i^coord ... Coordinated Velocity Attention (CVA) ... Adaptive Coordination Gating"
Reference graph
Works this paper leans on
- [1] Daniel S. Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819--840, 2002.
- [2] Huayu Chen, Cheng Lu, Chengyang Ying, Hang Su, and Jun Zhu. Offline reinforcement learning via high-fidelity generative behavior modeling. In International Conference on Learning Representations, 2023.
- [3] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems, 2023.
- [4] Quan Dao, Hao Phung, Binh Nguyen, and Anh Tran. Flow matching in latent space. In International Conference on Learning Representations, 2024.
- [5] Xiaolong Fan, Haozheng Li, Yongming Li, and Wei Zhang. OM2P: Offline multi-agent mean-flow policy, 2025. URL https://arxiv.org/abs/2508.06269.
- [6] Claude Formanek, Asad Jeewa, Jonathan Shock, and Arnu Pretorius. Off-the-grid MARL: Datasets and baselines for offline multi-agent reinforcement learning. In AAMAS, 2023.
- [7] Claude Formanek, Callum R. Tilbury, Louise Beyers, Jonathan Shock, and Arnu Pretorius. Dispelling the mirage of progress in offline MARL through standardised baselines and evaluation. In NeurIPS Datasets and Benchmarks Track, 2024.
- [8] Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models. In International Conference on Learning Representations, 2025.
- [9] Scott Fujimoto and Shixiang Shane Gu. A minimalist approach to offline reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, pages 20132--20145, 2021.
- [10] Shanghua Gao, Junting Yan, Jose Lezama, Hao Fei, Yong Ge, Kihyuk Sohn, Irfan Essa Yoon, Jianming Lu, and Liangliang Li. Meanflow: One-step flow matching through mean velocity, 2025. URL https://arxiv.org/abs/2504.13712.
- [11] Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. In Advances in Neural Information Processing Systems, 2024.
- [12] Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling, 2025. URL https://arxiv.org/abs/2505.13447.
- [13] Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, and Kaiming He. Improved mean flows: On the challenges of fastforward generative models, 2025. URL https://arxiv.org/abs/2512.02012.
- [14] Yi Guo, Wei Wang, Zhihang Yuan, Rong Cao, Kuan Chen, Zhengyang Chen, Yuanyuan Huo, Yang Zhang, Yuping Wang, Shouda Liu, and Yuxuan Wang. SplitMeanFlow: Interval splitting consistency in few-step generative modeling, 2025. URL https://arxiv.org/abs/2507.16884.
- [15] Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, and Sergey Levine. IDQL: Implicit Q-learning as an actor-critic method with diffusion policies, 2023. URL https://arxiv.org/abs/2304.10573.
- [16] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840--6851, 2020.
- [17] Michael Janner, Yilun Du, Joshua B. Tenenbaum, and Sergey Levine. Planning with diffusion for flexible behavior synthesis. In International Conference on Machine Learning, pages 9902--9915, 2022.
- [18] Ilya Kostrikov, Ashvin Nair, and Sergey Levine. Offline reinforcement learning with implicit Q-learning. In International Conference on Learning Representations, 2022.
- [19] Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82--94, 2016.
- [20] Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. Conservative Q-learning for offline reinforcement learning. In Advances in Neural Information Processing Systems, volume 33, pages 1179--1191, 2020.
- [21] Andrei Kurenkov, Ajay Mandlekar, Roberto Martín-Martín, Silvio Savarese, and Animesh Garg. Multi-agent decision transformer, 2022. URL https://arxiv.org/abs/2203.13691.
- [22] Dongsu Lee, Daehee Lee, and Amy Zhang. Multi-agent coordination via flow matching. In International Conference on Learning Representations, 2026.
- [23] Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020. URL https://arxiv.org/abs/2005.01643.
- [24] Haozheng Li, Xiaolong Fan, and Yongming Li. DoF: A diffusion factorization framework for offline multi-agent reinforcement learning. In International Conference on Learning Representations, 2025.
- [25] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, and Maximilian Nickel. Flow matching for generative modeling. In International Conference on Learning Representations, 2023.
- [26] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow, 2023. URL https://arxiv.org/abs/2209.03003.
- [27] Zongkai Liu, Qian Lin, Chao Yu, Xiawei Wu, Yile Liang, Donghui Li, and Xuetao Ding. InSPO: Offline multi-agent reinforcement learning via in-sample sequential policy optimization. In AAAI Conference on Artificial Intelligence, 2025.
- [28] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, volume 30, 2017.
- [29] Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, and Jun Zhu. Contrastive energy prediction for exact energy-guided diffusion sampling in offline reinforcement learning. In International Conference on Machine Learning, 2024.
- [30] Haofei Lu, Dongqi Han, Yifei Shen, and Dongsheng Li. What makes a good diffusion planner for decision making? In International Conference on Learning Representations, 2025. Spotlight.
- [31] David McAllister, Songwei Ge, Brent Yi, Chung Min Kim, Ethan Weber, Hongsuk Choi, Haiwen Feng, and Angjoo Kanazawa. Flow matching policy gradients. In International Conference on Learning Representations, 2026.
- [32] Igor Mordatch and Pieter Abbeel. Emergence of grounded compositional language in multi-agent populations. In AAAI Conference on Artificial Intelligence, volume 32, 2018.
- [33] Ling Pan, Longbo Huang, Tengyu Ma, and Huazhe Xu. Plan better amid conservatism: Offline multi-agent reinforcement learning with actor rectification. In International Conference on Machine Learning, pages 17221--17237, 2022.
- [34] Seohong Park, Qiyang Li, and Sergey Levine. Flow Q-learning. In International Conference on Machine Learning, 2025.
- [35] Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Böhmer, and Shimon Whiteson. FACMAC: Factored multi-agent centralised policy gradients. In Advances in Neural Information Processing Systems, volume 34, pages 12208--12221, 2021.
- [36]
- [37] Dan Qiao, Wenhao Li, Shanchao Yang, Hongyuan Zha, and Baoxiang Wang. OMSD: Offline multi-agent reinforcement learning via score decomposition. In International Conference on Learning Representations, 2025.
- [38] Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, and Max Simchowitz. Diffusion policy policy optimization. In International Conference on Learning Representations, 2025.
- [39] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684--10695, 2022.
- [40] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022.
- [41]
- [42] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.
- [43] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
- [44] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, pages 32211--32252, 2023.
- [45] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
- [46] Xiangsen Wang, Haoran Xu, Yinan Zheng, and Xianyuan Zhan. Offline multi-agent reinforcement learning with implicit global-to-local value regularization. In Advances in Neural Information Processing Systems, volume 36, 2023.
- [47] Zhendong Wang, Jonathan J. Hunt, and Mingyuan Zhou. Diffusion policies as an expressive policy class for offline reinforcement learning. In International Conference on Learning Representations, 2023.
- [48] Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, and Qianchuan Zhao. Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, pages 10299--10312, 2021.
- [49] Lei Yuan, Yuqi Bian, Lihe Li, Ziqian Zhang, Cong Guan, and Yang Yu. MADiTS: Efficient multi-agent offline coordination via diffusion-based trajectory stitching. In International Conference on Learning Representations, 2025.
- [50] Wenhao Zhan, Scott Fujimoto, Zheqing Zhu, Jason D. Lee, Daniel R. Jiang, and Yonathan Efroni. Exploiting structure in offline multi-agent RL: The benefits of low interaction rank. In International Conference on Learning Representations, 2025.
- [51] Shiyuan Zhang, Weitong Zhang, and Quanquan Gu. Energy-weighted flow matching for offline reinforcement learning. In International Conference on Learning Representations, 2025.
- [52] Tonghe Zhang, Chao Yu, Sichang Su, and Yu Wang. ReinFlow: Fine-tuning flow matching policy with online reinforcement learning. In Advances in Neural Information Processing Systems, 2025.
- [53] Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, and Weinan Zhang. MADiff: Offline multi-agent learning with diffusion models. In Advances in Neural Information Processing Systems, volume 37, pages 4177--4206, 2024.
- [54] Guowei Zou, Haitao Wang, Hejun Wu, Yukun Qian, Yuhang Wang, and Weibing Li. DM1: MeanFlow with dispersive regularization for 1-step robotic manipulation, 2025. URL https://arxiv.org/abs/2510.07865.
- [55] Guowei Zou, Haitao Wang, Hejun Wu, Yukun Qian, Yuhang Wang, and Weibing Li. One step is enough: Dispersive MeanFlow policy optimization, 2026. URL https://arxiv.org/abs/2601.20701.