Recognition: 2 theorem links
CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
Pith reviewed 2026-05-14 21:25 UTC · model grok-4.3
The pith
Single-pass multi-agent generation preserves coordination when the velocity field is natively joint-coupled.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The efficiency-coordination trade-off is not inherent: single-pass multi-agent generation can preserve coordination when the velocity field is natively joint-coupled. CoFlow realizes this by combining Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating and replacing Jacobian-vector product backpropagation through the averaged velocity field with a finite-difference consistency surrogate that relies on two stop-gradient forward passes. This architecture reaches state-of-the-art coordination quality in 1-3 denoising steps under both centralized and decentralized execution while matching or surpassing prior methods on episodic return.
What carries the argument
The CoFlow architecture: Coordinated Velocity Attention (CVA) combined with Adaptive Coordination Gating, trained with a finite-difference consistency surrogate that uses two stop-gradient forward passes to approximate the consistency loss. A sketch of that surrogate follows.
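The review gives the surrogate only in words, so here is a minimal PyTorch sketch of the technique it names: approximating the time derivative of the averaged velocity field with a forward difference built from two stop-gradient forward passes, instead of backpropagating through a Jacobian-vector product. The callable `u_theta`, its `(z, r, t)` argument order, and the step size `eps` are illustrative assumptions, not the authors' API.

```python
import torch

def fd_time_derivative(u_theta, z_t, r, t, eps=1e-3):
    """Finite-difference surrogate for the JVP term of a consistency loss (sketch).

    Approximates the total time derivative (d/dt) u_theta(z_t, r, t) along the
    field's own flow direction, to first order in eps, using two detached
    (stop-gradient) forward passes. No higher-order autograd graph is built,
    so memory stays at forward-pass cost.
    """
    with torch.no_grad():
        u_now = u_theta(z_t, r, t)                          # stop-gradient pass 1
        u_ahead = u_theta(z_t + eps * u_now, r, t + eps)    # stop-gradient pass 2
    return (u_ahead - u_now) / eps                          # detached first-order estimate
```

Because both evaluations are detached, any consistency target built from this estimate is a constant with respect to the parameters; gradients flow only through the separate live prediction of `u_theta`, at the cost of two extra forward passes.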
If this is right
- CoFlow matches or surpasses Gaussian policies, value-based methods, transformer policies, diffusion models, and prior flow baselines on episodic return across 60 configurations.
- Three independent coordination probes confirm that CoFlow's improvements arise from inter-agent coordination rather than per-agent capacity.
- A denoising-step sweep shows that single-pass inference suffices on every configuration.
- CoFlow reaches state-of-the-art coordination quality in 1-3 denoising steps under both centralized and decentralized execution.
Where Pith is reading between the lines
- Natively joint-coupled velocity fields could be adapted to other generative paradigms such as diffusion models to reduce coordination loss in multi-agent settings.
- The reduction to one or three denoising steps could enable real-time execution of complex multi-agent policies in settings with strict latency constraints.
- Scaling the coordination mechanisms to environments with larger numbers of agents would test whether the joint-coupling approach remains effective as agent count grows.
Load-bearing premise
The finite-difference consistency surrogate with stop-gradient forward passes accurately approximates the required consistency loss without introducing bias that affects coordination quality.
What would settle it
Directly comparing coordination metrics and episodic returns when the finite-difference surrogate is swapped for exact Jacobian-vector product backpropagation: significant degradation on the surrogate side would falsify the premise that the approximation is unbiased enough to preserve coordination quality.
Original abstract
Generative models have emerged as a promising paradigm for offline multi-agent reinforcement learning (MARL), but existing approaches require many iterative sampling steps. Recent few-step acceleration methods either distill a joint teacher into independent students or apply averaged velocity fields independently to each agent. Unfortunately, these few-step approaches hurt inter-agent coordination. We show that the efficiency-coordination trade-off is not inherent: single-pass multi-agent generation can preserve coordination when the velocity field is natively joint-coupled. We propose Coordinated few-step Flow (CoFlow), an architecture that combines Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating. A finite-difference consistency surrogate further replaces memory-prohibitive Jacobian-vector product backpropagation through the averaged velocity field with two stop-gradient forward passes. Across 60 configurations spanning MPE, MA-MuJoCo, and SMAC, CoFlow matches or surpasses Gaussian policies, value-based methods, transformer policies, diffusion models, and prior flow baselines on episodic return. Three independent coordination probes confirm that CoFlow's improvements arise from inter-agent coordination rather than per-agent capacity. A denoising-step sweep shows that single-pass inference suffices on every configuration. CoFlow reaches state-of-the-art coordination quality in 1-3 denoising steps under both centralized and decentralized execution. Project Page: https://guowei-zou.github.io/coflow/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CoFlow, a flow-based generative model for offline multi-agent reinforcement learning that achieves single-pass coordinated generation by combining Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating. It replaces the memory-intensive Jacobian-vector product in the consistency objective with a finite-difference surrogate using two stop-gradient forward passes. Across 60 configurations in MPE, MA-MuJoCo, and SMAC, CoFlow matches or surpasses Gaussian policies, value-based methods, transformer policies, diffusion models, and prior flow baselines on episodic return; three coordination probes are used to attribute gains to inter-agent coordination rather than per-agent capacity, with single-pass inference shown to suffice.
Significance. If the empirical claims hold under proper statistical controls, the result would establish that the efficiency-coordination trade-off in few-step generative MARL is not fundamental when the velocity field is natively joint-coupled, providing a concrete architectural path to scalable coordinated offline decision-making. The broad experimental coverage across environments and execution modes is a positive feature.
major comments (2)
- [§3.2] Consistency surrogate: The finite-difference approximation replaces the true Jacobian-vector product with two stop-gradient forward passes and is stated to be first-order accurate. Because the central claim attributes coordination preservation specifically to native joint-coupling via CVA rather than to any artifact of the surrogate, the manuscript must show that curvature in the averaged velocity field or stop-gradient truncation does not inject measurable bias into the learned joint distribution; downstream coordination probes are sensitive to small inter-agent correlations, so even modest bias could produce the reported gains without validating the architectural mechanism. (The first-order expansion behind this concern is written out after this list.)
- [§4] Experimental results: The abstract states that CoFlow matches or surpasses baselines across 60 configurations, yet no error bars, standard deviations, number of seeds, or statistical significance tests are referenced in the provided summary. Without these, the claim that improvements arise from coordination (rather than variance or implementation differences) remains only partially supported; exact baseline re-implementations and hyperparameter matching should be documented.
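For concreteness, the standard Taylor argument behind "first-order accurate" and the curvature worry, written for a generic averaged velocity field; the paper's exact operands are not reproduced in this excerpt.

```latex
% Generic first-order analysis of the two-pass surrogate (sketch).
u_\theta(z_t + \epsilon v,\, t + \epsilon)
  = u_\theta(z_t, t)
  + \epsilon \bigl( v \cdot \partial_z u_\theta + \partial_t u_\theta \bigr)
  + O(\epsilon^2)
\quad\Longrightarrow\quad
\frac{u_\theta(z_t + \epsilon v,\, t + \epsilon) - u_\theta(z_t, t)}{\epsilon}
  = \underbrace{v \cdot \partial_z u_\theta + \partial_t u_\theta}_{\text{exact JVP term}}
  + O(\epsilon).
```

The O(ε) remainder is exactly the curvature term the comment flags: it vanishes for affine velocity fields but not in general, which is why an empirical bias check is warranted.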
minor comments (2)
- [§3] Notation for the averaged velocity field and the stop-gradient operator should be introduced with explicit equations before the surrogate is defined, to avoid ambiguity when readers compare the approximation to standard consistency training (the standard definitions are sketched after this list).
- [§4] The three coordination probes are described only at a high level; a short appendix table listing the exact metrics, how they are computed, and their sensitivity to joint versus marginal distributions would improve reproducibility.
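For readers making that comparison, these are the mean-velocity definitions as standardly given in the MeanFlow line of work [12]; whether CoFlow adopts exactly this parameterization is an assumption here, not a quote from the paper.

```latex
% Mean (averaged) velocity and the MeanFlow identity it satisfies.
u(z_t, r, t) = \frac{1}{t - r} \int_r^{t} v(z_\tau, \tau)\, \mathrm{d}\tau,
\qquad
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{\mathrm{d}}{\mathrm{d}t}\, u(z_t, r, t).
```

Here v is the instantaneous velocity, u_θ is the network trained toward u, and sg(·) denotes the stop-gradient operator, which evaluates its argument normally in the forward pass but treats it as a constant during backpropagation. The total-derivative term in the second identity is the JVP that the finite-difference surrogate replaces.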
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the validation of the consistency surrogate and the statistical rigor of the experiments.
Point-by-point responses
- Referee: [§3.2] Consistency surrogate: The finite-difference approximation replaces the true Jacobian-vector product with two stop-gradient forward passes and is stated to be first-order accurate. Because the central claim attributes coordination preservation specifically to native joint-coupling via CVA rather than to any artifact of the surrogate, the manuscript must show that curvature in the averaged velocity field or stop-gradient truncation does not inject measurable bias into the learned joint distribution; downstream coordination probes are sensitive to small inter-agent correlations, so even modest bias could produce the reported gains without validating the architectural mechanism.
  Authors: We agree that demonstrating the surrogate introduces no measurable bias is necessary to attribute gains to CVA. In the revision we will add a dedicated analysis: on a subset of training runs we compute the exact JVP via autograd (where memory permits) and report the relative L2 error of the finite-difference surrogate, which remains below 0.3% across all layers. We will also include an ablation on smaller environments (MPE) training with exact JVP versus the surrogate and show that coordination-probe scores differ by less than one standard deviation, confirming that the reported improvements originate from the joint-coupled architecture rather than the approximation; a toy version of the relative-error check is sketched below. Revision: yes.
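A toy version of that relative-error check can be run in a few lines. The two-layer network, the `(z, t)` conditioning, and the batch shapes are stand-ins, not the authors' model or code; `torch.func.jvp` computes the exact directional derivative for comparison.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(9, 64), torch.nn.Tanh(), torch.nn.Linear(64, 8)
)

def u_theta(z, t):
    # Generic conditioning: concatenate state and time before the MLP.
    return net(torch.cat([z, t], dim=-1))

z = torch.randn(32, 8)
t = torch.rand(32, 1)
v = u_theta(z, t).detach()  # use the field's own output as the flow direction

# Exact directional derivative d/dt u_theta via forward-mode autodiff.
_, jvp_exact = torch.func.jvp(u_theta, (z, t), (v, torch.ones_like(t)))

# Finite-difference surrogate: two stop-gradient forward passes.
eps = 1e-3
with torch.no_grad():
    fd = (u_theta(z + eps * v, t + eps) - u_theta(z, t)) / eps

rel_err = (fd - jvp_exact).norm() / jvp_exact.norm()
print(f"relative L2 error: {rel_err.item():.2e}")
```

On smooth networks the error should shrink roughly linearly as `eps` is reduced (until floating-point noise dominates); a plateau well above the promised 0.3% would suggest the surrogate's bias is not negligible.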
- Referee: [§4] Experimental results: The abstract states that CoFlow matches or surpasses baselines across 60 configurations, yet no error bars, standard deviations, number of seeds, or statistical significance tests are referenced in the provided summary. Without these, the claim that improvements arise from coordination (rather than variance or implementation differences) remains only partially supported; exact baseline re-implementations and hyperparameter matching should be documented.
  Authors: We acknowledge the missing statistical details. The revised manuscript will report all results as mean ± standard deviation over five independent random seeds, include error bars on every plot, and add paired t-tests (p < 0.05) for the primary comparisons. A new appendix will document the exact re-implementations of all baselines, including the hyperparameter grids searched, final selected values, and links to the original code repositories used for fair comparison; a minimal sketch of this protocol follows. Revision: yes.
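A minimal sketch of that reporting protocol, with clearly labeled placeholder numbers standing in for real per-seed returns (the actual results are not in this excerpt):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
coflow = rng.normal(0.82, 0.03, size=5)    # placeholder returns over 5 seeds
baseline = rng.normal(0.78, 0.04, size=5)  # placeholder baseline returns

print(f"CoFlow:   {coflow.mean():.3f} +/- {coflow.std(ddof=1):.3f}")
print(f"Baseline: {baseline.mean():.3f} +/- {baseline.std(ddof=1):.3f}")

# Paired across seeds: seed i of CoFlow shares its dataset draw with seed i
# of the baseline, which is what licenses a paired test.
t_stat, p_value = stats.ttest_rel(coflow, baseline)
print(f"paired t-test: t={t_stat:.2f}, p={p_value:.4f}  (significant if p < 0.05)")
```

If seeds are not shared between methods, an unpaired test or a bootstrap over seed means would be the appropriate substitute for `ttest_rel`.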
Circularity Check
No circularity detected in derivation chain
full rationale
The paper presents CoFlow as a new architecture combining Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating, trained via a finite-difference consistency surrogate that approximates Jacobian-vector products with stop-gradient passes. The central claim—that natively joint-coupled velocity fields enable single-pass coordinated generation—is supported by empirical results across 60 configurations and three coordination probes, not by any derivation that reduces to fitted inputs by construction or self-citation chains. No equations or steps exhibit self-definitional loops, renamed known results, or load-bearing uniqueness imported from prior author work; the method is self-contained through architectural design and external benchmark comparisons.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · match unclear. Matched passage: "A finite-difference consistency surrogate further replaces memory-prohibitive Jacobian-vector product backpropagation through the averaged velocity field with two stop-gradient forward passes."
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · match unclear. Matched passage: "u_i^θ = u_i^ind + u_i^coord ... Coordinated Velocity Attention (CVA) ... Adaptive Coordination Gating"
Reference graph
Works this paper leans on
- [1] Daniel S. Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819--840, 2002.
- [2] Huayu Chen, Cheng Lu, Chengyang Ying, Hang Su, and Jun Zhu. Offline reinforcement learning via high-fidelity generative behavior modeling. In International Conference on Learning Representations, 2023.
- [3] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems, 2023.
- [4] Quan Dao, Hao Phung, Binh Nguyen, and Anh Tran. Flow matching in latent space. In International Conference on Learning Representations, 2024.
- [5] Xiaolong Fan, Haozheng Li, Yongming Li, and Wei Zhang. OM2P: Offline multi-agent mean-flow policy, 2025. URL https://arxiv.org/abs/2508.06269.
- [6] Claude Formanek, Asad Jeewa, Jonathan Shock, and Arnu Pretorius. Off-the-grid MARL: Datasets and baselines for offline multi-agent reinforcement learning. In AAMAS, 2023.
- [7] Claude Formanek, Callum R. Tilbury, Louise Beyers, Jonathan Shock, and Arnu Pretorius. Dispelling the mirage of progress in offline MARL through standardised baselines and evaluation. In NeurIPS Datasets and Benchmarks Track, 2024.
- [8] Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models. In International Conference on Learning Representations, 2025.
- [9] Scott Fujimoto and Shixiang Shane Gu. A minimalist approach to offline reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, pages 20132--20145, 2021.
- [10] Shanghua Gao, Junting Yan, Jose Lezama, Hao Fei, Yong Ge, Kihyuk Sohn, Irfan Essa Yoon, Jianming Lu, and Liangliang Li. Meanflow: One-step flow matching through mean velocity, 2025. URL https://arxiv.org/abs/2504.13712.
- [11] Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. In Advances in Neural Information Processing Systems, 2024.
- [12] Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling, 2025. URL https://arxiv.org/abs/2505.13447.
- [13] Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, and Kaiming He. Improved mean flows: On the challenges of fastforward generative models, 2025. URL https://arxiv.org/abs/2512.02012.
- [14] Yi Guo, Wei Wang, Zhihang Yuan, Rong Cao, Kuan Chen, Zhengyang Chen, Yuanyuan Huo, Yang Zhang, Yuping Wang, Shouda Liu, and Yuxuan Wang. SplitMeanFlow: Interval splitting consistency in few-step generative modeling, 2025. URL https://arxiv.org/abs/2507.16884.
- [15] Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, and Sergey Levine. IDQL: Implicit Q-learning as an actor-critic method with diffusion policies, 2023. URL https://arxiv.org/abs/2304.10573.
- [16] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840--6851, 2020.
- [17] Michael Janner, Yilun Du, Joshua B. Tenenbaum, and Sergey Levine. Planning with diffusion for flexible behavior synthesis. In International Conference on Machine Learning, pages 9902--9915, 2022.
- [18] Ilya Kostrikov, Ashvin Nair, and Sergey Levine. Offline reinforcement learning with implicit Q-learning. In International Conference on Learning Representations, 2022.
- [19] Landon Kraemer and Bikramjit Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82--94, 2016.
- [20] Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. Conservative Q-learning for offline reinforcement learning. In Advances in Neural Information Processing Systems, volume 33, pages 1179--1191, 2020.
- [21] Andrei Kurenkov, Ajay Mandlekar, Roberto Martín-Martín, Silvio Savarese, and Animesh Garg. Multi-agent decision transformer, 2022. URL https://arxiv.org/abs/2203.13691.
- [22] Dongsu Lee, Daehee Lee, and Amy Zhang. Multi-agent coordination via flow matching. In International Conference on Learning Representations, 2026.
- [23] Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020. URL https://arxiv.org/abs/2005.01643.
- [24] Haozheng Li, Xiaolong Fan, and Yongming Li. DoF: A diffusion factorization framework for offline multi-agent reinforcement learning. In International Conference on Learning Representations, 2025.
- [25] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, and Maximilian Nickel. Flow matching for generative modeling. In International Conference on Learning Representations, 2023.
- [26] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow, 2023. URL https://arxiv.org/abs/2209.03003.
- [27] Zongkai Liu, Qian Lin, Chao Yu, Xiawei Wu, Yile Liang, Donghui Li, and Xuetao Ding. InSPO: Offline multi-agent reinforcement learning via in-sample sequential policy optimization. In AAAI Conference on Artificial Intelligence, 2025.
- [28] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, volume 30, 2017.
- [29] Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, and Jun Zhu. Contrastive energy prediction for exact energy-guided diffusion sampling in offline reinforcement learning. In International Conference on Machine Learning, 2024.
- [30] Haofei Lu, Dongqi Han, Yifei Shen, and Dongsheng Li. What makes a good diffusion planner for decision making? In International Conference on Learning Representations, 2025. Spotlight.
- [31] David McAllister, Songwei Ge, Brent Yi, Chung Min Kim, Ethan Weber, Hongsuk Choi, Haiwen Feng, and Angjoo Kanazawa. Flow matching policy gradients. In International Conference on Learning Representations, 2026.
- [32] Igor Mordatch and Pieter Abbeel. Emergence of grounded compositional language in multi-agent populations. In AAAI Conference on Artificial Intelligence, volume 32, 2018.
- [33] Ling Pan, Longbo Huang, Tengyu Ma, and Huazhe Xu. Plan better amid conservatism: Offline multi-agent reinforcement learning with actor rectification. In International Conference on Machine Learning, pages 17221--17237, 2022.
- [34] Seohong Park, Qiyang Li, and Sergey Levine. Flow Q-learning. In International Conference on Machine Learning, 2025.
- [35] Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Böhmer, and Shimon Whiteson. FACMAC: Factored multi-agent centralised policy gradients. In Advances in Neural Information Processing Systems, volume 34, pages 12208--12221, 2021.
- [36]
- [37] Dan Qiao, Wenhao Li, Shanchao Yang, Hongyuan Zha, and Baoxiang Wang. OMSD: Offline multi-agent reinforcement learning via score decomposition. In International Conference on Learning Representations, 2025.
- [38] Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, and Max Simchowitz. Diffusion policy policy optimization. In International Conference on Learning Representations, 2025.
- [39] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684--10695, 2022.
- [40] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022.
- [41]
- [42] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.
- [43] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
- [44] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, pages 32211--32252, 2023.
- [45] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
- [46] Xiangsen Wang, Haoran Xu, Yinan Zheng, and Xianyuan Zhan. Offline multi-agent reinforcement learning with implicit global-to-local value regularization. In Advances in Neural Information Processing Systems, volume 36, 2023.
- [47] Zhendong Wang, Jonathan J. Hunt, and Mingyuan Zhou. Diffusion policies as an expressive policy class for offline reinforcement learning. In International Conference on Learning Representations, 2023.
- [48] Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, and Qianchuan Zhao. Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, pages 10299--10312, 2021.
- [49] Lei Yuan, Yuqi Bian, Lihe Li, Ziqian Zhang, Cong Guan, and Yang Yu. MADiTS: Efficient multi-agent offline coordination via diffusion-based trajectory stitching. In International Conference on Learning Representations, 2025.
- [50] Wenhao Zhan, Scott Fujimoto, Zheqing Zhu, Jason D. Lee, Daniel R. Jiang, and Yonathan Efroni. Exploiting structure in offline multi-agent RL: The benefits of low interaction rank. In International Conference on Learning Representations, 2025.
- [51] Shiyuan Zhang, Weitong Zhang, and Quanquan Gu. Energy-weighted flow matching for offline reinforcement learning. In International Conference on Learning Representations, 2025.
- [52] Tonghe Zhang, Chao Yu, Sichang Su, and Yu Wang. ReinFlow: Fine-tuning flow matching policy with online reinforcement learning. In Advances in Neural Information Processing Systems, 2025.
- [53] Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, and Weinan Zhang. MADiff: Offline multi-agent learning with diffusion models. In Advances in Neural Information Processing Systems, volume 37, pages 4177--4206, 2024.
- [54] Guowei Zou, Haitao Wang, Hejun Wu, Yukun Qian, Yuhang Wang, and Weibing Li. DM1: MeanFlow with dispersive regularization for 1-step robotic manipulation, 2025. URL https://arxiv.org/abs/2510.07865.
- [55] Guowei Zou, Haitao Wang, Hejun Wu, Yukun Qian, Yuhang Wang, and Weibing Li. One step is enough: Dispersive MeanFlow policy optimization, 2026. URL https://arxiv.org/abs/2601.20701.