Your GFlowNet Secretly Learns an Optimal Transport Plan

Denis Belomestny; Ian Maksimov; Nikita Morozov; Sergey Samsonov

arxiv: 2606.06272 · v1 · pith:F27RRWIRnew · submitted 2026-06-04 · 💻 cs.LG · cs.AI

Your GFlowNet Secretly Learns an Optimal Transport Plan

Ian Maksimov , Nikita Morozov , Denis Belomestny , Sergey Samsonov This is my paper

Pith reviewed 2026-06-28 02:32 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords GFlowNetoptimal transportKantorovich problemminimum flowshortest path costsgenerative modelstransport plansgraph flows

0 comments

The pith

Fixing the initial flow distribution in a minimum-flow GFlowNet reduces its objective to a Kantorovich optimal transport problem whose solution is recovered by sampling the trained policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that non-acyclic GFlowNets trained under the minimum-flow objective with a fixed initial distribution solve a Kantorovich optimal transport problem whose costs are the shortest-path distances induced by the underlying graph. This equivalence means the converged GFlowNet policy directly encodes an optimal transport plan, and trajectories sampled from it recover the corresponding optimal coupling between source and target distributions. A sympathetic reader would care because the result supplies a new way to compute transport plans on large graphs by training neural-parameterized flows rather than solving the transport problem directly.

Core claim

Fixing the initial flow distribution in a minimum-flow GFlowNet reduces its objective to a Kantorovich OT problem with graph-induced shortest path costs. At the optimum, the learned GFlowNet policy therefore encodes an optimal transport plan from the source distribution to the target distribution: sampling trajectories from the minimum-flow GFlowNet recovers the corresponding optimal coupling.

What carries the argument

The reduction of the minimum-flow GFlowNet objective (with fixed initial distribution) to the Kantorovich formulation of optimal transport, where edge costs are shortest-path distances on the graph.

If this is right

GFlowNet training can be used to solve optimal transport problems on large graphs by learning edge flows with neural networks.
The optimal transport plan is recovered simply by sampling trajectories from the trained GFlowNet.
The equivalence holds only for the minimum-flow objective and fixed initial distribution on non-acyclic graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standard OT solvers could be used to warm-start or regularize GFlowNet training on moderate-sized graphs.
The link suggests that other flow-based generative models might admit similar reductions to transport problems under appropriate objectives.
GFlowNets could scale optimal transport computations to graphs that are too large for exact linear programming solvers.

Load-bearing premise

The GFlowNet must be non-acyclic and trained under the minimum-flow objective with a fixed initial flow distribution.

What would settle it

Compute the exact optimal coupling via a standard OT solver on the same graph using shortest-path costs, then compare it to the empirical distribution of trajectories sampled from a converged minimum-flow GFlowNet; any systematic mismatch falsifies the claimed equivalence.

Figures

Figures reproduced from arXiv: 2606.06272 by Denis Belomestny, Ian Maksimov, Nikita Morozov, Sergey Samsonov.

**Figure 1.** Figure 1: Visualization of the solution to the GFlowNet LP problem and Kantorovich OT optimal plan on a hypergrid environment. OT induces optimal coupling that connects source and target states and how much mass is transfered between them. GFlowNet on the other hand builds a policy that samples target states starting from source states, thus inducing a coupling with respect to the shortest path metric. Both formulat… view at source ↗

**Figure 2.** Figure 2: Visualization of the distributions used in the experiments: moon-shaped (left), corner-shaped (center), and ball-shaped (right). Where: z = s M − 1 , c = 1 2 , . . . , 1 2 , e1 = (1, 0, . . . , 0), b = c + rout 2 e1. and rin, rout are inner and outer balls radius. The ball-shaped distribution defined as follows: L(s) ≜ I [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗

read the original abstract

Generative Flow Networks (GFlowNets) are a framework for sampling structured objects via stochastic trajectories in a directed graph. In this work, we establish a theoretical connection between non-acyclic GFlowNets and optimal transport (OT). We show that fixing the initial flow distribution in a minimum-flow GFlowNet reduces its objective to a Kantorovich OT problem with graph-induced shortest path costs. At the optimum, the learned GFlowNet policy therefore encodes an optimal transport plan from the source distribution to the target distribution: we show that sampling trajectories from the minimum-flow GFlowNet recovers the corresponding optimal coupling. Our formulation enables applying the GFlowNet learning framework to OT problems on large graphs via edge flows and neural parameterization. Experiments confirm agreement with exact OT solvers and demonstrate that GFlowNets can learn high-quality transport plans.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reduces minimum-flow GFlowNets with fixed initial flow on non-acyclic graphs to Kantorovich OT with shortest-path costs, so the policy encodes the optimal coupling.

read the letter

The central result is that fixing the initial flow in a minimum-flow GFlowNet on a non-acyclic graph makes its objective identical to a Kantorovich optimal transport problem whose ground cost is the shortest-path distance induced by the graph. At the optimum the edge flows define a coupling whose marginals match the source and target, and sampling trajectories recovers that coupling. This is a clean reduction from flow conservation plus the definition of the objective, not a loose analogy.

The paper does two things well. It gives a practical route to OT on large graphs by letting the GFlowNet machinery (edge flows, neural parameterization) stand in for the transport plan. The experiments show agreement with exact solvers, which is the right first check for a claimed equivalence.

The restrictions are real and stated up front: the graph must be non-acyclic and the training must use the minimum-flow objective with fixed initial distribution. Change either and the reduction does not hold. Without the full proofs it is hard to judge whether every step in the derivation is gap-free, though the stress-test note indicates the argument follows directly from the definitions. The experiments are confirmatory rather than exhaustive; they do not yet show behavior on very large or noisy graphs.

This is for people already working on GFlowNets or on OT problems that live on graphs. A reader who wants a new parameterization tool for transport plans will find the equivalence useful. The claim is specific enough and the experimental check is positive enough that the paper deserves a serious referee rather than a desk reject.

Referee Report

0 major / 3 minor

Summary. The paper claims that fixing the initial flow distribution in a minimum-flow GFlowNet on a non-acyclic directed graph reduces the GFlowNet objective exactly to a Kantorovich optimal transport problem whose ground cost is the shortest-path distance induced by the graph. Consequently the learned edge flows define an optimal coupling whose marginals match the prescribed source and target distributions, and sampling trajectories from the trained policy recovers that coupling. The formulation is then used to solve OT instances on large graphs via neural parameterization of the flows, with experiments reported to match exact solvers.

Significance. If the reduction is correct, the result supplies a new, scalable route to OT on graphs by importing the GFlowNet training machinery and neural function approximation. Credit is due for the explicit reduction to the standard Kantorovich formulation (no free parameters introduced) and for the experimental verification against exact OT solvers.

minor comments (3)

[§3.2] §3.2, Eq. (8): the statement that the minimum-flow objective is 'identical' to the OT objective should be accompanied by an explicit side-by-side display of the two loss expressions to make the algebraic equivalence immediate.
[Experiments] The experimental section reports agreement with exact solvers but does not state the precise graph sizes or the number of independent runs; adding these details would strengthen reproducibility.
Notation: the symbol P is used both for the GFlowNet policy and for the OT coupling; a brief remark distinguishing the two usages would avoid confusion.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate summary of our manuscript and for the positive assessment of its significance. We appreciate the recommendation for minor revision. As the report lists no specific major comments, we have no points requiring point-by-point response or manuscript changes.

Circularity Check

0 steps flagged

No significant circularity; derivation reduces to external OT formulation

full rationale

The central claim is a direct mathematical reduction: fixing the initial flow in the minimum-flow GFlowNet objective yields the Kantorovich OT problem with shortest-path costs induced by the graph. This follows from flow conservation and the objective definition without any self-referential fitting, self-citation load-bearing, or ansatz smuggling. The result is externally falsifiable via exact OT solvers, and experiments confirm agreement. No step in the provided derivation chain reduces by construction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the work relies on standard mathematical formulations of optimal transport and GFlowNets without introducing new free parameters or invented entities.

axioms (1)

standard math Kantorovich formulation of optimal transport
The GFlowNet objective is shown to reduce to this standard problem.

pith-pipeline@v0.9.1-grok · 5678 in / 1091 out tokens · 60595 ms · 2026-06-28T02:32:13.835068+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

157 extracted references · 25 canonical work pages · 4 internal anchors

[1]

Optimizing Backward Policies in

Gritsaev, Timofei and Morozov, Nikita and Samsonov, Sergey and Tiapkin, Daniil , booktitle=. Optimizing Backward Policies in
[2]

arXiv preprint arXiv:2603.01786 , year=

Learning Shortest Paths with Generative Flow Networks , author=. arXiv preprint arXiv:2603.01786 , year=

work page arXiv
[3]

International Journal of Computer Vision , volume=

The Earth Mover's Distance as a Metric for Image Retrieval , author=. International Journal of Computer Vision , volume=
[4]

Proceedings of the 32nd International Conference on Machine Learning , pages=

From Word Embeddings to Document Distances , author=. Proceedings of the 32nd International Conference on Machine Learning , pages=
[5]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

Optimal Transport for Domain Adaptation , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=
[6]

Proceedings of the 34th International Conference on Machine Learning , pages=

Wasserstein Generative Adversarial Networks , author=. Proceedings of the 34th International Conference on Machine Learning , pages=
[7]

Advances in Neural Information Processing Systems , year=

Sinkhorn Distances: Lightspeed Computation of Optimal Transport , author=. Advances in Neural Information Processing Systems , year=
[8]

and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and

Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and. Nature Methods , year =
[9]

2008 , publisher=

Optimal Transport: Old and New , author=. 2008 , publisher=

2008
[10]

Econometrica , volume =

Martin Beckmann , title =. Econometrica , volume =
[11]

, title =

Dantzig, George B. , title =. Activity Analysis of Production and Allocation , editor =
[12]

SIAM Journal on Scientific Computing , volume=

Quadratically Regularized Optimal Transport on Graphs , author=. SIAM Journal on Scientific Computing , volume=. 2018 , publisher=. doi:10.1137/17M1132665 , url=

work page doi:10.1137/17m1132665 2018
[13]

arXiv preprint arXiv:2509.20408 , year=

A Theory of Multi-Agent Generative Flow Networks , author=. arXiv preprint arXiv:2509.20408 , year=

work page arXiv
[14]

International Conference on Machine Learning , pages=

Random Policy Evaluation Uncovers Policies of Generative Flow Networks , author=. International Conference on Machine Learning , pages=. 2025 , organization=

2025
[15]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Gdpo: Learning to directly align language models with diversity using gflownets , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024
[16]

arXiv preprint arXiv:2509.15207 , year=

Flowrl: Matching reward distributions for llm reasoning , author=. arXiv preprint arXiv:2509.15207 , year=

work page arXiv
[17]

SIAM Journal on Mathematical Analysis , volume=

Barycenters in the Wasserstein space , author=. SIAM Journal on Mathematical Analysis , volume=. 2011 , publisher=

2011
[18]

Econometrica: Journal of the Econometric Society , pages=

A continuous model of transportation , author=. Econometrica: Journal of the Econometric Society , pages=. 1952 , publisher=

1952
[19]

International Conference on Learning Representations , year=

Wasserstein Auto-Encoders , author=. International Conference on Learning Representations , year=
[20]

Journal of Machine Learning Research , volume=

POT: Python Optimal Transport , author=. Journal of Machine Learning Research , volume=
[21]

Revisiting Non-Acyclic

Morozov, Nikita and Maksimov, Ian and Tiapkin, Daniil and Samsonov, Sergey , booktitle =. Revisiting Non-Acyclic. 2025 , volume =

2025
[22]

IEEE transactions on Systems Science and Cybernetics , volume=

A formal basis for the heuristic determination of minimum cost paths , author=. IEEE transactions on Systems Science and Cybernetics , volume=. 1968 , publisher=

1968
[23]

and Moulines, E

Douc, R. and Moulines, E. and Priouret, P. and Soulier, P. , TITLE =. 2018 , PAGES =

2018
[24]

Scaling Learning Algorithms Towards

Bengio, Yoshua and LeCun, Yann , booktitle =. Scaling Learning Algorithms Towards
[25]

2017 , publisher=

Markov chains and mixing times , author=. 2017 , publisher=

2017
[26]

and Osindero, Simon and Teh, Yee Whye , journal =

Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye , journal =. A Fast Learning Algorithm for Deep Belief Nets , volume =
[27]

2016 , publisher=

Deep learning , author=. 2016 , publisher=

2016
[28]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

2000
[29]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

1980
[30]

M. J. Kearns , title =
[31]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

1983
[32]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

2000
[33]

Suppressed for Anonymity , author=
[34]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

1981
[35]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

1959
[36]

Advances in Neural Information Processing Systems , volume=

Flow network based generative models for non-iterative diverse candidate generation , author=. Advances in Neural Information Processing Systems , volume=
[37]

2024 , eprint=

Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets , author=. 2024 , eprint=

2024
[38]

Journal of Physics: Conference Series , volume=

HPC resources of the higher school of economics , author=. Journal of Physics: Conference Series , volume=. 2021 , organization=

2021
[39]

Advances in Neural Information Processing Systems , volume=

Maximum entropy monte-carlo planning , author=. Advances in Neural Information Processing Systems , volume=
[40]

International Conference on Machine Learning , pages=

Learning GFlowNets from partial episodes for improved convergence and stability , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023
[41]

Advances in Neural Information Processing Systems , volume=

Trajectory balance: Improved credit assignment in GFlowNets , author=. Advances in Neural Information Processing Systems , volume=
[42]

International Conference on Artificial Intelligence and Statistics , pages=

Generative flow networks as entropy-regularized rl , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

2024
[43]

International conference on machine learning , pages=

Reinforcement learning with deep energy-based policies , author=. International conference on machine learning , pages=. 2017 , organization=

2017
[44]

Nature , volume=

Mastering atari, go, chess and shogi by planning with a learned model , author=. Nature , volume=. 2020 , publisher=

2020
[45]

Journal of Machine Learning Research , volume=

Gflownet foundations , author=. Journal of Machine Learning Research , volume=
[46]

1974 , publisher=

Advanced Combinatorics: The Art of Finite and Infinite Expansions , author=. 1974 , publisher=

1974
[47]

The Twelfth International Conference on Learning Representations , year=

Amortizing intractable inference in large language models , author=. The Twelfth International Conference on Learning Representations , year=
[48]

The Twelfth International Conference on Learning Representations , year=

Order-Preserving GFlowNets , author=. The Twelfth International Conference on Learning Representations , year=
[49]

International Conference on Machine Learning , pages=

Biological sequence design with gflownets , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022
[50]

Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets , volume =

Zhang, Dinghuai and Dai, Hanjun and Malkin, Nikolay and Courville, Aaron C and Bengio, Yoshua and Pan, Ling , booktitle =. Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets , volume =
[51]

and Rupp, Matthias and von Lilienfeld, O

Ramakrishnan, Raghunathan and Dral, Pavlo O. and Rupp, Matthias and von Lilienfeld, O. Anatole , journal =. Quantum chemistry structures and properties of 134 kilo molecules , volume =
[52]

2020 , eprint=

Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures , author=. 2020 , eprint=

2020
[53]

International Conference on Machine Learning , pages=

Multi-objective gflownets , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023
[54]

Advances in Neural Information Processing Systems , volume=

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets , author=. Advances in Neural Information Processing Systems , volume=
[55]

International Conference on Artificial Intelligence and Statistics , pages=

Maximum entropy GFlowNets with soft Q-learning , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

2024
[56]

nature , volume=

Mastering the game of Go with deep neural networks and tree search , author=. nature , volume=. 2016 , publisher=

2016
[57]

International conference on computers and games , pages=

Efficient selectivity and backup operators in Monte-Carlo tree search , author=. International conference on computers and games , pages=. 2006 , organization=

2006
[58]

Science , volume =

David Silver and Thomas Hubert and Julian Schrittwieser and Ioannis Antonoglou and Matthew Lai and Arthur Guez and Marc Lanctot and Laurent Sifre and Dharshan Kumaran and Thore Graepel and Timothy Lillicrap and Karen Simonyan and Demis Hassabis , title =. Science , volume =. 2018 , doi =

2018
[59]

Advances in neural information processing systems , volume=

Pytorch: An imperative style, high-performance deep learning library , author=. Advances in neural information processing systems , volume=
[60]

Digital Discovery , volume=

Gflownets for ai-driven scientific discovery , author=. Digital Discovery , volume=. 2023 , publisher=

2023
[61]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=
[62]

A unified view of entropy-regularized Markov decision processes

A unified view of entropy-regularized markov decision processes , author=. arXiv preprint arXiv:1705.07798 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[63]

International Conference on Machine Learning , pages=

A theory of regularized markov decision processes , author=. International Conference on Machine Learning , pages=. 2019 , organization=

2019
[64]

Equivalence Between Policy Gradients and Soft Q-Learning

Equivalence between policy gradients and soft q-learning , author=. arXiv preprint arXiv:1704.06440 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[65]

European conference on machine learning , pages=

Bandit based monte-carlo planning , author=. European conference on machine learning , pages=. 2006 , organization=

2006
[66]

nature , volume=

Human-level control through deep reinforcement learning , author=. nature , volume=. 2015 , publisher=

2015
[67]

Prioritized Experience Replay , booktitle =

Tom Schaul and John Quan and Ioannis Antonoglou and David Silver , editor =. Prioritized Experience Replay , booktitle =. 2016 , url =

2016
[68]

International Conference on Machine Learning , pages=

Better training of gflownets with local credit and incomplete trajectories , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023
[69]

Proceedings of the 42nd International Conference on Machine Learning , pages =

Ergodic Generative Flows , author =. Proceedings of the 42nd International Conference on Machine Learning , pages =. 2025 , volume =

2025
[70]

2019 , publisher=

Computational optimal transport: With applications to data science , author=. 2019 , publisher=

2019
[71]

2021 , publisher=

Topics in optimal transportation , author=. 2021 , publisher=

2021
[72]

2009 , publisher=

Optimal transport: old and new , author=. 2009 , publisher=

2009
[73]

The Thirteenth International Conference on Learning Representations , year=

Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks , author=. The Thirteenth International Conference on Learning Representations , year=
[74]

Advances in Neural Information Processing Systems , volume=

On divergence measures for training gflownets , author=. Advances in Neural Information Processing Systems , volume=
[75]

International Conference on Machine Learning , pages=

Generative flow networks for discrete probabilistic modeling , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022
[76]

arXiv preprint arXiv:2305.14594 , year=

torchgfn: A PyTorch GFlowNet library , author=. arXiv preprint arXiv:2305.14594 , year=

work page arXiv
[77]

Taming the Noise in Reinforcement Learning via Soft Updates , booktitle =

Roy Fox and Ari Pakman and Naftali Tishby , editor =. Taming the Noise in Reinforcement Learning via Soft Updates , booktitle =. 2016 , url =

2016
[78]

Proceedings of the 35th International Conference on Machine Learning , pages =

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , editor =

2018
[79]

Advances in Neural Information Processing Systems , volume=

Munchausen reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=
[80]

International Conference on Learning Representations , year=

Mirror Descent Policy Optimization , author=. International Conference on Learning Representations , year=

Showing first 80 references.

[1] [1]

Optimizing Backward Policies in

Gritsaev, Timofei and Morozov, Nikita and Samsonov, Sergey and Tiapkin, Daniil , booktitle=. Optimizing Backward Policies in

[2] [2]

arXiv preprint arXiv:2603.01786 , year=

Learning Shortest Paths with Generative Flow Networks , author=. arXiv preprint arXiv:2603.01786 , year=

work page arXiv

[3] [3]

International Journal of Computer Vision , volume=

The Earth Mover's Distance as a Metric for Image Retrieval , author=. International Journal of Computer Vision , volume=

[4] [4]

Proceedings of the 32nd International Conference on Machine Learning , pages=

From Word Embeddings to Document Distances , author=. Proceedings of the 32nd International Conference on Machine Learning , pages=

[5] [5]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

Optimal Transport for Domain Adaptation , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

[6] [6]

Proceedings of the 34th International Conference on Machine Learning , pages=

Wasserstein Generative Adversarial Networks , author=. Proceedings of the 34th International Conference on Machine Learning , pages=

[7] [7]

Advances in Neural Information Processing Systems , year=

Sinkhorn Distances: Lightspeed Computation of Optimal Transport , author=. Advances in Neural Information Processing Systems , year=

[8] [8]

and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and

Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and. Nature Methods , year =

[9] [9]

2008 , publisher=

Optimal Transport: Old and New , author=. 2008 , publisher=

2008

[10] [10]

Econometrica , volume =

Martin Beckmann , title =. Econometrica , volume =

[11] [11]

, title =

Dantzig, George B. , title =. Activity Analysis of Production and Allocation , editor =

[12] [12]

SIAM Journal on Scientific Computing , volume=

Quadratically Regularized Optimal Transport on Graphs , author=. SIAM Journal on Scientific Computing , volume=. 2018 , publisher=. doi:10.1137/17M1132665 , url=

work page doi:10.1137/17m1132665 2018

[13] [13]

arXiv preprint arXiv:2509.20408 , year=

A Theory of Multi-Agent Generative Flow Networks , author=. arXiv preprint arXiv:2509.20408 , year=

work page arXiv

[14] [14]

International Conference on Machine Learning , pages=

Random Policy Evaluation Uncovers Policies of Generative Flow Networks , author=. International Conference on Machine Learning , pages=. 2025 , organization=

2025

[15] [15]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Gdpo: Learning to directly align language models with diversity using gflownets , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024

[16] [16]

arXiv preprint arXiv:2509.15207 , year=

Flowrl: Matching reward distributions for llm reasoning , author=. arXiv preprint arXiv:2509.15207 , year=

work page arXiv

[17] [17]

SIAM Journal on Mathematical Analysis , volume=

Barycenters in the Wasserstein space , author=. SIAM Journal on Mathematical Analysis , volume=. 2011 , publisher=

2011

[18] [18]

Econometrica: Journal of the Econometric Society , pages=

A continuous model of transportation , author=. Econometrica: Journal of the Econometric Society , pages=. 1952 , publisher=

1952

[19] [19]

International Conference on Learning Representations , year=

Wasserstein Auto-Encoders , author=. International Conference on Learning Representations , year=

[20] [20]

Journal of Machine Learning Research , volume=

POT: Python Optimal Transport , author=. Journal of Machine Learning Research , volume=

[21] [21]

Revisiting Non-Acyclic

Morozov, Nikita and Maksimov, Ian and Tiapkin, Daniil and Samsonov, Sergey , booktitle =. Revisiting Non-Acyclic. 2025 , volume =

2025

[22] [22]

IEEE transactions on Systems Science and Cybernetics , volume=

A formal basis for the heuristic determination of minimum cost paths , author=. IEEE transactions on Systems Science and Cybernetics , volume=. 1968 , publisher=

1968

[23] [23]

and Moulines, E

Douc, R. and Moulines, E. and Priouret, P. and Soulier, P. , TITLE =. 2018 , PAGES =

2018

[24] [24]

Scaling Learning Algorithms Towards

Bengio, Yoshua and LeCun, Yann , booktitle =. Scaling Learning Algorithms Towards

[25] [25]

2017 , publisher=

Markov chains and mixing times , author=. 2017 , publisher=

2017

[26] [26]

and Osindero, Simon and Teh, Yee Whye , journal =

Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye , journal =. A Fast Learning Algorithm for Deep Belief Nets , volume =

[27] [27]

2016 , publisher=

Deep learning , author=. 2016 , publisher=

2016

[28] [28]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

2000

[29] [29]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

1980

[30] [30]

M. J. Kearns , title =

[31] [31]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

1983

[32] [32]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

2000

[33] [33]

Suppressed for Anonymity , author=

[34] [34]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

1981

[35] [35]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

1959

[36] [36]

Advances in Neural Information Processing Systems , volume=

Flow network based generative models for non-iterative diverse candidate generation , author=. Advances in Neural Information Processing Systems , volume=

[37] [37]

2024 , eprint=

Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets , author=. 2024 , eprint=

2024

[38] [38]

Journal of Physics: Conference Series , volume=

HPC resources of the higher school of economics , author=. Journal of Physics: Conference Series , volume=. 2021 , organization=

2021

[39] [39]

Advances in Neural Information Processing Systems , volume=

Maximum entropy monte-carlo planning , author=. Advances in Neural Information Processing Systems , volume=

[40] [40]

International Conference on Machine Learning , pages=

Learning GFlowNets from partial episodes for improved convergence and stability , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023

[41] [41]

Advances in Neural Information Processing Systems , volume=

Trajectory balance: Improved credit assignment in GFlowNets , author=. Advances in Neural Information Processing Systems , volume=

[42] [42]

International Conference on Artificial Intelligence and Statistics , pages=

Generative flow networks as entropy-regularized rl , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

2024

[43] [43]

International conference on machine learning , pages=

Reinforcement learning with deep energy-based policies , author=. International conference on machine learning , pages=. 2017 , organization=

2017

[44] [44]

Nature , volume=

Mastering atari, go, chess and shogi by planning with a learned model , author=. Nature , volume=. 2020 , publisher=

2020

[45] [45]

Journal of Machine Learning Research , volume=

Gflownet foundations , author=. Journal of Machine Learning Research , volume=

[46] [46]

1974 , publisher=

Advanced Combinatorics: The Art of Finite and Infinite Expansions , author=. 1974 , publisher=

1974

[47] [47]

The Twelfth International Conference on Learning Representations , year=

Amortizing intractable inference in large language models , author=. The Twelfth International Conference on Learning Representations , year=

[48] [48]

The Twelfth International Conference on Learning Representations , year=

Order-Preserving GFlowNets , author=. The Twelfth International Conference on Learning Representations , year=

[49] [49]

International Conference on Machine Learning , pages=

Biological sequence design with gflownets , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022

[50] [50]

Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets , volume =

Zhang, Dinghuai and Dai, Hanjun and Malkin, Nikolay and Courville, Aaron C and Bengio, Yoshua and Pan, Ling , booktitle =. Let the Flows Tell: Solving Graph Combinatorial Problems with GFlowNets , volume =

[51] [51]

and Rupp, Matthias and von Lilienfeld, O

Ramakrishnan, Raghunathan and Dral, Pavlo O. and Rupp, Matthias and von Lilienfeld, O. Anatole , journal =. Quantum chemistry structures and properties of 134 kilo molecules , volume =

[52] [52]

2020 , eprint=

Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures , author=. 2020 , eprint=

2020

[53] [53]

International Conference on Machine Learning , pages=

Multi-objective gflownets , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023

[54] [54]

Advances in Neural Information Processing Systems , volume=

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets , author=. Advances in Neural Information Processing Systems , volume=

[55] [55]

International Conference on Artificial Intelligence and Statistics , pages=

Maximum entropy GFlowNets with soft Q-learning , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

2024

[56] [56]

nature , volume=

Mastering the game of Go with deep neural networks and tree search , author=. nature , volume=. 2016 , publisher=

2016

[57] [57]

International conference on computers and games , pages=

Efficient selectivity and backup operators in Monte-Carlo tree search , author=. International conference on computers and games , pages=. 2006 , organization=

2006

[58] [58]

Science , volume =

David Silver and Thomas Hubert and Julian Schrittwieser and Ioannis Antonoglou and Matthew Lai and Arthur Guez and Marc Lanctot and Laurent Sifre and Dharshan Kumaran and Thore Graepel and Timothy Lillicrap and Karen Simonyan and Demis Hassabis , title =. Science , volume =. 2018 , doi =

2018

[59] [59]

Advances in neural information processing systems , volume=

Pytorch: An imperative style, high-performance deep learning library , author=. Advances in neural information processing systems , volume=

[60] [60]

Digital Discovery , volume=

Gflownets for ai-driven scientific discovery , author=. Digital Discovery , volume=. 2023 , publisher=

2023

[61] [61]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=

[62] [62]

A unified view of entropy-regularized Markov decision processes

A unified view of entropy-regularized markov decision processes , author=. arXiv preprint arXiv:1705.07798 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[63] [63]

International Conference on Machine Learning , pages=

A theory of regularized markov decision processes , author=. International Conference on Machine Learning , pages=. 2019 , organization=

2019

[64] [64]

Equivalence Between Policy Gradients and Soft Q-Learning

Equivalence between policy gradients and soft q-learning , author=. arXiv preprint arXiv:1704.06440 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[65] [65]

European conference on machine learning , pages=

Bandit based monte-carlo planning , author=. European conference on machine learning , pages=. 2006 , organization=

2006

[66] [66]

nature , volume=

Human-level control through deep reinforcement learning , author=. nature , volume=. 2015 , publisher=

2015

[67] [67]

Prioritized Experience Replay , booktitle =

Tom Schaul and John Quan and Ioannis Antonoglou and David Silver , editor =. Prioritized Experience Replay , booktitle =. 2016 , url =

2016

[68] [68]

International Conference on Machine Learning , pages=

Better training of gflownets with local credit and incomplete trajectories , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023

[69] [69]

Proceedings of the 42nd International Conference on Machine Learning , pages =

Ergodic Generative Flows , author =. Proceedings of the 42nd International Conference on Machine Learning , pages =. 2025 , volume =

2025

[70] [70]

2019 , publisher=

Computational optimal transport: With applications to data science , author=. 2019 , publisher=

2019

[71] [71]

2021 , publisher=

Topics in optimal transportation , author=. 2021 , publisher=

2021

[72] [72]

2009 , publisher=

Optimal transport: old and new , author=. 2009 , publisher=

2009

[73] [73]

The Thirteenth International Conference on Learning Representations , year=

Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks , author=. The Thirteenth International Conference on Learning Representations , year=

[74] [74]

Advances in Neural Information Processing Systems , volume=

On divergence measures for training gflownets , author=. Advances in Neural Information Processing Systems , volume=

[75] [75]

International Conference on Machine Learning , pages=

Generative flow networks for discrete probabilistic modeling , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022

[76] [76]

arXiv preprint arXiv:2305.14594 , year=

torchgfn: A PyTorch GFlowNet library , author=. arXiv preprint arXiv:2305.14594 , year=

work page arXiv

[77] [77]

Taming the Noise in Reinforcement Learning via Soft Updates , booktitle =

Roy Fox and Ari Pakman and Naftali Tishby , editor =. Taming the Noise in Reinforcement Learning via Soft Updates , booktitle =. 2016 , url =

2016

[78] [78]

Proceedings of the 35th International Conference on Machine Learning , pages =

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , editor =

2018

[79] [79]

Advances in Neural Information Processing Systems , volume=

Munchausen reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

[80] [80]

International Conference on Learning Representations , year=

Mirror Descent Policy Optimization , author=. International Conference on Learning Representations , year=