pith. machine review for the scientific record.

arxiv: 2605.01729 · v1 · submitted 2026-05-03 · 💻 cs.LG · stat.ML

Recognition: 3 theorem links · Lean Theorem

Stable GFlowNets with Probabilistic Guarantees

Alvaro A. Cardenas, Ananth Shreekumar, Daniel J. Fremont, Dongyan Xu, Jonathan Rosenthal, Ruoyu Song, Satish Ukkusuri, Z. Berkay Celik, Zengxiang Lei

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:16 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords GFlowNets · trajectory balance loss · total variation distance · stable training · generative sampling · loss bounds · distributional fidelity · probabilistic guarantees

The pith

Bounded trajectory balance loss in GFlowNets implies small total variation distance to the target reward distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

GFlowNets aim to generate samples in proportion to an unnormalized reward but often train unstably, with loss spikes and mode collapse. The paper first shows that a small total variation gap between the learned and target distributions does not rule out unbounded loss values. It then derives bounds in the converse direction: if the trajectory balance loss stays below a threshold, the total variation distance must also remain small. This link supplies probabilistic guarantees of global fidelity whenever the loss is controlled. The authors turn the bounds into Stable GFlowNets, an algorithm that monitors and caps the loss during training, producing more reliable sampling in the tested settings.

Core claim

The paper establishes converse guarantees by deriving loss-to-TV bounds that certify global fidelity from bounded trajectory balance losses. It first demonstrates the sensitivity of GFlowNet objectives: a small TV distance does not preclude a large loss. The derived inequalities then show that controlling the trajectory balance loss directly limits the total variation distance between the learned sampling distribution and the target. Stable GFlowNets uses these results to stabilize training and empirically achieves fewer loss spikes together with higher distributional fidelity.
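
To see the sensitivity direction concretely, consider a minimal construction (ours, for illustration; the paper's own counterexample, sketched in Figure 1, uses a tree with a near-zero-reward node). Writing the trajectory balance loss in its standard squared-log-ratio form:

```latex
% Trajectory balance loss for a complete trajectory tau = (s_0 -> ... -> x),
% in the standard form of Malkin et al.; the paper's notation may differ:
\mathcal{L}_{\mathrm{TB}}(\tau) =
  \left( \log \frac{Z_\theta \prod_t P_F(s_{t+1} \mid s_t)}
                   {R(x) \prod_t P_B(s_t \mid s_{t+1})} \right)^{2}

% Sensitivity sketch: if the sampler places small mass delta on a state x'
% with reward R(x') = epsilon, the target places only mass epsilon/Z_R
% there, so the TV distance moves by at most about delta; yet the loss on
% trajectories reaching x' scales like (log(delta / epsilon))^2, which
% grows without bound as epsilon -> 0 with delta held fixed.
```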

What carries the argument

loss-to-TV bounds: inequalities that convert an upper limit on trajectory balance loss into an upper limit on total variation distance between the GFlowNet sampling distribution and the target reward distribution.
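
Schematically, the converse bound has the following shape (the rate function f below is a placeholder of ours; the paper's §4 supplies the explicit constants and exponents):

```latex
% Loss-to-TV guarantee, schematic form: a uniform bound on the
% trajectory balance loss ...
\sup_{\tau} \mathcal{L}_{\mathrm{TB}}(\tau) \le \varepsilon
\quad \Longrightarrow \quad
D_{\mathrm{TV}}\!\left( P_\theta,\; \frac{R}{Z_R} \right) \le f(\varepsilon),
% ... where f is monotone with f(eps) -> 0 as eps -> 0. The uniformity
% of the premise over all trajectories is load-bearing; see the referee
% report below.
```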

If this is right

  • Keeping trajectory balance loss bounded during training guarantees that the generated distribution stays close to the target in total variation.
  • Training procedures can now monitor loss values to certify distributional fidelity without directly computing TV distance (see the sketch after this list).
  • Stable GFlowNets reduces loss spikes and mode collapse by enforcing the loss bounds in practice.
  • The same bounds apply to any GFlowNet variant that uses the trajectory balance loss in the stated form.
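
A sketch of how the loss-monitoring idea could look in practice. Everything here is hypothetical: `tb_loss`, `tv_bound_from_loss`, the model interface, and the cap policy are our placeholders rather than the paper's API, and the square-root rate stands in for the paper's actual bound.

```python
import torch

def tb_loss(log_Z, log_pf, log_pb, log_reward):
    """Squared log-ratio trajectory balance loss for a batch of trajectories.

    log_pf / log_pb are the summed forward / backward log-probabilities
    of each trajectory; all arguments are 1-D tensors (log_Z broadcasts).
    """
    return (log_Z + log_pf - log_reward - log_pb) ** 2

def tv_bound_from_loss(max_loss):
    # Placeholder rate: monotone, vanishing at zero. The paper derives the
    # actual loss-to-TV function and its constants; sqrt is illustrative only.
    return max_loss.sqrt()

def training_step(model, opt, batch, loss_cap=10.0):
    log_Z, log_pf, log_pb, log_reward = model(batch)  # hypothetical interface
    losses = tb_loss(log_Z, log_pf, log_pb, log_reward)
    # Cap per-trajectory losses so one outlier cannot dominate the update;
    # clamp zeroes the gradient above the cap, which is one simple way to
    # enforce a bounded-loss regime (the paper's procedure may differ).
    losses.clamp(max=loss_cap).mean().backward()
    opt.step()
    opt.zero_grad()
    # The TV certificate applies only while the *uncapped* loss stays bounded.
    return tv_bound_from_loss(losses.detach().max())
```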

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bounds might extend to other GFlowNet losses if similar sensitivity analyses are performed.
  • The stabilization technique could be tested on larger or more structured reward landscapes to check scalability.
  • If the bounds are tight, they could guide the design of new loss functions that are easier to keep small while still enforcing fidelity.

Load-bearing premise

The bounds hold only for the exact mathematical form of the trajectory balance objective and GFlowNet sampling process stated in the paper.

What would settle it

A concrete counter-example in which the trajectory balance loss remains below the derived threshold yet the measured total variation distance between learned and target distributions exceeds the bound predicted by the inequalities.
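
One way to run that test empirically, when the state space is small enough to enumerate, is a direct Monte Carlo comparison. This is a minimal sketch under that assumption; `empirical_tv` and its inputs are our placeholders, and the paper instead uses backward-sample estimators for large spaces (cf. Figure 3).

```python
import numpy as np

def empirical_tv(sample_idx, rewards):
    """Monte Carlo TV estimate between a sampler and the reward target.

    sample_idx: integer indices of sampled terminal states, assuming the
    state space is enumerable (indices 0..len(rewards)-1).
    rewards: unnormalized rewards R(x) for every terminal state.
    """
    p_hat = np.bincount(sample_idx, minlength=len(rewards)) / len(sample_idx)
    p_star = rewards / rewards.sum()
    return 0.5 * np.abs(p_hat - p_star).sum()

# A counter-example would show: max trajectory balance loss below the
# derived threshold, yet empirical_tv(...) above the predicted bound by
# more than the Monte Carlo error, consistently across repeated trials.
```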

Figures

Figures reproduced from arXiv: 2605.01729 by Alvaro A. Cardenas, Ananth Shreekumar, Daniel J. Fremont, Dongyan Xu, Jonathan Rosenthal, Ruoyu Song, Satish Ukkusuri, Z. Berkay Celik, Zengxiang Lei.

Figure 1. The "one more mode" learning setup on a g-ary tree of depth h. The underlying structure is a regular tree. We initialize the experiment with an already fitted model where this specific node leads to a negligible reward R(x4) = ε (≪ 1). To introduce the new mode, we update the reward function by promoting x4 to a high-reward state with R(x4) = 1. The objective is to learn this new mode while preserving the … view at source ↗

Figure 2. Loss Concentration and Training Stability. Solid lines (left axis) show the Max-to-Rest Loss Ratio, defined as $\max_i \mathcal{L}_{\mathrm{TB}}(\tau_i) \,/\, \sum_{j \neq i} \mathcal{L}_{\mathrm{TB}}(\tau_j)$, and dashed lines (right axis) show the smoothed training loss (time window = 1000). Spikes in the ratio indicate outlier-dominated updates and serve as a proxy for training instability. We note the largest spikes occur at Hypergrid H = 16; at H = 32, TB/FM rarely sam… view at source ↗

Figure 3. Derived TV bounds vs. empirical TV distances. We compare our theoretical bound with the measured distributional error. Shaded regions show the min-max range over 5 trials, each computed from a Monte Carlo estimate using 1,000 backward samples. Following (2025b), we use a reward exponent of 20 and define modes as the top 0.01% quantile of R(x). A diversity filter with a Levenshtein distance threshold of 1 is enforced… view at source ↗

Figure 4. Learned patterns of DB. Each subfigure uses 100,000 samples. (Panels: Hypergrid D = 4 at H = 16 and H = 32; axes plot dims 1&2 against dims 3&4 over training rounds.) view at source ↗

Figure 5. Learned patterns of FM. Each subfigure uses 100,000 samples. view at source ↗

Figure 6. Learned patterns of TB. Each subfigure uses 100,000 samples. view at source ↗

Figure 7. Extended experiment results for L14-RNA1. (Panels: estimated TV bound over training rounds for Hypergrid (D=4, H=8), (D=4, H=16), (D=4, H=32), and L14-RNA1; methods TB, DB, FM, SubTB, Teacher, WDB, Stable, StableTeacher; 10, 100, and 1000 samples.) view at source ↗

Figure 8. Evolution of TV Bounds estimated from Theorem 3.10. Here the L14-RNA1 evaluation uses the X_sub discovered by StableTeacher (which carries the largest total reward mass), whereas … view at source ↗
original abstract

Generative Flow Networks (GFlowNets) learn to sample states proportional to an unnormalized reward. Despite their theoretical promise, practical training is often unstable, exhibiting severe loss spikes and mode collapse. To tackle this, we first assess the sensitivity of GFlowNet objectives, demonstrating that a small Total Variation (TV) distance between the learned and target distributions does not preclude unbounded training loss. Motivated by this mismatch, we establish converse guarantees by deriving loss-to-TV bounds that certify global fidelity from bounded trajectory balance losses. Lastly, we propose Stable GFlowNets, an algorithm that leverages our theoretical results to stabilize training, and empirically demonstrate improved training behavior and superior distributional fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity audit, and an axiom & free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes the sensitivity of GFlowNet objectives to total variation (TV) distance, showing that small TV can coexist with unbounded losses. It derives converse loss-to-TV bounds from the trajectory balance objective to certify global distributional fidelity when losses are bounded. It then introduces the Stable GFlowNets algorithm that exploits these bounds for stabilization and reports empirical gains in training stability and sample fidelity over standard GFlowNet training.

Significance. If the loss-to-TV bounds are valid and the algorithm preserves the original GFlowNet sampling guarantees, the work supplies a principled route to reliable training of GFlowNets on complex reward landscapes. The combination of sensitivity analysis, explicit bounds, and a practical stabilization procedure addresses a documented practical weakness in the GFlowNet literature.

major comments (2)
  1. [§4] §4 (loss-to-TV bounds): the derivation of the converse guarantee appears to rely on a uniform bound on the trajectory balance loss across all trajectories; the manuscript should state whether this uniform bound is assumed or derived from the finite-state or finite-horizon setting, because violation of uniformity would invalidate the global TV certificate.
  2. [§5] §5 (Stable GFlowNets algorithm): the clipping or re-weighting step that enforces the bounded-loss regime must be shown not to alter the fixed point of the original GFlowNet objective; otherwise the fidelity guarantee no longer applies to the modified sampler.
minor comments (2)
  1. [§6] The experimental section should report the precise hyper-parameter ranges and random seeds used for the baseline comparisons; without them it is difficult to assess whether the reported stability gains are robust.
  2. [§2] Notation for the trajectory balance loss (Eq. (3) or equivalent) should be aligned with the standard GFlowNet literature to avoid confusion for readers familiar with the original TB objective.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and arguments.

point-by-point responses
  1. Referee: [§4] §4 (loss-to-TV bounds): the derivation of the converse guarantee appears to rely on a uniform bound on the trajectory balance loss across all trajectories; the manuscript should state whether this uniform bound is assumed or derived from the finite-state or finite-horizon setting, because violation of uniformity would invalidate the global TV certificate.

    Authors: We agree that the converse loss-to-TV bounds require a uniform bound on the trajectory balance loss to certify global fidelity. This uniform bound is assumed (rather than derived) as a prerequisite for the certificate; it holds automatically in the finite-state and finite-horizon regimes analyzed in the paper because the trajectory space is finite and each loss term is finite. We will revise §4 to state the assumption explicitly, note that it is satisfied under the finite settings of the theorems, and discuss that the global TV guarantee is conditional on the losses remaining uniformly bounded. revision: yes

  2. Referee: [§5] §5 (Stable GFlowNets algorithm): the clipping or re-weighting step that enforces the bounded-loss regime must be shown not to alter the fixed point of the original GFlowNet objective; otherwise the fidelity guarantee no longer applies to the modified sampler.

    Authors: We thank the referee for highlighting this requirement. The clipping operation in Stable GFlowNets is applied only during training to cap loss spikes and is inactive once losses fall below the threshold. Consequently, the fixed points of the original trajectory-balance objective are preserved: at any stationary point the loss is zero on all trajectories, rendering clipping irrelevant. We will add a short proposition in the revised §5 formally showing that the modified updates share the same fixed points as the unmodified objective, thereby ensuring the fidelity guarantees continue to apply. revision: yes
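
A minimal rendering of the promised fixed-point proposition (our paraphrase of the claim in this response, not the paper's statement):

```latex
% Fixed-point preservation under clipping (sketch). For clip level c > 0,
% define the clipped objective
\tilde{\mathcal{L}}(\theta) = \mathbb{E}_{\tau}\!\left[
   \min\big( \mathcal{L}_{\mathrm{TB}}(\tau; \theta),\, c \big) \right].
% At a global optimum theta* of the original objective,
% L_TB(tau; theta*) = 0 < c for every trajectory, so by continuity the
% min is inactive in a neighborhood of theta* and the gradients of the
% clipped and unclipped objectives coincide there. Hence theta* remains
% a stationary point of the clipped objective.
```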

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central contribution is a mathematical derivation of loss-to-TV bounds from the definitions of GFlowNet trajectory balance losses and total variation distance. This proceeds from the stated sensitivity analysis (small TV not implying bounded loss) to converse bounds that certify fidelity from bounded losses, without reducing to fitted parameters, self-referential normalizations, or load-bearing self-citations. The subsequent algorithm is motivated by but not equivalent to these bounds. No steps match the enumerated circularity patterns; the derivation is self-contained against the loss definitions and external TV metric.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; the work builds on standard GFlowNet trajectory balance objectives without introducing new ones visible here.

pith-pipeline@v0.9.0 · 5441 in / 1045 out tokens · 33431 ms · 2026-05-08T19:16:45.870403+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

76 extracted references · 8 canonical work pages · 2 internal anchors
