Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards

Fucheng Cai; Harold Soh; Meiyi Wang; Xiaopeng Fan; Xuehui Yu

arxiv: 2605.20758 · v1 · pith:KAL6ZZWRnew · submitted 2026-05-20 · 💻 cs.AI · cs.CV · cs.LG · cs.RO

Xuehui Yu, Fucheng Cai, Meiyi Wang, Xiaopeng Fan, Harold Soh

Pith reviewed 2026-05-21 05:05 UTC · model grok-4.3

classification 💻 cs.AI · cs.CV · cs.LG · cs.RO

keywords flow models · additive guidance · compositional rewards · off-manifold drift · gradient conflicts · inference-time guidance · generative models · controlled generation

The pith

Conflict-Aware Additive Guidance rectifies off-manifold drift in flow models by resolving gradient conflicts during compositional reward guidance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that composing multiple constraints during inference-time guided sampling of flow models often causes the trajectory to drift away from the true data manifold. The authors trace this to approximation errors that grow sharply when gradients from different rewards misalign. They introduce a lightweight learnable adjustment that detects these conflicts in real time and corrects the guidance direction accordingly. A sympathetic reader would care because reliable multi-constraint control without retraining is currently a practical bottleneck in image editing, planning, and other generative tasks.

Core claim

We identify root causes of this off-manifold drift and find that the approximation error scales severely with gradient misalignment. Building on these findings, we propose Conflict-Aware Additive Guidance (g^car), a lightweight and learnable method, which actively rectifies off-manifold drift by dynamically detecting and resolving gradient conflicts. We validate g^car across diverse domains, ranging from synthetic datasets and image editing to generative decision-making for planning and control.

What carries the argument

Conflict-Aware Additive Guidance (g^car), which dynamically detects and resolves gradient conflicts to prevent off-manifold drift while composing multiple external constraints or rewards.
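The blending behaviour can be illustrated with a minimal Python sketch; it is not the authors' implementation. It assumes the convex blend g^car = (1 − w_t) g^approx + w_t g_ψ quoted elsewhere on this page. The hard threshold gate on the largest pairwise 1 − cos ϕ, the function names, and the default `tau` are hypothetical choices made for illustration, and `g_psi` stands in for whatever the learned correction network would output.

```python
import numpy as np

def misalignment(g_a, g_b, eps=1e-12):
    """Return 1 - cos(phi) for two guidance vectors (0 means perfectly aligned)."""
    cos = float(np.dot(g_a, g_b) / (np.linalg.norm(g_a) * np.linalg.norm(g_b) + eps))
    return 1.0 - cos

def g_car(g_list, g_psi, tau=0.5):
    """Blend approximate guidance with a learned correction.

    g_list: per-reward guidance vectors; their sum plays the role of g^approx.
    g_psi:  placeholder for the learned correction (assumed to be given).
    tau:    hypothetical conflict threshold for the gate w.
    """
    g_approx = np.sum(g_list, axis=0)
    # Hypothetical conflict score: the worst pairwise misalignment.
    score = max(
        (misalignment(a, b) for i, a in enumerate(g_list) for b in g_list[i + 1:]),
        default=0.0,
    )
    w = 1.0 if score > tau else 0.0  # assumed hard gate, not the paper's w_t
    return (1.0 - w) * g_approx + w * g_psi, w
```

With aligned gradients the gate stays closed and the plain sum is returned; with strongly opposed gradients the sketch falls back entirely on `g_psi`. A smooth gate would be an equally plausible reading of the page's description.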

Load-bearing premise

The root cause of off-manifold drift is approximation error that scales severely with gradient misalignment, which the method assumes can be actively detected and resolved through a learnable adjustment.

What would settle it

Run the guidance procedure on a multi-reward task while disabling the conflict-detection component and measure whether manifold deviation increases in proportion to observed gradient misalignment.
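One way to summarize such an ablation is sketched below in Python. It assumes that each sample already comes with a misalignment value 1 − cos ϕ and a manifold-deviation value; both measurements, and the helper name `drift_vs_misalignment`, are hypothetical, and nothing here reproduces the paper's own metrics.

```python
import numpy as np

def drift_vs_misalignment(misalign, deviation):
    """Summarize how manifold deviation varies with gradient misalignment.

    misalign:  per-sample 1 - cos(phi) values (hypothetical measurement).
    deviation: per-sample distance from the data manifold (hypothetical metric).
    Returns (pearson_r, slope) for a least-squares line deviation ~ misalign.
    """
    m = np.asarray(misalign, dtype=float)
    d = np.asarray(deviation, dtype=float)
    r = float(np.corrcoef(m, d)[0, 1])
    slope, _intercept = np.polyfit(m, d, deg=1)
    return r, float(slope)
```

A strongly positive correlation and slope with the detector disabled, shrinking once it is enabled, would be consistent with the premise. A flat relationship would point to other drivers of drift.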

Figures

Figures reproduced from arXiv: 2605.20758 by Fucheng Cai, Harold Soh, Meiyi Wang, Xiaopeng Fan, Xuehui Yu.

**Figure 1.** At inference time, our goal is to sample from the tilted target p′_1(x_1) ∝ p^base_1(x_1) e^{r(x_1)}. (b-c) A single reward reweights the distribution towards specific attributes (“red” or “dog”). (d) Under compositional rewards (“red” and “dog”), the ideal samples lie at the intersection of high-reward regions (⋆); however, existing methods often suffer from off-manifold drift (i.e., the distorted image, •…

**Figure 2.** (a) The base sampling trajectories. (b) Guided sampling adds guidance to the inference trajectories while keeping a fixed source p^base(x_0); this forces the trajectories to curve significantly to satisfy the constraint, resulting in unnecessarily long and high-curvature paths. (c) Gradient misalignment (between → and →) aggravates local curvature, yielding incorrect and unstable guidance that pushes the trajectory …

**Figure 3.** (a) When gradients are misaligned (large ϕ), the approximate guidance g^approx (→), the vector sum of → and →, points off-manifold; the conflict-aware weight w_t ≈ 1, so g^car ≈ g_ψ(x_t, t) (→) corrects the trajectory toward the true target x_1 (⋆). (b) When gradients align (ϕ ≈ 0), g^approx is already accurate; w_t ≈ 0, so g^car ≈ g^approx (→). 1. The error scales with the number of reward functions G and gradie…

**Figure 4.** Visualization results on the synthetic dataset under [1, 0] constraints. (c–e) g^cov-G shows significant off-manifold drift due to an “energy trap” (highlighted by the red circle), where conflicting gradients lead to erratic sampling trajectories. (f–h) Our g^car restores the accurate reward landscape, thereby rectifying the off-manifold drift. An intuitive interpretation of the “energy trap” caused by gra…

**Figure 5.** Visualization on ManiSkill2 StackCube with τ = 0.20. OOD: the trajectory leaves the data manifold, producing physically incoherent motions (e.g., erratic spinning or tangled paths); Fail: the trajectory stays on the manifold but fails the task (e.g., does not reach the goal). rotation, and gripper state. We train a base CFM model solely on unconstrained demonstrations. Reward functions r(x) include: (i) sta…

**Figure 6.** Visualization of text-guided generated faces. 6.5. Text-guided Image Manipulation Tasks. To evaluate the scalability of g^car in high-dimensional pixel spaces, we conduct text-guided image manipulation on CelebA-HQ. Following Liu et al. (2023b), we use a pre-trained Rectified Flow model as the generative prior and steer it toward compositional text guidance {sad + angry, sad + happy, sad + curly hair}, wh…

**Figure 7.** Inference-time guidance methods arranged by computational cost. Approximate guidance methods (e.g., g^cov-G) are lightweight but accumulate local approximation error, leading to off-manifold drift. Exact guidance methods (e.g., Guidance Matching, sample-based guidance) are exact but require substantially more compute. This work (g^car), which sits in the middle, aims to improve the compute-light approximat…

**Figure 8.** Spurious local minimum from gradient misalignment. (a, b) Individual energy landscapes for two multi-modal reward functions, E_j = −r_j, each with two global minima (stars). One mode at (8, −8) is shared, i.e., x⋆_1. (c) The compositional energy landscape E = E_1 + E_2 = −(r_1 + r_2) has a spurious local minimum x† (top): x† ≠ x⋆_1, and x† maximizes neither any individual reward r_j nor their sum. (d) …

**Figure 9.** Energy dissipation under gradient misalignment. (a) When ϕ_jk = 0°, reward gradients are perfectly collinear, ∆E(x_t) = 0, and no correction is needed. (b) When 0° < ϕ_jk < 90°, gradients are misaligned but remain in the same half-space. PCGrad detects no conflict (cos ϕ_jk > 0) and takes no action, yet ∆E(x_t) > 0; our g^car identifies this misalignment and corrects it. (c) When ϕ_jk > 90°, gradients underg…
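The contrast with PCGrad in panel (b) can be reproduced numerically. The Python sketch below uses the commonly described PCGrad projection rule (project g_i onto the normal plane of g_j only when their dot product is negative) and compares it with the score 1 − cos ϕ. The function names and example vectors are illustrative assumptions, and the caption's ∆E is not computed because its definition is not given on this page.

```python
import numpy as np

def pcgrad_pair(g_i, g_j):
    """PCGrad-style projection: alter g_i only if it conflicts with g_j (dot < 0)."""
    dot = float(np.dot(g_i, g_j))
    if dot < 0.0:
        return g_i - dot / float(np.dot(g_j, g_j)) * g_j
    return g_i.copy()

def one_minus_cos(g_i, g_j):
    """Misalignment score 1 - cos(phi) between two gradients."""
    return 1.0 - float(np.dot(g_i, g_j)) / (np.linalg.norm(g_i) * np.linalg.norm(g_j))

def unit(deg):
    """Unit vector at the given angle in degrees (illustrative helper)."""
    rad = np.deg2rad(deg)
    return np.array([np.cos(rad), np.sin(rad)])

g1 = unit(0.0)
# Case (b): 60 degrees apart, same half-space, so PCGrad leaves g1 untouched.
acute_unchanged = np.allclose(pcgrad_pair(g1, unit(60.0)), g1)
acute_score = one_minus_cos(g1, unit(60.0))
# Case (c): 120 degrees apart, so PCGrad removes the conflicting component.
obtuse_projected = pcgrad_pair(g1, unit(120.0))
```

In the 60° case the projection is a no-op even though the misalignment score is nonzero, which is the gap the caption attributes to PCGrad.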

**Figure 10.** (c-e), simply parameterizing ∇V(x_t, t) fails to rectify the off-manifold drift. Panels: (a) Sampling dynamics of base FM at t = 1. (b) Ground-truth posterior p(x | c = [1, 0]). (c) Samples from g^cov-G guided sampling (t = 1, c = [1, 0]), parameterizing ∇V(x_t, t). (d) Energy landscape of g^cov-G (t = 1, c = [1, 0]), parameterizing ∇V(x_t, t). (e) Guided sampling dynamics of g^cov-G (t = 1, c = [1, 0]), parameterizing ∇V(x_t, t). (…

**Figure 11.** Ablation results on the conflict threshold τ. To understand the contribution of each component, we ablate g_ψ and w_t independently

**Figure 12.** Component ablation on the synthetic benchmark. (a) Mode Coverage (CS) and (b) Prior Preservation (PC). The baseline g^cov-G suffers from severe gradient conflicts. Applying the learned correction without the conflict gate (g^approx + g_ψ) improves PC but hurts CS due to spurious updates in low-conflict regions. Our full method g^car leverages the gate w_t to restrict corrections strictly to high-conflict st…

**Figure 13.** Visualization of synthetic experiments. (a) The sampling dynamics of the base Rectified Flow model at t = 1. (b) The base posterior distribution p^base(x_1) consisting of three Gaussian modes. (c)–(e) Ground-truth posteriors under different classifier constraints (c = [0, 0], [1, 0], and [1, 1]), estimated via rejection sampling with 10k samples. E.4.1. Inference-time constraints. To evaluate the system und…

**Figure 14.** Convergence of conflict scores of g^car. The figure tracks the fraction of online samples with a conflict score larger than the early-stopping threshold ϵ. This metric serves as an indicator for training stability. Results are shown for targets (a) c = [0, 0], (b) c = [1, 0], and (c) c = [1, 1] across three early-stopping thresholds (ϵ = 0.00, 0.05, and 0.10), with a conflict threshold τ = 0.50. The downwa…

**Figure 15.** Robustness to clustered environments on Maze2D. We evaluate the safety and success rates by varying the number of static obstacles from 2 to 6. Our g^car (red) exhibits robustness even with 6 obstacles, whereas g^cov-G suffers degradation.

**Figure 16.** Visualisation of guided trajectory generation under compositional constraints in Maze2D. (1) static obstacles, (2) goal reachability, (3) dynamic obstacles, and (4) hybrid composition. Observe that g^cov-G produces erratic, off-manifold trajectories, while g^car yields smooth, feasible trajectories.

**Figure 17.** Visual comparison of guided generation under increasing environmental complexity. We scale the number of static obstacles to evaluate the solver’s ability to handle dense constraints. As shown, traditional planning baselines like MPPI struggle with high-dimensional constraint landscapes, often failing to find feasible paths. In contrast, g^car effectively navigates through dense clutter, generating smooth,…

**Figure 18.** Architecture of the Base CFM Policy. The conditioning context includes the goal state (e.g., target placement coordinates), the point cloud observation, and the robot state. The observation (4096 colored points) is compressed via an encoder using a PointNet backbone trained from scratch. The model outputs action chunks of horizon T generated from noise. E.6.1. Base CFM policy. We implement a base Condition…

**Figure 19.** Visualization on ManiSkill2 PickCube task with conflict threshold τ = 0.20. OOD: the trajectory leaves the data manifold, producing physically incoherent motions (e.g., erratic spinning or tangled paths); Fail: the trajectory stays on the manifold but fails the task (e.g., does not reach the goal).

**Figure 20.** Additional visualization of text-guided image manipulation. This figure complements …

Original abstract

Inference-time guided sampling steers state-of-the-art diffusion and flow models without fine-tuning by interpreting the generation process as a controllable trajectory. This provides a simple and flexible way to inject external constraints (e.g., cost functions or pre-trained verifiers) for controlled generation. However, existing methods often fail when composing multiple constraints simultaneously, which leads to deviations from the true data manifold. In this work, we identify root causes of this off-manifold drift and find that the approximation error scales severely with gradient misalignment. Building on these findings, we propose Conflict-Aware Additive Guidance ($g^\text{car}$), a lightweight and learnable method, which actively rectifies off-manifold drift by dynamically detecting and resolving gradient conflicts. We validate $g^\text{car}$ across diverse domains, ranging from synthetic datasets and image editing to generative decision-making for planning and control. Our results demonstrate that $g^\text{car}$ effectively rectifies off-manifold drift, surpassing baselines in generation fidelity while using light compute. Code is available at https://github.com/yuxuehui/CAR-guidance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds a learnable conflict detector to additive guidance for flow models, but the load-bearing claim about severe error scaling with misalignment still lacks a derivation or bound.

The main takeaway is that this work targets off-manifold drift in compositional guidance by adding a lightweight, learnable rectification step that watches for misaligned gradients from different rewards. They call it g^car and position it as a fix that activates only when conflicts appear, rather than always altering the trajectory the same way static additive methods do. That feels like a reasonable engineering response to a practical failure mode people run into when stacking constraints on diffusion or flow models.

They also ship code and test across synthetic data, image editing, and planning/control tasks, which at least shows the method is not tied to one narrow benchmark. Those are the concrete positives: a targeted adjustment plus some breadth in the experiments.

The soft spot is exactly the one the stress-test flagged. The abstract and the motivation treat the severe scaling of approximation error with gradient misalignment as an identified root cause, yet nothing in the provided material supplies a bound, a Lipschitz argument, or even a simple scaling plot that isolates the angle between gradients from other factors like reward magnitude or integrator step size. Without that, it is hard to know whether detecting pairwise conflicts is the right lever or whether something simpler like per-reward normalization would do as well. The experiments would need to include ablations that turn the conflict detector on and off while holding everything else fixed; if those are missing or weak, the central claim rests more on intuition than on evidence.

This is the kind of paper that would interest people who already run guided sampling for vision or decision-making tasks and have hit the multi-constraint drift problem. A reader who wants a drop-in practical tweak rather than a new theoretical framework could get immediate use from the method and the released code. It is coherent enough on its own terms to deserve referee time, even if the theoretical motivation needs tightening. I would send it to peer review.

Referee Report

1 major / 0 minor

Summary. The paper claims that off-manifold drift in compositional guided sampling for flow models arises because approximation error scales severely with gradient misalignment between constraints. It introduces Conflict-Aware Additive Guidance (g^car), a lightweight learnable rectification that dynamically detects and resolves these conflicts, and reports improved generation fidelity over baselines on synthetic data, image editing, and planning/control tasks while using light compute.

Significance. If the scaling relationship is confirmed and g^car rectifies drift without new instabilities, the approach could offer a practical, training-free way to compose constraints in diffusion and flow models. The cross-domain validation and public code release strengthen reproducibility and potential impact for controlled generation.

major comments (1)

[Abstract] The central claim that 'the approximation error scales severely with gradient misalignment' is presented as the identified root cause and the direct motivation for g^car, yet no derivation, error bound (e.g., in terms of angle between guidance vectors or Lipschitz constants of the rewards), quantitative analysis, or experimental controls are supplied to support the scaling. This leaves open whether pairwise conflict resolution is the appropriate fix or whether other factors dominate drift.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. We appreciate the acknowledgment of the potential practical impact of our method for compositional guidance in flow models, as well as the positive comments on cross-domain validation and code release. We address the major comment below and outline the revisions we will make to strengthen the manuscript.

Referee: [Abstract] The central claim that 'the approximation error scales severely with gradient misalignment' is presented as the identified root cause and the direct motivation for g^car, yet no derivation, error bound (e.g., in terms of angle between guidance vectors or Lipschitz constants of the rewards), quantitative analysis, or experimental controls are supplied to support the scaling. This leaves open whether pairwise conflict resolution is the appropriate fix or whether other factors dominate drift.

Authors: We agree with the referee that the abstract states this scaling relationship as a central finding without accompanying formal support in the current version of the manuscript. Our identification of gradient misalignment as a root cause of off-manifold drift is primarily grounded in the empirical results on synthetic data, where we observed a clear correlation between increasing misalignment (measured via cosine similarity of guidance vectors) and larger deviations from the data manifold. However, we did not include an explicit derivation, error bound, or dedicated experimental controls isolating this factor while holding others fixed. We will revise the manuscript to add a short theoretical subsection deriving a first-order error bound under the assumption of Lipschitz-continuous rewards, showing that the accumulated approximation error grows with (1 - cos θ), where θ is the angle between guidance gradients. We will also insert quantitative plots and ablation controls that vary only the misalignment angle. These changes will better justify the focus on pairwise conflict resolution while noting that higher-order interactions among more than two constraints remain an open direction for future work.

Revision: yes
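As a purely illustrative aside, an elementary identity shows how a factor of the form (1 − cos) multiplied by G(G − 1) can arise when G guidance vectors are summed. This is not the bound promised in the rebuttal. The symbols g_j for per-reward guidance vectors, ϕ_jk for pairwise angles (the rebuttal's θ), and μ for an assumed uniform bound ‖g_j‖ ≤ μ are notation introduced here for the sketch.

```latex
\[
\Big\| \sum_{j=1}^{G} g_j \Big\|^2
  = \sum_{j=1}^{G} \|g_j\|^2
  + 2 \sum_{j<k} \|g_j\|\,\|g_k\| \cos\phi_{jk},
\]
\[
\Big( \sum_{j=1}^{G} \|g_j\| \Big)^2 - \Big\| \sum_{j=1}^{G} g_j \Big\|^2
  = 2 \sum_{j<k} \|g_j\|\,\|g_k\| \,\bigl(1 - \cos\phi_{jk}\bigr)
  \le G(G-1)\,\mu^2 \max_{j<k} \bigl(1 - \cos\phi_{jk}\bigr).
\]
```

The left-hand side of the second line is the squared driving force lost relative to perfectly aligned gradients; it vanishes exactly when every pair is collinear and grows with both the number of rewards and their pairwise misalignment.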

Circularity Check

0 steps flagged

No significant circularity; central proposal is an independent learnable method.

The paper identifies off-manifold drift causes via analysis of approximation error scaling with gradient misalignment, then introduces g^car as a new lightweight learnable rectification step that dynamically detects and resolves conflicts. This chain does not reduce by construction to self-defined quantities, fitted inputs renamed as predictions, or load-bearing self-citations. The method is validated empirically across domains rather than derived tautologically from prior results or ansatzes, keeping the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that gradient misalignment is the dominant driver of drift and that a learnable detector can correct it without introducing new fitting artifacts.

free parameters (1)

learnable parameters of g^car
The method is explicitly described as learnable, implying parameters that are trained or optimized for conflict detection and resolution.

axioms (1)

domain assumption: Approximation error in existing additive guidance scales severely with gradient misalignment.
Identified as the root cause of off-manifold drift in the abstract.

pith-pipeline@v0.9.0 · 5739 in / 1195 out tokens · 45097 ms · 2026-05-21T05:05:38.674294+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: approximation error scales severely with gradient misalignment (1−cos ϕ) ... g^car(x_t, t) = (1−w_t) g^approx + w_t g_ψ

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: Theorem 4.2 (Upper Bound of Approximation Error) ... G(G−1)μ² ∫ E[1−cos ϕ_t(x_t)] dt

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1]

URL https://proceedings.mlr.press/v270/chisari25a.html. Chung, H., Kim, J., McCann, M. T., Klasky, M. L., and Ye, J. C. Diffusion posterior sampling for general noisy inverse problems. In International Conference on Learning Representations, 2023. Domingo-Enrich, C., Drozdzal, M., Karrer, B., and Chen, R. T. Q. Adjoint matching: Fine-tuning flow and dif...

[2]

URL https://openreview.net/forum?id=PLIt3a4yTm. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021. Römer, R., von Rohr, A., and S...

[3]

energy trap

Springer, 2009. Wallace, B., Dang, M., Rafailov, R., Zhou, L., Lou, A., Purushwalkam, S., Ermon, S., Xiong, C., Joty, S., and Naik, N. Diffusion model alignment using direct preference optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8228–8238, 2024. Wang, J., Chan, K. C., and Loy, C. C. Exploring CL...

[4]

rather than through reward maximization.² As the trajectory approaches the basin of any $x^\dagger$, it drifts off-manifold. Energy dissipation under gradient misalignment. To characterize the mechanism that drives trajectories off-manifold, we quantify the effective driving force of the compositional guidance via its squared norm. Expanding $\|g_t(x_t)\|^2$ at any ²From ...

[5]

$$= \inf_{(p_t, v_t):\; p_0 \xrightarrow{v_t} p^\star_1} \int_0^1 \mathbb{E}_{x_t \sim p_t} \|v_t(x_t)\|^2 \, dt = \int_0^1 \mathbb{E}\, \|v^{\text{base}}_t + g^\star_t\|^2 \, dt, \quad (29)$$

where $g^\star_t$ is the optimal guidance field that steers mass toward the reward-tilted target $p^\star$

[6]

Under the two-stage approximation (CIA + Localized Approximation) for compositional rewards $R = \sum_j r_j$, $g^\star_t$ is replaced by the realized guidance $g^{\text{approx}}_t = \sum_j g^{\text{approx}}_j$, where each $g^{\text{approx}}_j = \nabla_{x_t} r_j(\hat{x}_1)$ with $\hat{x}_1 = \mathbb{E}[x_1 \mid x_t]$. The realized field $\hat{v}_t = v^{\text{base}}_t + g^{\text{approx}}_t$ therefore only transports $p_0$ to $\hat{p}_1 \neq p^\star_1$. We quantify the resulting approximatio...

[7]

using the stability of the continuity equation (Villani et al., 2009), which bounds the terminal distributional discrepancy by the time-integrated squared velocity field difference along the optimal path $p^\star_t$:

$$\mathcal{E} \triangleq W_2^2(\hat{p}_1, p^\star_1) \le \int_0^1 \mathbb{E}_{x_t \sim p^\star_t} \|v^\star_t(x_t) - \hat{v}_t(x_t)\|^2 \, dt = \int_0^1 \mathbb{E}_{x_t \sim p^\star_t} \|g^\star_t - g^{\text{approx}}_t\|^2 \, dt. \quad (30)$$

Since compositional guided sampling sums ...

[8]

(B) Gradient misalignment error. We analyze Term (B) in two cases: (B1) When $G = 1$ or $\cos\phi_{jk} = 1$ for all pairs, Term (B) $= 0$

By Cauchy–Schwarz:

$$\|g^\star_t(x_t) - g^{\text{CI}}_t(x_t)\|_2^2 = \big\| \mathbb{E}_{z \sim \pi^{\text{CI}}(\cdot \mid x_t)} \big[ (P(z) - 1)\, v_{t|z}(x_t \mid z) \big] \big\|_2^2 \le \mathbb{E}_{z \sim \pi^{\text{CI}}(\cdot \mid x_t)} \big[ (P(z) - 1)^2 \big] \cdot \mathbb{E}_{z \sim \pi^{\text{CI}}(\cdot \mid x_t)} \big[ \|v_{t|z}(x_t \mid z)\|_2^2 \big]. \quad (32)$$

Term (A) is small when the coupling shift is negligible ($P(z) \approx 1$), which holds for flow matching methods with dependent couplings such as mini-batch OT-FM (Tong et al., 2024a), but not for vanilla OT-FM (On...

[9]

The error is small when the reward landscape is smooth, i.e., small $\lambda_h = \|\nabla^2 e^{r}\|_2$. A flat reward landscape without sharp peaks or valleys implies less aggressive curvature, thereby minimizing the linearization error in Term (C)

[10]

The error is small when $\sigma_1$ is small, i.e., the conditional covariance $\Sigma_{1|t}$ has small spectral norm, meaning that at the current state $x_t$ the uncertainty about the terminal point $x_1$ is low. This is the case when the flow time $t \to 1$ (and $\sigma_t \to 0$), where $x_t$ reliably predicts $x_1$

[11]

The magnitude of $e^{r(\hat{x}_1)}$ reflects how well the predicted endpoint $\hat{x}_1 = \mathbb{E}[x_1 \mid x_t]$ matches the reward objective. If $\hat{x}_1$ lies inside the region where $r$ is large, the approximate guidance is more accurate, as the optimization is conducted locally and the gradient reflects the landscape well. If $e^{r(\hat{x}_1)}$ is small, the gradient explores the sample space almost ra...

[12]

The error scales with the number of reward functions $G$ and gradient misalignment $(1 - \cos\phi)$. D. Guided sampling through the lens of fitted value evaluation. Unlike diffusion models, flow models are governed by deterministic ODE processes. By leveraging this deterministic coupling and ...

[13]

The dataset is infinite, i.e., $|D| = \infty$

[14]

The Bellman residual minimization is solved exactly at each iteration

[15]

The function class $F$ is simple enough to be estimated, e.g., a one-dimensional linear function class $f_\theta(x) = \theta^\top \phi(x)$

[16]

energy trap

The realizability assumption holds, i.e., the true value function satisfies $V \in F$. This phenomenon is commonly referred to as the deadly triad in empirical deep reinforcement learning, which arises from the interaction of function approximation, off-policy data, and bootstrapping. In the flow matching setting, however, the dynamics are deterministic and rewar...

[17]

Inference-time guidance applied to pre-trained generative policy models is prone to off-manifold drift, leading to poor prior preservation (e.g., failing to reach the end point) and constraint violations (e.g., colliding with obstacles or maze walls), as shown for g^cov-G in Figure 16

[19]

GLASS-FKS performs well on robot planning tasks. 4. g^car consistently corrects off-manifold drift across all settings, improving success rate and reducing constraint violations

[20]

MPPI is a strong planning baseline that refines generated paths from the base CFM model to satisfy runtime constraints. Sometimes, it still suffers from prior preservation issues under compositional constraints. When g^car is applied on top of MPPI, MPPI + g^car achieves the best overall performance, correcting off-manifold drift while satisfying constraint...

[21]

for goal-conditioned manipulation. Specifically, we incorporate explicit goal conditioning, and improve success rates. The detailed architecture is illustrated in Figure 18. Conditioning. The policy is conditioned on a multimodal context vector c, constructed as follows:

[22]

Observation: Raw 3D point clouds (N = 4096) with RGB features are fused from multi-view cameras (left, right, and gripper). These are processed by a PointNet backbone to extract a dense feature vector

[23]

Proprio State: A vector containing the robot’s joint angles and gripper status

[24]

Goal: The 3D coordinates representing the target placement location (e.g., the stacking position). These components are concatenated to form the conditioning c. Output. The model predicts action chunks of horizon T. The generative component is a Conditional 1D U-Net that predicts the time-dependent velocity field v_θ(x_t, t | c). Here, the flow state x_t ∈ R^(T×...

[25]

Adding inference-time guidance to pre-trained generative policy models is prone to OOD, and often fails to finish tasks, e.g., the failure of g^cov-G shown in Figure 19

[26]

PCGrad cannot recover from off-manifold drift

[27]

GLASS-FKS generally performs well, but struggles in high-precision tasks such as StackCube (i.e., stably and precisely placing one cube onto another), due to its high transition variance. On tasks such as conditional generation (e.g., decision-making tasks), as long as the condition often appears in the dataset, GLASS-FKS performs well because it is easie...

[1] [1]

Chung, H., Kim, J., McCann, M

URL https://proceedings.mlr.press/ v270/chisari25a.html. Chung, H., Kim, J., McCann, M. T., Klasky, M. L., and Ye, J. C. Diffusion posterior sampling for general noisy in- verse problems. InInternational Conference on Learning Representations, 2023. Domingo-Enrich, C., Drozdzal, M., Karrer, B., and Chen, R. T. Q. Adjoint matching: Fine-tuning flow and dif...

work page doi:10.48550/arxiv.2506.13922 2023

[2] [2]

Radford, A., Kim, J

URL https://openreview.net/forum? id=PLIt3a4yTm. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021. R¨omer, R., von Rohr, A., and S...

work page doi:10.48550/arxiv.2412.09342 2021

[3] [3]

energy trap

Springer, 2009. Wallace, B., Dang, M., Rafailov, R., Zhou, L., Lou, A., Purushwalkam, S., Ermon, S., Xiong, C., Joty, S., and Naik, N. Diffusion model alignment using direct pref- erence optimization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8228–8238, 2024. Wang, J., Chan, K. C., and Loy, C. C. Exploring CL...

work page arXiv 2009

[4] [4]

rather than through reward maximization.2 As the trajectory approaches the basin of anyx †, it drifts off-manifold. Energy dissipation under gradient misalignment.To characterize the mechanism that drives trajectories off-manifold, we quantify the effective driving force of the compositional guidance via its squared norm. Expanding ∥gt(xt)∥2 at any 2From ...

work page 2020

[5] [5]

= inf (pt, vt):p 0 vt − →p⋆ 1 Z 1 0 Ext∼pt ∥vt(xt)∥2 dt= Z 1 0 E ∥vbase t +g ⋆ t ∥2 dt,(29) where g⋆ t is the optimal guidance field that steers mass toward the reward-tilted targetp⋆

work page

[6] [6]

The realized field ˆvt =v base t +g approx t therefore only transportsp 0 toˆp1 ̸=p ⋆ 1

Under the two-stage approximation (CIA + Localized Approximation) for compositional rewards $R = \sum_j r_j$, $g_t^\star$ is replaced by the realized guidance $g_t^{\text{approx}} = \sum_j g_j^{\text{approx}}$, where each $g_j^{\text{approx}} = \nabla_{x_t} r_j(\hat{x}_1)$ with $\hat{x}_1 = \mathbb{E}[x_1 \mid x_t]$. The realized field $\hat{v}_t = v_t^{\text{base}} + g_t^{\text{approx}}$ therefore only transports $p_0$ to $\hat{p}_1 \neq p_1^\star$. We quantify the resulting approximatio...
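As a rough illustration of the realized guidance defined above, the sketch below sums per-reward gradients of $r_j(\hat{x}_1(x_t))$ with respect to $x_t$. It is not the authors' implementation: `predict_x1` is a hypothetical stand-in for the posterior mean $\mathbb{E}[x_1 \mid x_t]$, the two quadratic rewards are toy assumptions, and central finite differences replace automatic differentiation to keep the example dependency-free.

```python
def finite_diff_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

def approx_guidance(rewards, predict_x1, x_t):
    """Return per-reward gradients of r_j(x1_hat(x_t)) and their sum."""
    per_reward = [
        finite_diff_grad(lambda x, r=r: r(predict_x1(x)), x_t)
        for r in rewards
    ]
    total = [sum(components) for components in zip(*per_reward)]
    return per_reward, total

# Hypothetical endpoint predictor: a fixed linear shrinkage of x_t.
def predict_x1(x_t):
    return [0.5 * v for v in x_t]

# Two toy rewards that pull toward opposite points on the x-axis.
def r_a(x1):
    return -((x1[0] - 1.0) ** 2 + x1[1] ** 2)

def r_b(x1):
    return -((x1[0] + 1.0) ** 2 + x1[1] ** 2)

per_reward, g_approx = approx_guidance([r_a, r_b], predict_x1, [0.0, 2.0])
print(per_reward, g_approx)
```

In this toy setup the two per-reward gradients cancel along the first coordinate, so the summed guidance only moves along the second one; this is a simple instance of the conflict the surrounding analysis is concerned with.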


[7] [7]

using the stability of the continuity equation (Villani et al., 2009), which bounds the terminal distributional discrepancy by the time-integrated squared velocity field difference along the optimal path $p_t^\star$:
$$\mathcal{E} \triangleq W_2^2(\hat{p}_1, p_1^\star) \le \int_0^1 \mathbb{E}_{x_t \sim p_t^\star} \|v_t^\star(x_t) - \hat{v}_t(x_t)\|^2 \, dt = \int_0^1 \mathbb{E}_{x_t \sim p_t^\star} \|g_t^\star - g_t^{\text{approx}}\|^2 \, dt. \tag{30}$$
Since compositional guided sampling sums ...


[8] [8]

(B) Gradient misalignment error. We analyze Term (B) in two cases: (B1) When $G = 1$ or $\cos\phi_{jk} = 1$ for all pairs, Term (B) $= 0$

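By Cauchy–Schwarz:
$$\|g_t^\star(x_t) - g_t^{\text{CI}}(x_t)\|_2^2 = \left\| \mathbb{E}_{z \sim \pi^{\text{CI}}(\cdot \mid x_t)} \big[ (P(z) - 1)\, v_{t|z}(x_t \mid z) \big] \right\|_2^2 \le \mathbb{E}_{z \sim \pi^{\text{CI}}(\cdot \mid x_t)} \big[ (P(z) - 1)^2 \big] \cdot \mathbb{E}_{z \sim \pi^{\text{CI}}(\cdot \mid x_t)} \big[ \|v_{t|z}(x_t \mid z)\|_2^2 \big]. \tag{32}$$
Term (A) is small when the coupling shift is negligible ($P(z) \approx 1$), which holds for flow matching methods with dependent couplings such as mini-batch OT-FM (Tong et al., 2024a), but not for vanilla OT-FM (On...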
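The inequality in (32) can be checked numerically on samples. The sketch below is only an illustration under assumed inputs: the weights standing in for $P(z)$ and the random vectors standing in for $v_{t|z}(x_t \mid z)$ are synthetic, and expectations are replaced by empirical means over the sample.

```python
import random

def cauchy_schwarz_sides(weights, vectors):
    """Return (lhs, rhs) of ||E[(P-1) v]||^2 <= E[(P-1)^2] * E[||v||^2]."""
    n = len(weights)
    dim = len(vectors[0])
    mean_vec = [
        sum((w - 1.0) * v[i] for w, v in zip(weights, vectors)) / n
        for i in range(dim)
    ]
    lhs = sum(c * c for c in mean_vec)
    mean_sq_shift = sum((w - 1.0) ** 2 for w in weights) / n
    mean_sq_norm = sum(sum(c * c for c in v) for v in vectors) / n
    return lhs, mean_sq_shift * mean_sq_norm

rng = random.Random(0)
# Synthetic stand-ins: coupling-shift ratios near 1 and 2D velocities.
weights = [rng.uniform(0.5, 1.5) for _ in range(1000)]
vectors = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(1000)]

lhs, rhs = cauchy_schwarz_sides(weights, vectors)
no_shift = cauchy_schwarz_sides([1.0, 1.0, 1.0], [[1, 2], [3, 4], [5, 6]])
print(lhs <= rhs, no_shift)
```

When every weight equals 1 (no coupling shift), both sides vanish, matching the statement that Term (A) is small when $P(z) \approx 1$.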


[9] [9]

A flat reward landscape without sharp peaks or valleys implies less aggressive curvature, thereby minimizing the linearization error in Term (C)

The error is small when the reward landscape is smooth, i.e., small $\lambda_h = \|\nabla^2 \tilde{r}\|_2$. A flat reward landscape without sharp peaks or valleys implies less aggressive curvature, thereby minimizing the linearization error in Term (C).


[10] [10]

This is the case when the flow time $t \to 1$ (and $\sigma_t \to 0$), where $x_t$ reliably predicts $x_1$

The error is small when $\sigma_1$ is small, i.e., the conditional covariance $\Sigma_{1|t}$ has small spectral norm, meaning that at the current state $x_t$ the uncertainty about the terminal point $x_1$ is low. This is the case when the flow time $t \to 1$ (and $\sigma_t \to 0$), where $x_t$ reliably predicts $x_1$.


[11] [11]

If $\hat{x}_1$ lies inside the region where $r$ is large, the approximate guidance is more accurate, as the optimization is conducted locally and the gradient reflects the landscape well

The magnitude of $\tilde{r}(\hat{x}_1)$ reflects how well the predicted endpoint $\hat{x}_1 = \mathbb{E}[x_1 \mid x_t]$ matches the reward objective. If $\hat{x}_1$ lies inside the region where $r$ is large, the approximate guidance is more accurate, as the optimization is conducted locally and the gradient reflects the landscape well. If $\tilde{r}(\hat{x}_1)$ is small, the gradient explores the sample space almost ra...


[12] [12]


The error scales with the number of reward functions $G$ and gradient misalignment $(1 - \cos\phi)$.

D. Guided sampling through the lens of fitted value evaluation

Unlike diffusion models, flow models are governed by deterministic ODE processes. By leveraging this deterministic coupling and ...


[13] [13]

The dataset is infinite, i.e., $|\mathcal{D}| = \infty$


[14] [14]

The Bellman residual minimization is solved exactly at each iteration


[15] [15]

The function class $\mathcal{F}$ is simple enough to be estimated, e.g., a one-dimensional linear function class $f_\theta(x) = \theta^\top \phi(x)$


[16] [16]

energy trap

The realizability assumption holds, i.e., the true value function satisfies $V \in \mathcal{F}$. This phenomenon is commonly referred to as the deadly triad in empirical deep reinforcement learning, which arises from the interaction of function approximation, off-policy data, and bootstrapping. In the flow matching setting, however, the dynamics are deterministic and rewar...


[17] [17]

Inference-time guidance applied to pre-trained generative policy models is prone to off-manifold drift, leading to poor prior preservation (e.g., failing to reach the end point) and constraint violations (e.g., colliding with obstacles or maze walls), as shown for $g^{\text{cov-G}}$ in Figure 16


[18] [19]

4. $g^{\text{car}}$ consistently corrects off-manifold drift across all settings, improving success rate and reducing constraint violations

GLASS-FKS performs well on robot planning tasks. 4. $g^{\text{car}}$ consistently corrects off-manifold drift across all settings, improving success rate and reducing constraint violations.


[19] [20]

Sometimes, it still suffers from prior preservation issues under compositional constraints

MPPI is a strong planning baseline that refines generated paths from the base CFM model to satisfy runtime constraints. It still sometimes suffers from prior preservation issues under compositional constraints. When $g^{\text{car}}$ is applied on top of MPPI, MPPI + $g^{\text{car}}$ achieves the best overall performance, correcting off-manifold drift while satisfying constraint...


[20] [21]

Specifically, we incorporate explicit goal conditioning, and improve success rates

for goal-conditioned manipulation. Specifically, we incorporate explicit goal conditioning, and improve success rates. The detailed architecture is illustrated in Figure 18. Conditioning. The policy is conditioned on a multimodal context vector $c$, constructed as follows:


[21] [22]

These are processed by a PointNet backbone to extract a dense feature vector

Observation: Raw 3D point clouds ($N = 4096$) with RGB features are fused from multi-view cameras (left, right, and gripper). These are processed by a PointNet backbone to extract a dense feature vector


[22] [23]

Proprio State: A vector containing the robot’s joint angles and gripper status


[23] [24]

These components are concatenated to form the conditioning $c$

Goal: The 3D coordinates representing the target placement location (e.g., the stacking position). These components are concatenated to form the conditioning $c$.

Output. The model predicts action chunks of horizon $T$. The generative component is a Conditional 1D U-Net that predicts the time-dependent velocity field $v_\theta(x_t, t \mid c)$. Here, the flow state $x_t \in \mathbb{R}^{T \times \ldots}$
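The construction of the conditioning vector from the three components listed above can be sketched as a plain concatenation. This is a minimal illustration only: the feature sizes (64 for the PointNet output, 8 for the proprioceptive state) and the concatenation order are assumptions, since the source does not state them, and `build_context` is a hypothetical helper name.

```python
def build_context(point_feature, proprio_state, goal_xyz):
    """Concatenate observation feature, proprio state, and 3D goal into c."""
    if len(goal_xyz) != 3:
        raise ValueError("goal must be a 3D coordinate")
    return list(point_feature) + list(proprio_state) + list(goal_xyz)

# Hypothetical sizes, chosen only for illustration.
point_feature = [0.0] * 64        # assumed PointNet feature dimension
proprio_state = [0.0] * 8         # assumed: joint angles plus gripper status
goal_xyz = [0.1, -0.2, 0.05]      # target placement location

c = build_context(point_feature, proprio_state, goal_xyz)
print(len(c))
```

The resulting vector would then be passed as the conditioning input of the velocity network at every flow time step.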


[24] [25]

Adding inference-time guidance to pre-trained generative policy models is prone to out-of-distribution (OOD) drift and often fails to finish tasks, e.g., the failure of $g^{\text{cov-G}}$ shown in Figure 19


[25] [26]

PCGrad cannot recover from off-manifold drift


[26] [27]

GLASS-FKS generally performs well, but struggles in high-precision tasks such as StackCube (i.e., stably and precisely placing one cube onto another), due to its high transition variance. On tasks such as conditional generation (e.g., decision-making tasks), as long as the condition often appears in the dataset, GLASS-FKS performs well because it is easie...
