Pith · machine review for the scientific record

arxiv: 2605.10302 · v2 · submitted 2026-05-11 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

Follow the Mean: Reference-Guided Flow Matching

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 06:10 UTC · model grok-4.3

classification 💻 cs.LG
keywords mean · flow matching · reference control · controllable generation · guidance

The pith

Flow matching admits controllable generation by shifting the conditional endpoint mean computed from a reference set, enabling training-free guidance on frozen pretrained models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generative models turn noise into images by following a smooth path. In flow matching, that path is defined by a velocity field. The paper observes that when the path is deterministic, the velocity at any point depends only on the average of the possible ending points. By swapping which set of reference images you average, you change where the flow heads without touching the model weights. The authors demonstrate this with a training-free correction applied to a frozen 4-billion-parameter FLUX.2-klein model, controlling color, identity, and structure while the prompt stays fixed. A second variant adds a small learned refiner around an explicit mean anchor, so the reference set can be swapped at test time while still matching the quality of standard unconditional models on animal-face datasets.
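The endpoint-mean mechanism described above can be sketched in a few lines. The blend-toward-the-reference-mean form and every name below are illustrative assumptions for a toy 2-D flow, not the paper's closed-form correction:

```python
import numpy as np

def guided_velocity(v_model, x, t, ref_mean, blend):
    """Shift the flow toward a reference-bank endpoint mean.

    For a linear interpolant, v = (E[x1 | x] - x) / (1 - t), so the
    model's implied endpoint mean is x + (1 - t) * v. Blending that
    mean toward the reference mean (weight `blend`, an illustrative
    choice) and converting back gives a corrected velocity.
    """
    implied_mean = x + (1.0 - t) * v_model            # E[x1 | x] under the model
    shifted_mean = (1.0 - blend) * implied_mean + blend * ref_mean
    return (shifted_mean - x) / (1.0 - t)

# Euler integration of the guided ODE from noise toward data (toy 2-D flow).
rng = np.random.default_rng(0)
x = rng.standard_normal(2)                # x0 ~ N(0, I)
ref_mean = np.array([3.0, -1.0])          # mean of a hypothetical reference bank
n_steps = 100
for i in range(n_steps):
    t = i / n_steps
    v_model = -x / (1.0 - t)              # stand-in model whose endpoint mean is 0
    x = x + guided_velocity(v_model, x, t, ref_mean, blend=0.5) / n_steps
# x now sits at the 50/50 blend of the model's endpoint mean (0) and ref_mean.
```

Because the velocity is (mean − x)/(1 − t), replacing the implied endpoint mean is the only change; the integration loop is untouched, which is why such a correction can run on a frozen model.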

Core claim

For deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean; shifting this mean shifts the flow itself.

Load-bearing premise

That the interpolants remain deterministic and that the endpoint mean fully determines the velocity field without additional dependencies on the reference distribution or noise schedule.
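For the standard linear interpolant this premise can be checked directly; a sketch of the usual flow-matching algebra (not reproduced from the paper):

```latex
% Linear deterministic interpolant with x_0 \sim p_0, x_1 \sim p_1:
x_t = (1-t)\,x_0 + t\,x_1
\quad\Longrightarrow\quad
\dot{x}_t = x_1 - x_0 = \frac{x_1 - x_t}{1-t},
% using x_0 = (x_t - t\,x_1)/(1-t). Taking the conditional expectation:
v_t(x_t) = \mathbb{E}\!\left[\dot{x}_t \mid x_t\right]
         = \frac{\mathbb{E}[x_1 \mid x_t] - x_t}{1-t}.
% The marginal velocity depends on the endpoint distribution only through
% \mathbb{E}[x_1 \mid x_t], so shifting this conditional mean shifts the flow;
% a noise schedule could re-enter only through a non-linear interpolant.
```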

Figures

Figures reproduced from arXiv: 2605.10302 by Floor Eijkelboom, Jan-Willem van de Meent, Maksim Zhdanov, Pedro M. P. Curvo.

Figure 1. Overview of reference-guided flow matching. A noisy state is matched against a reference …
Figure 2. Reference-mean guidance on the two-moons distribution. The model and all other settings …
Figure 3. Reference-set swaps on frozen FLUX.2-klein. Prompt and noise seed are fixed within each column. The generated output shifts systematically in color, object identity, and style as the reference set changes.
Figure 4. Qualitative evidence of structural control on frozen …
Figure 5. SPG preserves unconditional generation quality while enabling inference-time control …
Figure 6. SPG preserves generation quality, avoids memorization, and enables inference-time control …
Figure 7. Examples from the GenEval protocol. Each column shows a representative reference-bank …
Figure 8. Two-moons control. Top: t changes with the reference set fixed. Bottom: the reference composition changes with the model fixed.
Figure 9. Inference-time condition changes; dataset and model are fixed.
Figure 10. MNIST steering with soft-labeled references. The same model generates ones or zeros …
Figure 11. The three β_t schedules evaluated in this ablation, shown for β_0 = 1 before the shared late-time cutoff at t = 0.85: constant (β_t = β_0), quadratic decay (β_t = β_0(1 − t)²), and bell-shaped (β_t = 4β_0 t(1 − t)). Constant applies uniform guidance until the cutoff; quadratic decay front-loads guidance and vanishes at t = 1, avoiding the (1 − t)⁻¹ instability; bell-shaped guidance peaks at t = 0.5 …
Figure 12. Ablation of guidance strength β_0 for the constant schedule. The constant schedule applies uniform guidance at every step. At moderate β_0 the target attribute transfers cleanly, but above β_0 ≈ 1 artifacts appear near t = 1, visible as oversaturated colors and structural distortion.
Figure 13. Ablation of guidance strength β_0 for the bell-shaped schedule. The bell-shaped schedule concentrates guidance around the midpoint of the trajectory and suppresses both early and late corrections. Relative to the constant schedule, it delays attribute transfer slightly but remains stable at larger β_0 values.
Figure 14. Ablation of guidance strength β_0 for the quadratic decay schedule. The quadratic schedule front-loads guidance and decays to zero at t = 1, cancelling the late-time divergence. Across the full β_0 range it provides the cleanest attribute transfer with the fewest late-time artifacts, which is why this schedule is used in the main experiments.
Figure 15. Reference-set size ablation. LPIPS diversity increases with the number of reference …
Figure 16. Ring-leap control task across guidance strengths and solver budgets. Columns vary NFE, …
Figure 17. Prompt–reference interaction. Rows change the prompt; columns change the reference …
Figure 18. Quantitative controllability under reference composition for the prompt …
Figure 19. Qualitative controllability for the prompt …
Figure 20. Qualitative controllability for the prompt …
Figure 21. SPG diversity as a function of reference-set size. Average pairwise LPIPS increases with …
Figure 22. White-background reference-bank comparison. The reference bank consists of examples …
Figure 23. Reference bank of 20 images of pink elephants.
Figure 24. Reference bank of 20 images of blue elephants.
Figure 25. Reference bank of 20 images of giraffes.
Figure 26. Reference bank of 20 images of zebras.
Figure 27. Reference bank of 20 images of elephants.
Figure 28. Reference bank of 20 images of keyholes.
Figure 29. Reference bank of 20 Van Gogh-style images.
Figure 30. Reference bank of 20 pencil-sketch house images.
Figure 31. Reference bank of 20 cinematic house images.
Figure 32. Reference bank of three hand-pose images used for the sign-of-the-horns experiment.
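The three guidance schedules in the Figure 11 ablation are simple enough to restate as code. The function name is illustrative, and the shared late-time cutoff at t = 0.85 is modeled here as guidance dropping to a hard zero, an assumption about how the cutoff is applied:

```python
def beta_t(t, beta0=1.0, schedule="quadratic", cutoff=0.85):
    """Guidance-strength schedules from the Figure 11 ablation.

    constant:  beta0 (uniform guidance until the cutoff)
    quadratic: beta0 * (1 - t)**2 (front-loaded; vanishes at t = 1,
               avoiding the (1 - t)**-1 late-time instability)
    bell:      4 * beta0 * t * (1 - t) (peaks at t = 0.5)
    All three share a late-time cutoff, modeled here as a hard zero.
    """
    if t >= cutoff:
        return 0.0
    if schedule == "constant":
        return beta0
    if schedule == "quadratic":
        return beta0 * (1.0 - t) ** 2
    if schedule == "bell":
        return 4.0 * beta0 * t * (1.0 - t)
    raise ValueError(f"unknown schedule: {schedule}")
```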
original abstract

Existing approaches to controllable generation typically rely on fine-tuning, auxiliary networks, or test-time search. We show that flow matching admits a different control interface: adaptation through examples. For deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean; shifting this mean shifts the flow itself. This yields a simple principle for controllable generation: steer a pretrained model by changing the reference set it follows. We instantiate this idea in two forms. Reference-Mean Guidance is training-free: it computes a closed-form endpoint-mean correction from a reference bank and applies it to a frozen FLUX.2-klein (4B) model, enabling control of color, identity, style, and structure while keeping the prompt, seed, and weights fixed. Semi-Parametric Guidance amortizes the same idea through an explicit mean anchor and learned residual refiner, matching unconditional DiT-B/4 quality on AFHQv2 while allowing the reference set to be swapped at inference time. These results point to a broader direction: generative models that adapt through data, not parameter updates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that flow matching with deterministic interpolants allows the velocity field to be controlled solely through the conditional endpoint mean, enabling reference-guided generation by shifting this mean using example sets. This is instantiated as training-free Reference-Mean Guidance applied to a frozen FLUX.2-klein model for attribute control, and as Semi-Parametric Guidance that amortizes the approach while maintaining quality on AFHQv2.

Significance. If the central theoretical claim holds and is supported by rigorous derivation and experiments, this work could offer a significant advance in controllable generation for flow-based models by providing a simple, training-free adaptation mechanism based on reference data rather than parameter updates or auxiliary models. The application to a large-scale pretrained model like FLUX.2-klein highlights practical potential, though stronger quantitative evidence is needed to establish the method's reliability.

major comments (3)
  1. The core assertion that 'for deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean' lacks a detailed derivation showing that the proposed closed-form correction implements exactly v_t(x_t) = (E[x_1 | x_t] - x_t)/(1-t) with no residual terms from p_t(x_t), reference marginals, or the noise schedule; this is load-bearing for the claim that shifting the mean shifts the flow itself.
  2. In the Reference-Mean Guidance instantiation on the frozen FLUX.2-klein (4B) model, the manuscript does not verify that the endpoint-mean correction avoids injecting schedule-dependent scaling or reference-set covariance effects into the effective velocity, as required by the skeptic's concern on implicit dependencies.
  3. The claim that Semi-Parametric Guidance matches unconditional DiT-B/4 quality on AFHQv2 while allowing reference-set swapping is stated without quantitative metrics, ablations, or error analysis, which is necessary to substantiate that the amortized mean anchor preserves fidelity without introducing new dependencies.
minor comments (2)
  1. Clarify notation for 'FLUX.2-klein (4B)' and 'reference bank' for consistency across sections.
  2. The abstract would benefit from a brief mention of any evaluation metrics used for the qualitative control results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications, additional derivations, and planned revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: The core assertion that 'for deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean' lacks a detailed derivation showing that the proposed closed-form correction implements exactly v_t(x_t) = (E[x_1 | x_t] - x_t)/(1-t) with no residual terms from p_t(x_t), reference marginals, or the noise schedule; this is load-bearing for the claim that shifting the mean shifts the flow itself.

    Authors: We agree that a self-contained derivation is necessary for rigor. In the revision we will add a full proof in the appendix establishing that, for deterministic linear interpolants, the flow-matching velocity reduces exactly to v_t(x_t) = (E[x_1 | x_t] - x_t)/(1-t) with no residual dependence on the marginal p_t(x_t), reference marginals, or noise schedule. The proof proceeds by substituting the deterministic interpolant into the conditional expectation and showing that all other terms cancel. revision: yes

  2. Referee: In the Reference-Mean Guidance instantiation on the frozen FLUX.2-klein (4B) model, the manuscript does not verify that the endpoint-mean correction avoids injecting schedule-dependent scaling or reference-set covariance effects into the effective velocity, as required by the skeptic's concern on implicit dependencies.

    Authors: We acknowledge the need for explicit verification. In the revised manuscript we will insert a dedicated analysis subsection that substitutes the closed-form mean correction into the velocity expression and algebraically confirms the absence of schedule-dependent scaling and reference-set covariance terms. We will also add targeted empirical diagnostics on the FLUX.2-klein outputs to corroborate that no unintended dependencies are introduced. revision: yes

  3. Referee: The claim that Semi-Parametric Guidance matches unconditional DiT-B/4 quality on AFHQv2 while allowing reference-set swapping is stated without quantitative metrics, ablations, or error analysis, which is necessary to substantiate that the amortized mean anchor preserves fidelity without introducing new dependencies.

    Authors: We agree that quantitative evidence is required. In the revision we will report FID scores comparing Semi-Parametric Guidance against the unconditional DiT-B/4 baseline on AFHQv2, include ablations isolating the mean-anchor and residual-refiner components, and provide error analysis demonstrating that reference-set swapping preserves fidelity without introducing new dependencies beyond those of the base model. revision: yes

Circularity Check

0 steps flagged

No significant circularity; core claim follows from standard deterministic flow-matching properties

full rationale

The paper derives the control principle directly from the mathematical property of deterministic linear interpolants in flow matching, where the velocity satisfies v_t(x_t) = (E[x_1|x_t] - x_t)/(1-t) by definition of the conditional expectation under the path x_t = (1-t)x_0 + t x_1. This is presented as an external fact of the interpolant construction rather than a fitted parameter or self-referential equation. No load-bearing step reduces to a self-citation, ansatz smuggled via prior work, or renaming of a known result; the reference-mean guidance is an application of this property to a frozen model. The derivation remains self-contained against external flow-matching theory and does not force the target result by construction from its own inputs.
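The identity in this rationale can be sanity-checked numerically on a toy one-dimensional example with a two-point endpoint distribution (an illustration, not the paper's experiment): conditioning on x_t fixes x_0 for each candidate endpoint, and the direct expectation of the conditional velocity matches the endpoint-mean form.

```python
import numpy as np

# Toy 1-D check: two possible endpoints x1, Gaussian source x0 ~ N(0, 1).
endpoints = np.array([-2.0, 3.0])   # candidate x1 values
priors = np.array([0.3, 0.7])       # P(x1)
t, x_t = 0.6, 0.5                   # a fixed time and state

# Under x_t = (1 - t) x_0 + t x_1, conditioning on x_t fixes x_0 per endpoint.
x0_given = (x_t - t * endpoints) / (1.0 - t)

# Posterior over x1 given x_t (Bayes rule with the N(0, 1) density of x_0).
posterior = priors * np.exp(-0.5 * x0_given**2)
posterior /= posterior.sum()

# Direct expectation of the conditional velocity x1 - x0 ...
v_direct = np.sum(posterior * (endpoints - x0_given))
# ... agrees with the endpoint-mean form (E[x1 | x_t] - x_t) / (1 - t).
v_mean = (np.sum(posterior * endpoints) - x_t) / (1.0 - t)
```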

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the deterministic interpolant property of flow matching and the assumption that the velocity field depends only on the conditional endpoint mean.

axioms (1)
  • domain assumption: Deterministic interpolants govern the velocity field solely via the conditional endpoint mean.
    Invoked to justify that shifting the reference mean directly steers the flow without additional terms.

pith-pipeline@v0.9.0 · 5500 in / 1126 out tokens · 41545 ms · 2026-05-13T06:10:13.891989+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 3 internal anchors

  1. Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=PqvMRDCJT9t
  2. Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=XVjTT1nw5z
  3. Michael Samuel Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=li7qeBbCR1t
  4. William Peebles and Saining Xie. Scalable diffusion models with transformers. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4172–4182, Paris, France, October 2023. IEEE. ISBN 979-8-3503-0718-4. doi: 10.1109/ICCV51070.2023.00387. URL https://ieeexplore.ieee.org/document/10377858/
  5. Black Forest Labs. FLUX.2: Frontier Visual Intelligence. https://bfl.ai/blog/flux-2, 2025
  6. Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  7. Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9
  8. Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023
  9. Prafulla Dhariwal and Alexander Quinn Nichol. Diffusion models beat GANs on image synthesis. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/forum?id=AAWuCvzaVt
  10. Ruiqi Feng, Chenglei Yu, Wenhao Deng, Peiyan Hu, and Tailin Wu. On the guidance of flow matching. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=pKaNgFzJBy
  11. Peter Potaptchik, Cheuk-Kit Lee, and Michael S. Albergo. Tilt matching for scalable sampling and fine-tuning, 2025. URL https://arxiv.org/abs/2512.21829
  12. Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, and Zeynep Akata. If at first you don't succeed, try, try again: Faithful diffusion-based text-to-image generation by selection, 2023. URL https://arxiv.org/abs/2305.13308
  13. Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, and John P. Cunningham. Practical and asymptotically exact conditional sampling in diffusion models, 2024. URL https://arxiv.org/abs/2306.17775
  14. Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, and Michal Drozdzal. Improving text-to-image consistency via automatic prompt optimization. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=g12Gdl6aDL
  15. Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. ReNO: Enhancing one-step text-to-image models through reward-based noise optimization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=MXY0qsGgeO
  16. Floor Eijkelboom, Grigory Bartosh, Christian A. Naesseth, Max Welling, and Jan-Willem van de Meent. Variational flow matching for graph generation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=UahrHR5HQh
  17. Anne Gagneux, Ségolène Tiffany Martin, Rémi Gribonval, and Mathurin Massias. Training flow matching: The role of weighting and parameterization. In ICLR 2026 2nd Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy, 2026. URL https://openreview.net/forum?id=RYQBTBZxNl
  18. Quentin Bertrand, Anne Gagneux, Mathurin Massias, and Rémi Emonet. On the closed-form of flow matching: Generalization does not arise from target stochasticity. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=kVz9uvqUna
  19. Weiguo Gao and Ming Li. How do flow matching models memorize and generalize in sample data subspaces?, 2024. URL https://arxiv.org/abs/2410.23594
  20. Matthew Niedoba, Dylan Green, Saeid Naderiparizi, Vasileios Lioutas, Jonathan Wilder Lavington, Xiaoxuan Liang, Yunpeng Liu, Ke Zhang, Setareh Dabiri, Adam Scibior, Berend Zwartsenberg, and Frank Wood. Nearest neighbour score estimators for diffusion generative models. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jo...
  21. Christopher Scarvelis, Haitz Sáez de Ocáriz Borde, and Justin Solomon. Closed-form diffusion models. Transactions on Machine Learning Research, 2025. ISSN 2835-8856. URL https://openreview.net/forum?id=JkMifr17wc
  22. Daniel Wolf, Heiko Hillenhagen, Billurvan Taskin, Alex Bäuerle, Meinrad Beer, Michael Götz, and Timo Ropinski. Your other Left! Vision-Language Models Fail to Identify Relative Positions in Medical Images. In Proceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2025, volume LNCS 15964. Springer Nature Switzerland, September 2025
  23. Dhruba Ghosh, Hannaneh Hajishirzi, and Ludwig Schmidt. GenEval: An object-focused framework for evaluating text-to-image alignment. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023. URL https://openreview.net/forum?id=Wbr51vK331
  24. Amirmojtaba Sabour, Michael S. Albergo, Carles Domingo-Enrich, Nicholas M. Boffi, Sanja Fidler, Karsten Kreis, and Eric Vanden-Eijnden. Test-time scaling of diffusions with flow maps, 2025. URL https://arxiv.org/abs/2511.22688
  25. Peter Holderrieth, Douglas Chen, Luca Eyring, Ishin Shah, Giri Anantharaman, Yutong He, Zeynep Akata, Tommi Jaakkola, Nicholas M. Boffi, and Max Simchowitz. Diamond maps: Efficient reward alignment via stochastic flow maps, 2026. URL https://arxiv.org/abs/2602.05993
  26. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33, 2020
  27. Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego De Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, ... Improving language models by retrieving from trillions of tokens
  28. Andreas Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, and Björn Ommer. Semi-parametric neural image synthesis. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=Bqk9c0wBNrZ
  29. Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, 2020
  30. Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=PxTIG12RRHS
  31. Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=CD9Snc73AW
  32. Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, and Saining Xie. SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. In European Conference on Computer Vision, 2024
  33. Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. URL https://openreview.net/forum?id=qw8AKxfYbI
  34. Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and...
  35. Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross-attention control. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=_CDixzkzeyb
  36. Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, and Yinqiang Zheng. MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22560–22570, October 2023
  37. Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models, 2023. URL https://arxiv.org/abs/2308.06721
  38. Dongxu Li, Junnan Li, and Steven C. H. Hoi. BLIP-Diffusion: Pre-trained subject representation for controllable text-to-image generation and editing. In Advances in Neural Information Processing Systems, 2023
  39. Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, and Yao Hu. InstantID: Zero-shot identity-preserving generation in seconds, 2024. URL https://arxiv.org/abs/2401.07519
  40. Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through memorization: Nearest neighbor language models. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=HklBjCEKvH
  41. Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Haim Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=NAQvF08TcyG
  42. Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. Multi-concept customization of text-to-image diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  43. James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, and Hongxia Jin. Continual diffusion: Continual customization of text-to-image diffusion with C-LoRA. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=TZdEgwZ6f3
  44. Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, and Moritz Hardt. Test-time training with self-supervision for generalization under distribution shifts. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 9229–9248. PMLR,...
  45. Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=uXl3bZLkr3c
  46. Zahra Kadkhodaie, Florentin Guth, Eero P Simoncelli, and Stéphane Mallat. Generalization in diffusion models arises from geometry-adaptive harmonic representations. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=ANvmVS2Yr0
  47. Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, and Liang Zheng. REPA-E: Unlocking VAE for end-to-end tuning with latent diffusion transformers, 2025. URL https://arxiv.org/abs/2504.10483
  48. Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, and Quoc V Le. Symbolic discovery of optimization algorithms. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=ne6zeqLFCZ
  49. Gaurav Parmar, Richard Zhang, and Jun-Yan Zhu. On aliased resizing and surprising subtleties in GAN evaluation. In CVPR, 2022
  50. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine...
  51. Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution, 2024. URL https://arxiv.org/abs/2409.121...