
arXiv: 2607.01144 · v1 · pith:W2OJJJRW · submitted 2026-07-01 · cs.LG · cs.AI · cs.CE

Sequentially-Controlled Interactive Multi-Particle Flow-Maps for Online Feedback-Driven Search

Pith reviewed 2026-07-02 15:27 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.CE
keywords: multi-particle sampling · flow maps · Feynman-Kac corrector · online feedback search · KL-tilted distribution · preference alignment · global exploration · SMC samplers

The pith

IMPFM uses flow-map sample sharing to create a multi-particle Feynman-Kac corrector that steers ensembles toward KL-tilted distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops IMPFM for online feedback-driven search when user preferences are unknown in advance and revealed only through sequential interactions. It transports a group of particles with flow maps that share posterior samples across the ensemble at each resampling step, correcting each particle's drift with collective information. The authors prove that this setup yields a multi-particle interaction-aware Feynman-Kac corrector that progressively moves the system toward a KL-tilted target distribution. The approach aims to support broad exploration while avoiding the mode collapse and reward over-optimization that plague narrower methods.

Core claim

The resulting sampling framework yields a multi-particle interaction-aware Feynman-Kac corrector that progressively steers the multi-particle system toward a KL-tilted target distribution, facilitating global exploration and preventing mode collapse.
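The KL-tilted target in the core claim is, in the usual formulation, the maximizer of E_q[r(x)] - λ·KL(q ‖ p) over distributions q, namely π*(x) ∝ p(x)·exp(r(x)/λ) with base model p, reward r and temperature λ. The sketch below assumes that standard form (this page does not restate the paper's exact definition) and approximates π* by self-normalized importance weighting of base samples. The 1-D Gaussian base, the linear reward r(x) = a·x, and the values of `a` and `lam` are illustrative assumptions; for this choice the tilted law is exactly N(a/λ, 1), which gives a check on the estimate.

```python
import numpy as np


def kl_tilted_weights(rewards, lam):
    """Self-normalized weights proportional to exp(r(x) / lam) for samples x ~ p."""
    logits = np.asarray(rewards, dtype=float) / lam
    logits -= logits.max()            # stabilize before exponentiating
    w = np.exp(logits)
    return w / w.sum()


rng = np.random.default_rng(0)
a, lam = 1.0, 2.0                     # illustrative reward slope and KL temperature
x = rng.standard_normal(200_000)      # samples from the base p = N(0, 1)
w = kl_tilted_weights(a * x, lam)     # reward r(x) = a * x
tilted_mean = float(np.sum(w * x))    # approaches a / lam = 0.5 as the sample grows
```

Smaller λ pushes the weights toward the highest-reward samples (and toward collapse onto a few of them); larger λ keeps π* closer to p. This is the trade-off the KL term controls.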

What carries the argument

A flow-map-powered posterior sample-sharing mechanism that supplies collective corrections to each particle's drift at every resampling step.
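The sharing mechanism itself is not spelled out on this page. One standard way to let every particle reuse its peers' posterior samples is deterministic-mixture importance sampling: pool the samples drawn from all particles' posteriors p(x_1 | x_t^j) and reweight them for particle i by p(x_1 | x_t^i) divided by the mixture density. The sketch below illustrates that idea under stated assumptions and is not IMPFM's actual mechanism: a closed-form Gaussian posterior (the hypothetical `posterior_sampler`) stands in for a learned stochastic flow map, and the constants `M` and `S` are illustrative.

```python
import numpy as np

# Toy setup (assumption): data x1 ~ N(M, S^2), noise x0 ~ N(0, 1),
# linear interpolant x_t = (1 - t) * x0 + t * x1.
M, S = 2.0, 0.5


def posterior_params(x_t, t):
    """Mean and standard deviation of p(x1 | x_t) in the toy Gaussian model."""
    var_xt = (1 - t) ** 2 + (t * S) ** 2      # Var(x_t)
    gain = t * S ** 2 / var_xt                # Cov(x1, x_t) / Var(x_t)
    mean = M + gain * (x_t - t * M)
    var = S ** 2 - gain * t * S ** 2
    return mean, np.sqrt(var)


def normal_pdf(y, mean, std):
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


def posterior_sampler(x_t, t, n, rng):
    """Hypothetical stand-in for a stochastic flow map: n draws of x1 ~ p(x1 | x_t)."""
    mean, std = posterior_params(x_t, t)
    return rng.normal(mean, std, size=n)


def shared_posterior_expectation(particles, t, f, n_per_particle, rng):
    """Estimate E[f(x1) | x_t^i] for every particle i from the samples of all particles."""
    pooled = np.concatenate(
        [posterior_sampler(x, t, n_per_particle, rng) for x in particles]
    )
    dens = np.stack([normal_pdf(pooled, *posterior_params(x, t)) for x in particles])
    mixture = dens.mean(axis=0)                    # density the pooled samples came from
    weights = dens / mixture                       # importance ratios, shape (N, N * n)
    weights /= weights.sum(axis=1, keepdims=True)  # self-normalize per particle
    return weights @ f(pooled)


rng = np.random.default_rng(0)
particles = np.array([-1.0, 0.0, 1.0])
estimates = shared_posterior_expectation(particles, 0.5, lambda y: y, 20_000, rng)
```

Each particle's estimate now draws on N times as many samples as it generated itself. With a learned flow map the posterior density is generally not available in closed form, so this exact reweighting would need an approximation.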

If this is right

  • The method enables sample-efficient global search in tasks where preferences must be learned through sequential feedback.
  • Multi-particle interaction reweighting overcomes weight degeneracy typical of standard SMC samplers while preserving structural diversity.
  • The framework actively reduces reward over-optimization by maintaining ensemble coverage during transport.
  • Empirical results across search and alignment tasks show improved coverage compared with existing baselines.
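Weight degeneracy, the SMC failure mode named in the second bullet, is conventionally measured by the effective sample size (ESS) of the normalized importance weights: ESS near N means balanced weights, ESS near 1 means a single particle carries almost all the mass. A minimal sketch of the textbook Kish ESS follows; the paper's own ESS definition is not shown on this page, so this is the standard version rather than necessarily the one used there.

```python
import numpy as np


def effective_sample_size(log_weights):
    """Kish effective sample size (sum w)^2 / sum(w^2), computed stably from log-weights."""
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw.max())          # a common rescaling leaves the ratio unchanged
    return float(w.sum() ** 2 / np.sum(w ** 2))


n = 8
balanced = effective_sample_size(np.zeros(n))                   # equal weights: ESS = n
degenerate = effective_sample_size([0.0] + [-50.0] * (n - 1))   # one particle dominates: ESS close to 1
```

SMC samplers commonly trigger resampling when the ESS falls below a threshold such as N/2; that threshold is a widespread convention, not a value taken from the paper.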

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The particle-sharing idea could transfer to other sequential decision settings that require balancing exploration with partial feedback.
  • Pairing the flow-map corrections with existing generative models might extend the approach to higher-dimensional or structured search spaces.
  • Testing the unbiased-correction assumption in discrete versus continuous state spaces would clarify the framework's range of validity.

Load-bearing premise

Flow maps can be computed to share posterior samples across particles in a way that produces unbiased collective corrections at each resampling step.
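The premise concerns what happens at each resampling step. For reference, a generic SMC resampling step maps normalized weights to ancestor indices and then resets the weights to uniform. The sketch below uses systematic resampling, a common low-variance scheme; whether IMPFM uses this scheme or another one is not stated on this page, and the particle values and log-weights are illustrative.

```python
import numpy as np


def systematic_resample(weights, rng):
    """Return ancestor indices chosen by systematic resampling.

    A single uniform offset generates n evenly spaced points in [0, 1); each point
    selects the particle whose cumulative-weight interval contains it.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0                      # guard against floating-point round-off
    return np.searchsorted(cumulative, positions, side="right")


rng = np.random.default_rng(0)
particles = np.array([0.1, 0.5, 0.9, 1.3])
log_w = np.array([0.0, 1.0, 2.0, 0.5])       # illustrative log-weights
idx = systematic_resample(np.exp(log_w - log_w.max()), rng)
particles = particles[idx]                   # duplicate high-weight particles, drop low-weight ones
weights = np.full(len(idx), 1.0 / len(idx))  # weights reset to uniform after resampling
```

Resampling discards low-weight particles, which is exactly where an ensemble can lose modes; any collective correction applied around this step has to avoid biasing which particles survive.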

What would settle it

An experiment that demonstrates biased collective corrections or persistent mode collapse in the particle ensemble under the proposed resampling would falsify the claimed Feynman-Kac property.

Figures

Figures reproduced from arXiv: 2607.01144 by Anindya Sarkar, Binglin Ji, Hengchang Lu, Jens Sjölund, Yevgeniy Vorobeychik.

Figure 1
Figure 1: Conceptual Overview of IMPFM. Excerpt: …have emerged as a powerful alternative, progressively correcting dynamics at every resampling step, making them a highly promising paradigm for rapid, feedback-driven adaptation. However, these approaches rely on reward values or gradients at every resampling step; in practice, these signals are unavailable and must be approximated, introducing systematic bias into the sampler a…
Figure 2
Figure 2: Sampling Mechanism of IMPFM. Excerpt: …local modes, ensuring the ensemble maintains a broad, diverse coverage of the target distribution's landscape. Equation 6 reveals a powerful collaborative mechanism: the drift of the i-th particle is calibrated by the value gradients of its peers. Since computing the value gradient fundamentally requires drawing multiple Monte Carlo samples from a particle's corresponding poste…
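Figure 2's caption notes that computing a particle's value gradient requires several Monte Carlo samples from that particle's posterior. A common choice in KL-regularized control, assumed here rather than taken from the paper, is the soft value V_t(x_t) = λ log E[exp(r(X_1)/λ) | X_t = x_t]. A minimal sketch of its log-sum-exp Monte Carlo estimate follows; Gaussian draws stand in for flow-map posterior samples, and the closed form aμ + a²σ²/(2λ) for a linear reward r(x) = a·x under X_1 ~ N(μ, σ²) serves as a check. All numeric values are illustrative.

```python
import numpy as np


def mc_soft_value(reward_samples, lam):
    """Monte Carlo estimate of V = lam * log E[exp(r(x1) / lam)] via log-sum-exp."""
    z = np.asarray(reward_samples, dtype=float) / lam
    m = z.max()                                   # shift for numerical stability
    return float(lam * (m + np.log(np.mean(np.exp(z - m)))))


rng = np.random.default_rng(1)
mu, sigma, a, lam = 0.3, 0.5, 1.0, 0.5            # illustrative posterior, reward slope, temperature
x1 = rng.normal(mu, sigma, size=100_000)          # stand-in for flow-map posterior samples
estimate = mc_soft_value(a * x1, lam)
exact = a * mu + a ** 2 * sigma ** 2 / (2 * lam)  # closed form for a Gaussian posterior: 0.55
```

Because the logarithm is applied to a sample mean, the estimate is slightly biased for finite sample counts (by Jensen's inequality it underestimates V on average) but consistent as the count grows. A gradient with respect to x_t would come from differentiating through reparameterized samples, which this sketch omits.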
Figure 3
Figure 3: Search Visualizations. Excerpt: Notably, IMPFM demonstrates superior resistance to reward over-optimization. While both IMPFM and the MFM baseline utilize ImageReward to steer the dynamics, IMPFM consistently outperforms the baselines across held-out evaluation reward models, e.g., Pick-Score [18]. We attribute this robustness to our ensemble-based reward guidance mechanism. Rather than evaluating each particle ag…
Figure 4
Figure 4: Search Performance Analysis of Competitive Approaches on ImageNet Target Classes. Excerpt: …local reward signals in isolation, IMPFM aggregates reward feedback across the full ensemble's posterior samples. This global aggregation effectively smooths the reward landscape, establishing a broader consensus for the guidance that prevents individual particles from collapsing into the narrow, high-reward local maxima that…
Figure 6
Figure 6: Efficacy of SS on IMPFM. (left) B = 20; (right) B = 80. Excerpt: …stochasticity by converting the ODE to an SDE via Eq. 8. A direct comparison across varying feedback budgets (…
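Figure 6's caption mentions injecting stochasticity by converting the ODE to an SDE via Eq. 8, which this page does not reproduce. One standard marginal-preserving construction is written out below as a reference point; it is an assumption about the kind of conversion involved, not a transcription of Eq. 8. Adding half the squared noise scale times the score to the drift exactly cancels the extra diffusion term in the Fokker-Planck equation, so every marginal p_t is unchanged for any schedule ε_t.

```latex
% Flow ODE with marginals p_t (t = 0: noise, t = 1: data)
\mathrm{d}X_t = v_t(X_t)\,\mathrm{d}t
% Marginal-preserving SDE for any noise schedule \varepsilon_t \ge 0
\mathrm{d}X_t = \Big[\, v_t(X_t) + \tfrac{\varepsilon_t^2}{2}\,\nabla \log p_t(X_t) \Big]\,\mathrm{d}t + \varepsilon_t\,\mathrm{d}W_t
% Score for the linear interpolant X_t = (1-t)X_0 + tX_1 with X_0 \sim \mathcal{N}(0, I)
\nabla \log p_t(x) = \frac{t\,v_t(x) - x}{1 - t}
```

For the linear interpolant the score follows from the velocity, so no separate score network is needed; the 1/(1 - t) factor grows near the data end, which is why ε_t is often chosen to shrink as t approaches 1.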
Figure 7
Figure 7: Impact of Reweighting. Excerpt: Importance of Reweighting Mechanism. We isolate the effect of the reweighting step by comparing the full IMPFM framework against an ablated variant, referred to as Multi Particle Control (MPC), in which only the reweighting mechanism is removed. As reported in Figure 7, the reweighting step proves critical: it drives faster convergence to higher reward scores (evaluated using the VQA score)…
Figure 8
Figure 8: Alignment Visualization. (left) Quantity Prompt. (right) Compositional Prompt. Plot legend: DAS, FKS, MFM, IMPFM (ours); x-axis: Feedback Budgets.
Figure 9
Figure 9: Analysis of Competitive Approaches on (right) Compositional and (left) Quantity-aware Alignment. Excerpt: Alignment Result: The alignment results (as depicted in …
Figure 10
Figure 10: Search Performance Analysis of Competitive Approaches on ImageNet Target Classes.
Figure 11
Figure 11: Impact of Particle Interaction in Mitigating Reward Over-optimization on FKC.
Figure 12
Figure 12: Impact of Particle Interaction in Mitigating Reward Over-optimization on Optimal Control. Excerpt (Appendix Section 9, Visualizing the Impact of Sufficient Statistic): In the main paper (Section 3), we analyze the impact of the Sufficient Statistic (SS). Here, we present additional visualizations highlighting that the SS is crucial for enabling feedback-efficient search, especially under severe budget constraints. As depicted in …
Figure 13
Figure 13: Visualizing the Impact of Sufficient Statistics for Lower Feedback Budget.
Figure 14
Figure 14: Efficacy of Interactive Particle Mechanism on Mitigating Weight Degeneracy on FKC.
Figure 15
Figure 15: Impact of Interactive Multi-Particle Drift Correction Mechanism on Mitigating Weight Degeneracy. Excerpt: In this section, we present a deeper qualitative investigation into how the proposed multi-particle interaction-driven drift correction mechanism combats weight degeneracy. To achieve this, we visualize the denoised samples generated via flow-map for each particle in the batch and compute the Effective Sample …
Figure 16
Figure 16: Search Visualizations. Excerpt (Appendix Section 13, Details about the Evaluation Dataset for Alignment Tasks): We provide the evaluation prompts for both the compositional and quantity-aware alignment tasks in separate Excel files, available via the GitHub link.
Figure 17
Figure 17: Additional Alignment Visualization.
Figure 18
Figure 18: Additional Alignment Visualization. Panel prompts (IMPFM samples): "An ice castle standing proudly in the midst of a blizzard."; "A lantern casting dim light in a haunted forest."; "A small cat waves a wand."; "A phoenix soaring above a city, aglow with golden flames."
Figure 19
Figure 19: Additional Alignment Visualization.
Original abstract

While generative models have enabled training-free reward alignment, current methods typically excel in local exploration within narrow regions of the underlying distribution. These approaches struggle when preferences are unknown a priori and only revealed through sequential feedback, a scenario demanding broad exploration to uncover high-utility regions. To address this, we propose Sequentially-Controlled Interactive Multi-Particle Flow-Maps (IMPFM), a framework for sample-efficient online feedback-driven search. IMPFM progressively transports a group of interactive particles toward the target distribution, maintaining the broad coverage essential for heterogeneous preference alignment. IMPFM introduces a principled and efficient posterior sample sharing mechanism across particles powered by flow maps. By correcting individual particle drift with the collective posterior samples of the entire ensemble at each resampling step, the framework maximizes sample utility to enable global exploration while actively mitigating the reward over-optimization typical of standard control frameworks. Paired with a principled exploration-exploitation reweighting mechanism involving multi-particle interaction, this sequentially corrected multi-particle dynamics explicitly preserves structural diversity and overcomes the weight degeneracy inherent to standard SMC samplers. Crucially, we prove that the resulting sampling framework yields a multi-particle interaction-aware Feynman-Kac corrector that progressively steers the multi-particle system toward a KL-tilted target distribution, facilitating global exploration and preventing mode collapse. Extensive empirical evaluations and rigorous ablations across diverse search and alignment tasks confirm the efficacy of IMPFM over existing baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Sequentially-Controlled Interactive Multi-Particle Flow-Maps (IMPFM) for sample-efficient online feedback-driven search. It introduces a flow-map-based mechanism for sharing posterior samples across an ensemble of particles, pairs this with a multi-particle interaction reweighting scheme for exploration-exploitation, and claims to prove that the resulting dynamics constitute a multi-particle interaction-aware Feynman-Kac corrector that steers the system toward a KL-tilted target distribution while preserving diversity. Extensive empirical evaluations and ablations on search and alignment tasks are reported.

Significance. If the claimed proof is rigorous and the empirical advantages are reproducible, the framework could meaningfully extend training-free reward alignment methods by enabling global exploration under sequential feedback, addressing mode collapse and over-optimization issues common in standard control and SMC approaches.

major comments (2)
  1. [Abstract] Abstract: The central claim that the framework 'yields a multi-particle interaction-aware Feynman-Kac corrector' that 'progressively steers the multi-particle system toward a KL-tilted target distribution' is load-bearing. The description supplies no derivation establishing that the flow-map-mediated collective corrections remain a valid Radon-Nikodym factor with respect to the tilted target or that the interaction term preserves the martingale property when sample sharing introduces dependence across particles.
  2. [Abstract] Abstract (proof claim): The proof is asserted to rest on the flow maps delivering unbiased collective posterior corrections at each resampling step. No indication is given of the measure-theoretic conditions under which this holds when the flow maps are learned or approximate, leaving open whether the claimed correctness property can fail due to approximation error or reweighting dependence.
minor comments (1)
  1. The abstract refers to 'rigorous ablations across diverse search and alignment tasks' without specifying the tasks, metrics, or baseline implementations, which hinders assessment of the empirical support for the central claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the need for greater clarity on the central theoretical claims in the abstract. We will revise the manuscript to strengthen the presentation of the proof, including a concise outline in the abstract and expanded discussion of the measure-theoretic conditions and approximation robustness in the main text.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the framework 'yields a multi-particle interaction-aware Feynman-Kac corrector' that 'progressively steers the multi-particle system toward a KL-tilted target distribution' is load-bearing. The description supplies no derivation establishing that the flow-map-mediated collective corrections remain a valid Radon-Nikodym factor with respect to the tilted target or that the interaction term preserves the martingale property when sample sharing introduces dependence across particles.

    Authors: The full derivation is presented in Section 3 (Theoretical Analysis), where we establish that the flow-map-mediated collective corrections yield a valid Radon-Nikodym derivative with respect to the KL-tilted target and that the interaction reweighting preserves the martingale property by explicitly accounting for cross-particle dependence through the ensemble posterior. We agree the abstract is too terse on this point and will revise it to include a high-level sketch of the key steps (unbiased collective correction at resampling, followed by interaction-aware reweighting that maintains the Feynman-Kac structure). revision: yes

  2. Referee: [Abstract] Abstract (proof claim): The proof is asserted to rest on the flow maps delivering unbiased collective posterior corrections at each resampling step. No indication is given of the measure-theoretic conditions under which this holds when the flow maps are learned or approximate, leaving open whether the claimed correctness property can fail due to approximation error or reweighting dependence.

    Authors: Section 3.2 specifies the measure-theoretic conditions (absolute continuity of the flow-map pushforwards and bounded Radon-Nikodym derivatives) under which the unbiasedness holds for exact flow maps. For learned/approximate maps we provide an error propagation bound (Proposition 3) showing that the total variation distance to the target remains controlled under standard Lipschitz assumptions on the learned maps. We will add an explicit statement of these conditions to the abstract and expand the discussion of approximation robustness in Section 3.3. revision: yes

Circularity Check

0 steps flagged

No circularity: claimed proof of multi-particle Feynman-Kac corrector presented as independent derivation

Full rationale

The abstract asserts a proof that IMPFM yields a multi-particle interaction-aware Feynman-Kac corrector steering toward a KL-tilted target, but the provided text contains no equations or definitions showing that the target distribution, corrector, or flow-map corrections are defined in terms of themselves or reduce by construction to the input preference model. No self-citation chain, fitted parameter renamed as prediction, or ansatz smuggling is exhibited. The derivation chain is therefore treated as self-contained pending the full equations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, background axioms, or new postulated entities are detailed beyond the proposed method itself.

pith-pipeline@v0.9.1-grok · 5802 in / 1100 out tokens · 44502 ms · 2026-07-02T15:27:50.176116+00:00 · methodology


Reference graph

Works this paper leans on


  1. [1] Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, and Tommaso Biancalani. Feedback efficient online fine-tuning of diffusion models. arXiv preprint arXiv:2402.16359, 2024.

  2. [2] Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models. arXiv preprint arXiv:2501.06848, 2025.

  3. [3] Sunwoo Kim, Minkyu Kim, and Dongmin Park. Test-time alignment of diffusion models without reward over-optimization. arXiv preprint arXiv:2501.05803, 2025.

  4. [4] Cheuk Kit Lee, Paul Jeha, Jes Frellsen, Pietro Lio, Michael Samuel Albergo, and Francisco Vargas. Debiasing guidance for discrete diffusion with sequential Monte Carlo. arXiv preprint arXiv:2502.06079, 2025.

  5. [5] Yingqing Guo, Yukang Yang, Hui Yuan, and Mengdi Wang. Training-free guidance beyond differentiability: Scalable path steering with tree search in diffusion and flow models. arXiv preprint arXiv:2502.11420, 2025.

  6. [6] Vineet Jain, Kusha Sareen, Mohammad Pedramfar, and Siamak Ravanbakhsh. Diffusion tree sampling: Scalable inference-time alignment of diffusion models. arXiv preprint arXiv:2506.20701, 2025.

  7. [7] Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian, Roberto Bondesan, Alán Aspuru-Guzik, Arnaud Doucet, Rob Brekelmans, Alexander Tong, and Kirill Neklyudov. Feynman-Kac correctors in diffusion: Annealing, guidance, and product of experts. arXiv preprint arXiv:2503.02819, 2025.

  8. [8] Alexander Denker, Francisco Vargas, Shreyas Padhy, Kieran Didi, Simon Mathis, Vincent Dutordoir, Riccardo Barbano, Emile Mathieu, Urszula J Komorowska, and Pietro Lio. DEFT: Efficient fine-tuning of diffusion models by learning the generalised h-transform. Advances in Neural Information Processing Systems, 37:19636–19682, 2024.

  9. [9] Paolo Dai Pra. A stochastic control approach to reciprocal diffusion processes. Applied Mathematics and Optimization, 23(1):313–329, 1991.

  10. [10] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.

  11. [11] Peter Holderrieth, Uriel Singer, Tommi Jaakkola, Ricky TQ Chen, Yaron Lipman, and Brian Karrer. Glass flows: Transition sampling for alignment of flow and diffusion models. arXiv preprint arXiv:2509.25170, 2025.

  12. [12] Nicholas M Boffi, Michael S Albergo, and Eric Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation. arXiv preprint arXiv:2505.18825, 2025.

  13. [13] Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S Albergo, and Yee Whye Teh. Meta flow maps enable scalable reward alignment. arXiv preprint arXiv:2601.14430, 2026.

  14. [14] Peter Potaptchik, Cheuk-Kit Lee, and Michael S Albergo. Tilt matching for scalable sampling and fine-tuning. arXiv preprint arXiv:2512.21829, 2025.

  15. [15] Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, and Deva Ramanan. Evaluating text-to-visual generation with image-to-text generation. In European Conference on Computer Vision, pages 366–384. Springer, 2024.

  16. [16] Wenliang Dai, Junnan Li, Dongxu Li, Anthony Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale N Fung, and Steven Hoi. InstructBLIP: Towards general-purpose vision-language models with instruction tuning. Advances in Neural Information Processing Systems, 36:49250–49267, 2023.

  17. [17] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023.

  18. [18] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-Pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:36652–36663, 2023.

  19. [19] Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, and Wenhu Chen. GenAI Arena: An open evaluation platform for generative models. Advances in Neural Information Processing Systems, 37:79889–79908, 2024.

  20. [20] Kaiyi Huang, Chengqi Duan, Kaiyue Sun, Enze Xie, Zhenguo Li, and Xihui Liu. T2I-CompBench++: An enhanced and comprehensive benchmark for compositional text-to-image generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3563–3579, 2025.

  21. [21] Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In European Conference on Computer Vision, pages 38–55. Springer, 2024.

  22. [22] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.

  23. [23] Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, and Enze Xie. SANA-Sprint: One-step diffusion with continuous-time consistency distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16185–16195, 2025.

  24. [24] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(3):411–436, 2006.

  25. [25] Angus Phillips, Hai-Dang Dau, Michael John Hutchinson, Valentin De Bortoli, George Deligiannidis, and Arnaud Doucet. Particle denoising diffusion sampler. arXiv preprint arXiv:2402.06320, 2024.

  26. [26] Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. arXiv preprint arXiv:2308.07983, 2023.

  27. [27] Zehao Dou and Yang Song. Diffusion posterior sampling for linear inverse problem solving: A filtering perspective. In The Twelfth International Conference on Learning Representations, 2024.

  28. [28] Brian L Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, and Tommi Jaakkola. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. arXiv preprint arXiv:2206.04119, 2022.

  29. [29] Luhuan Wu, Brian Trippe, Christian Naesseth, David Blei, and John P Cunningham. Practical and asymptotically exact conditional sampling in diffusion models. Advances in Neural Information Processing Systems, 36:31372–31403, 2023.

  30. [30] Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33:3008–3021, 2020.

  31. [31] Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alex D'Amour, Jacob Eisenstein, Chirag Nagpal, and Ananda Theertha Suresh. Theoretical guarantees on the best-of-n alignment policy, 2024. URL https://api.semanticscholar.org/CorpusID:266741736.

  32. [32] Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. WebGPT: Browser-assisted question-answering with human feedback, 2022. URL https://arxiv.org/abs/2112.09332.

  33. [33] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.

  34. [34] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.

  35. [35] Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, and Tong Zhang. RAFT: Reward ranked finetuning for generative foundation model alignment. arXiv preprint arXiv:2304.06767, 2023.

  36. [36] Qiang Liu, Jason Lee, and Michael Jordan. A kernelized Stein discrepancy for goodness-of-fit tests. In International Conference on Machine Learning, pages 276–284. PMLR, 2016.