Guidance for twisted particle filter: a continuous-time perspective
Pith reviewed 2026-05-23 21:04 UTC · model grok-4.3
The pith
A neural network trained to minimize KL divergence between path measures guides the Twisted-Path Particle Filter in continuous time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Twisted-Path Particle Filter parameterizes its twisting function by a neural network and trains the network parameters to minimize a specific KL-divergence between path measures; the design is guided by existing control-based importance sampling algorithms in the continuous-time setting, and experiments indicate that the trained filter produces lower-variance Monte Carlo approximations than the untwisted particle filter.
What carries the argument
The neural-network-parameterized twisting function trained by minimizing KL divergence between path measures.
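The training principle can be illustrated on a toy problem where every quantity is known in closed form. The sketch below makes several assumptions and is not the TPPF itself. The neural-network twist is replaced by a single constant control `theta`, a hypothetical simplification. The model is one-dimensional Brownian motion on [0, T] with Z = E[exp(-g(X_T))] and g(x) = x, so log Z = T/2. The helper names `cost_and_grad` and `train` are illustrative. The objective is the control cost J(theta) = E[½∫theta² dt + g(X_T)]. For controlled diffusions of this kind, the variational representation of Boué and Dupuis [4] gives KL(P^theta ‖ P*) = J(theta) + log Z, where P* is the path measure reweighted by exp(-g(X_T))/Z. Minimizing J is therefore the same as minimizing the path-space KL divergence.

```python
import math
import random

# Hypothetical toy problem (not from the paper): X is 1-D Brownian motion on [0, T],
# X_0 = 0, and Z = E[exp(-g(X_T))] with g(x) = x, so that log Z = T / 2 exactly.
T = 1.0
N_STEPS = 50
H = T / N_STEPS


def g(x):
    return x


def dg(x):
    # Derivative of g; constant for this linear toy terminal cost.
    return 1.0


def cost_and_grad(theta, n_paths, rng):
    """Monte Carlo estimate of J(theta) = E[0.5 * int_0^T theta^2 dt + g(X_T)]
    and of its pathwise derivative, for dX = theta dt + dW (constant control)."""
    total_cost, total_grad = 0.0, 0.0
    for _ in range(n_paths):
        x = 0.0
        for _ in range(N_STEPS):
            # Euler-Maruyama step of the controlled SDE
            x += theta * H + math.sqrt(H) * rng.gauss(0.0, 1.0)
        total_cost += 0.5 * theta ** 2 * T + g(x)
        # dX_T/dtheta = T for a constant control, so dJ/dtheta = theta*T + g'(X_T)*T
        total_grad += theta * T + dg(x) * T
    return total_cost / n_paths, total_grad / n_paths


def train(theta=0.0, lr=0.2, iters=100, n_paths=32, seed=0):
    """Plain gradient descent on the sampled control cost (a stand-in for NN training)."""
    rng = random.Random(seed)
    for _ in range(iters):
        _, grad = cost_and_grad(theta, n_paths, rng)
        theta -= lr * grad
    return theta


theta_hat = train()
log_z = T / 2
j_hat, _ = cost_and_grad(theta_hat, 2000, random.Random(1))
# KL(P^theta || P*) = J(theta) + log Z, which should be close to 0 at the optimum.
print(f"learned control {theta_hat:.4f}, estimated KL {j_hat + log_z:.3f}")
```

For g(x) = x the pathwise gradient happens to be deterministic (theta·T + T), so gradient descent converges to the optimal constant control -1. With a neural-network twist, the control would instead depend on time and state, and the gradient would be genuinely stochastic.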
If this is right
- Lower variance Monte Carlo estimates of normalizing constants become available for continuous-time models.
- The same training procedure can be applied to other path-space importance samplers that admit a twisting function.
- The continuous-time perspective supplies a principled objective for choosing the twisting function in discrete-time twisted particle filters.
- High-dimensional filtering problems can be addressed without hand-crafting the twisting function.
Where Pith is reading between the lines
- The method may extend to settings where the underlying process is only partially observed, provided the path-measure KL objective can still be estimated.
- Because the training objective is defined on entire paths, the approach could be combined with existing continuous-time control methods to produce hybrid samplers.
- If the KL minimization succeeds, the resulting filter may serve as a building block for more accurate sequential Monte Carlo algorithms in non-Markovian or infinite-dimensional state spaces.
Load-bearing premise
Training the neural network to minimize the chosen KL divergence between path measures produces a net reduction in the variance of the particle filter estimator.
What would settle it
A side-by-side run on the same continuous-time model in which the empirical variance of the Twisted-Path Particle Filter estimator, after training, is no smaller than that of the ordinary particle filter.
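A minimal version of this kind of side-by-side check is sketched below under stated assumptions. It uses a hypothetical one-dimensional model: Brownian motion on [0, T] with target Z = E[exp(-X_T)] = exp(T/2). Plain importance sampling stands in for a particle filter, and a constant control `u` applied through Girsanov reweighting stands in for the trained twist. The names `controlled_estimate` and `empirical_mean_var` are illustrative. With u = 0 (untwisted) the estimator is ordinary Monte Carlo. With u = -1, the optimal control for this toy, every draw equals Z and the empirical variance collapses to rounding error.

```python
import math
import random

# Hypothetical toy model (not from the paper): X is 1-D Brownian motion on [0, T],
# X_0 = 0, and the target is Z = E[exp(-X_T)] = exp(T / 2).
T = 1.0
N_STEPS = 50
H = T / N_STEPS


def controlled_estimate(u, rng):
    """One importance-sampling draw under the constant control u.

    Simulates dX = u dt + dW by Euler-Maruyama (exact in law here, since u is constant)
    and reweights by the Girsanov factor dP/dP^u = exp(-int u dW - 0.5 int u^2 dt).
    """
    x, log_w = 0.0, 0.0
    for _ in range(N_STEPS):
        dw = math.sqrt(H) * rng.gauss(0.0, 1.0)
        x += u * H + dw
        log_w += -u * dw - 0.5 * u * u * H
    return math.exp(-x + log_w)


def empirical_mean_var(u, n, seed):
    """Sample mean and unbiased sample variance of n independent draws."""
    rng = random.Random(seed)
    draws = [controlled_estimate(u, rng) for _ in range(n)]
    mean = sum(draws) / n
    return mean, sum((d - mean) ** 2 for d in draws) / (n - 1)


z = math.exp(T / 2)
plain_mean, plain_var = empirical_mean_var(0.0, 4000, seed=0)  # untwisted, u = 0
opt_mean, opt_var = empirical_mean_var(-1.0, 4000, seed=0)     # optimal control u* = -1
print(f"Z = {z:.4f}")
print(f"untwisted: mean {plain_mean:.4f}, variance {plain_var:.3e}")
print(f"optimal:   mean {opt_mean:.4f}, variance {opt_var:.3e}")
```

A trained twist would typically land between these two extremes. The test described above amounts to checking that its empirical variance actually falls below the u = 0 baseline on the same model.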
Original abstract
The particle filter (PF), also known as sequential Monte Carlo (SMC), approximates high-dimensional probability distributions and their normalizing constants in the discrete-time setting. To reduce the variance of the Monte Carlo approximation, various twisted particle filters (TPFs) have been proposed, in which a twisting function is chosen or learned to modify the Markov transition kernel. Guided by existing control-based importance sampling algorithms in the continuous-time setting, we propose a novel algorithm called the "Twisted-Path Particle Filter" (TPPF), in which the twisting function is parameterized by a neural network and trained to minimize a specific KL-divergence between path measures. Numerical experiments illustrate the capability of the proposed algorithm.
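The mechanism the abstract describes can be sketched in discrete time on a small hidden Markov model, where the exact answer is available from the forward algorithm. A twisting function ψ reshapes the Markov kernel, and the potentials are adjusted so that the normalizing constant is unchanged. This is an illustrative sketch of a generic twisted particle filter, not the paper's TPPF. The model parameters are made up. ψ is computed exactly by backward recursion rather than learned by a neural network. The helper names (`twisted_pf`, `optimal_twist`, `exact_z`) are hypothetical. Passing ψ ≡ 1 recovers the bootstrap particle filter.

```python
import math
import random

# Hypothetical 3-state hidden Markov model; all numbers are illustrative.
MU = [0.5, 0.3, 0.2]                     # initial distribution
P = [[0.8, 0.1, 0.1],                    # transition matrix, P[x][x2]
     [0.2, 0.7, 0.1],
     [0.1, 0.2, 0.7]]
MEANS = [-1.0, 0.0, 2.0]                 # Gaussian emission means, unit variance
YS = [0.3, -1.2, 2.1, 1.7, -0.4, 0.9, 2.5, -0.8]
K, T = len(MU), len(YS)


def G(t, x):
    """Potential G_t(x) = N(y_t; MEANS[x], 1)."""
    return math.exp(-0.5 * (YS[t] - MEANS[x]) ** 2) / math.sqrt(2 * math.pi)


def apply_kernel(psi):
    """(P psi)(x) = sum over x2 of P[x][x2] * psi(x2)."""
    return [sum(P[x][j] * psi[j] for j in range(K)) for x in range(K)]


def exact_z():
    """Forward algorithm: Z = p(y_0, ..., y_{T-1})."""
    alpha = [MU[x] * G(0, x) for x in range(K)]
    for t in range(1, T):
        alpha = [G(t, j) * sum(alpha[x] * P[x][j] for x in range(K)) for j in range(K)]
    return sum(alpha)


def optimal_twist():
    """Backward recursion psi_t(x) = G_t(x) * (P psi_{t+1})(x), psi_{T-1} = G_{T-1}."""
    psi = [None] * T
    psi[T - 1] = [G(T - 1, x) for x in range(K)]
    for t in range(T - 2, -1, -1):
        nxt = apply_kernel(psi[t + 1])
        psi[t] = [G(t, x) * nxt[x] for x in range(K)]
    return psi


def twisted_pf(psi, n, rng):
    """Twisted particle filter estimate of Z; psi identically 1 is the bootstrap PF."""
    # (P psi_{t+1})(x) for each t, with the convention P psi_T = 1.
    p_next = [apply_kernel(psi[t + 1]) for t in range(T - 1)] + [[1.0] * K]
    init = [MU[x] * psi[0][x] for x in range(K)]
    mass = sum(init)                                  # mu(psi_0)
    xs = rng.choices(range(K), weights=init, k=n)     # X_0 ~ mu^psi
    w = [mass * G(0, x) * p_next[0][x] / psi[0][x] for x in xs]
    log_z = math.log(sum(w) / n)
    for t in range(1, T):
        xs = rng.choices(xs, weights=w, k=n)          # multinomial resampling
        # Twisted kernel: P^psi(x, x2) proportional to P[x][x2] * psi_t(x2)
        xs = [rng.choices(range(K), weights=[P[x][j] * psi[t][j] for j in range(K)])[0]
              for x in xs]
        w = [G(t, x) * p_next[t][x] / psi[t][x] for x in xs]   # twisted potential
        log_z += math.log(sum(w) / n)
    return math.exp(log_z)


def mean_var(v):
    m = sum(v) / len(v)
    return m, sum((a - m) ** 2 for a in v) / (len(v) - 1)


z = exact_z()
rng = random.Random(0)
ones = [[1.0] * K for _ in range(T)]
boot_mean, boot_var = mean_var([twisted_pf(ones, 100, rng) / z for _ in range(100)])
twist_mean, twist_var = mean_var([twisted_pf(optimal_twist(), 100, rng) / z for _ in range(100)])
print(f"bootstrap PF:     mean Zhat/Z = {boot_mean:.3f}, variance = {boot_var:.2e}")
print(f"optimal twist PF: mean Zhat/Z = {twist_mean:.3f}, variance = {twist_var:.2e}")
```

With the exactly optimal ψ, every twisted potential reduces to a constant, so all replicates return Z up to rounding. A learned ψ only approximates this, which is why the variance reduction of a trained twist has to be checked empirically.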
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Twisted-Path Particle Filter (TPPF) as an extension of twisted particle filters to the continuous-time setting. A neural network parameterizes the twisting function, which is trained by minimizing a KL divergence between path measures; the design is guided by existing control-based importance sampling methods. Numerical experiments are presented as illustrations of the algorithm's capability for Monte Carlo approximation of distributions and normalizing constants.
Significance. If the KL-trained twisting yields a net reduction in estimator variance after accounting for training cost, the continuous-time control perspective could provide a principled route to improved SMC performance on path-space problems. The explicit link to control-based IS is a constructive contribution that may aid future work on learned proposals.
major comments (2)
- [§3] §3 (algorithm derivation): the manuscript states that the chosen KL objective between path measures produces an improved twisting function, but supplies no explicit variance bound or bias-variance decomposition showing that the resulting estimator variance is strictly smaller than the untwisted PF (or existing TPF baselines) for the same number of particles; without this, the central claim that the method 'improves Monte Carlo approximation' rests on the illustrative experiments alone.
- [§5] §5 (numerical experiments): the reported runs use small state dimensions and short time horizons; no scaling study or comparison against a non-neural twisted filter (e.g., analytically chosen twisting) is given, so it remains unclear whether the NN parameterization delivers a practical advantage once training overhead is included.
minor comments (3)
- [§2] Notation for the continuous-time path measure and the twisting function should be introduced with a single consistent symbol table; currently the same symbol appears to be reused for the discrete-time and continuous-time cases.
- [§5] Figure captions should explicitly state the number of particles, the dimension of the state, and the training budget (epochs / samples) so that the plots can be reproduced without consulting the main text.
- The reference list omits several standard works on continuous-time SMC and on control-based importance sampling (e.g., the original papers on the continuous-time Feynman-Kac framework); adding them would clarify the precise novelty of the KL choice.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and constructive comments. We address each major comment below, providing clarifications on the theoretical motivation and the scope of the numerical experiments.
Point-by-point responses
- Referee: [§3] §3 (algorithm derivation): the manuscript states that the chosen KL objective between path measures produces an improved twisting function, but supplies no explicit variance bound or bias-variance decomposition showing that the resulting estimator variance is strictly smaller than the untwisted PF (or existing TPF baselines) for the same number of particles; without this, the central claim that the method 'improves Monte Carlo approximation' rests on the illustrative experiments alone.
  Authors: We agree that the manuscript does not derive an explicit finite-particle variance bound. The KL objective is selected because it arises directly from the continuous-time control formulation of importance sampling, where the optimal twisting function minimizes a path-space cost that is known to yield the zero-variance estimator in the limit; this connection is the central guidance provided by the continuous-time perspective. A rigorous bias-variance decomposition for the resulting particle estimator is technically involved and lies beyond the scope of the present work, which focuses on algorithm derivation and the control-theoretic link. We will revise §3 to make this motivation and limitation explicit, while retaining the claim of improvement on the basis of the principled objective and supporting experiments. Revision: partial.
- Referee: [§5] §5 (numerical experiments): the reported runs use small state dimensions and short time horizons; no scaling study or comparison against a non-neural twisted filter (e.g., analytically chosen twisting) is given, so it remains unclear whether the NN parameterization delivers a practical advantage once training overhead is included.
  Authors: The experiments are explicitly described in the abstract and §5 as illustrations of the algorithm's capability rather than a comprehensive benchmark. The neural-network parameterization is intended for regimes in which closed-form twisting functions are unavailable; direct comparison to an analytic baseline is therefore not always feasible and would not demonstrate the method's intended use case. Training cost is acknowledged as part of the procedure, but the paper does not assert net computational superiority. We therefore do not plan revisions to §5, as expanding the experiments would shift the manuscript away from its stated focus on the continuous-time derivation. Revision: no.
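The authors' appeal, in their response on §3, to the zero-variance property of the optimal control can be made concrete on a standard toy example (not taken from the paper). Let X be one-dimensional Brownian motion on [0, T] started at 0, let g(x) = x and Z = E[e^{-X_T}], and let P* be the path measure with dP*/dP = e^{-X_T}/Z. Restricting to constant controls u, the control cost exceeds -log Z by exactly the KL gap, and at the optimum the reweighted estimator is constant along every path:

```latex
\[
\begin{aligned}
&dX_t = u\,dt + dW_t,\quad X_0 = 0,\qquad
\frac{d\mathbb{P}}{d\mathbb{P}^u} = \exp\Big(-\int_0^T u\,dW_t - \tfrac12\int_0^T u^2\,dt\Big),\\
&Z = \mathbb{E}\big[e^{-X_T}\big] = e^{T/2},\qquad
J(u) = \mathbb{E}^u\Big[\tfrac12\int_0^T u^2\,dt + X_T\Big] = \tfrac12 u^2 T + uT,\\
&\mathrm{KL}\big(\mathbb{P}^u \,\|\, \mathbb{P}^*\big) = J(u) + \log Z = \tfrac{T}{2}\,(u+1)^2 \;\ge\; 0,
\qquad u^* = -1,\\
&e^{-X_T}\,\frac{d\mathbb{P}}{d\mathbb{P}^{u^*}} = e^{T - W_T}\, e^{W_T - T/2} = e^{T/2} = Z
\quad \text{for every path.}
\end{aligned}
\]
```

For any u ≠ -1 the reweighted estimator equals exp(-(1+u)W_T - uT - u²T/2), which is non-degenerate. In this example the estimator variance is therefore governed by how close the learned control is to u*.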
Circularity Check
No significant circularity
The derivation introduces a neural-network-parameterized twisting function trained on an external KL-divergence between path measures, guided by prior control-based importance sampling results. This objective is independent of the final particle filter estimator and does not reduce to it by construction; the claimed variance reduction is presented as an empirical consequence rather than a definitional identity. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the abstract or described chain. The numerical experiments are explicitly illustrative, leaving the central proposal self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We propose a novel algorithm called the Twisted-Path Particle Filter (TPPF), in which the twisting function is parameterized by a neural network and trained to minimize a specific KL-divergence between path measures."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "the discrete-time model converges to a continuous-time limit, which can be solved through a series of well-studied control-based importance sampling algorithms."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space
Proposes Latent Interacting Particle Systems with an efficient parameterization of twist potentials to enable approximate posterior inference for coupled continuous-time hidden Markov models via twisted sequential Mon...
Reference graph
Works this paper leans on
- [1] Hernan P Awad, Peter W Glynn, and Reuven Y Rubinstein. Zero-variance importance sampling estimators for Markov process expectations. Mathematics of Operations Research, 38(2):358–388, 2013.
- [2] Normand J Beaudry and Renato Renner. An intuitive proof of the data processing inequality. arXiv preprint arXiv:1107.0740, 2011.
- [3] Joshua J Bon, Christopher Drovandi, and Anthony Lee. Monte Carlo twisting for particle filters. arXiv preprint arXiv:2208.04288, 2022.
- [4] Michelle Boué and Paul Dupuis. A variational representation for certain functionals of Brownian motion. The Annals of Probability, 26(4):1641–1659, 1998.
- [5] Nicola Branchini and Víctor Elvira. Optimized auxiliary particle filters: adapting mixture proposals via convex optimization. In Uncertainty in Artificial Intelligence, pages 1289–1299. PMLR, 2021.
- [6] Nicolas Chopin. A sequential particle filter method for static models. Biometrika, 89(3):539–552, 2002.
- [7] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
- [8] Arnak S Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(3):651–676, 2017.
- [9]
- [10] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(3):411–436, 2006.
- [11] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. On adaptive resampling strategies for sequential Monte Carlo methods. Bernoulli, 18(1):252–278, 2012.
- [12] Jean-Dominique Deuschel and Daniel W Stroock. Large deviations, volume 342. American Mathematical Soc., 2001.
- [13] Petar M Djuric, Jayesh H Kotecha, Jianqui Zhang, Yufei Huang, Tadesse Ghirmai, Mónica F Bugallo, and Joaquin Miguez. Particle filtering. IEEE signal processing magazine, 20(5):19–38, 2003.
- [14] Arnaud Doucet, Nando De Freitas, and Neil Gordon. An introduction to sequential Monte Carlo methods. Sequential Monte Carlo methods in practice, pages 3–14, 2001.
- [15] Arnaud Doucet, Simon Godsill, and Christophe Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and computing, 10:197–208, 2000.
- [16] Arnaud Doucet, Adam M Johansen, et al. A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of nonlinear filtering, 12(656-704):3, 2009.
- [17] Kenji Doya. Temporal difference learning in continuous time and space. Advances in neural information processing systems, 8, 1995.
- [18] James Durbin and Siem Jan Koopman. Time series analysis by state space methods, volume 38. OUP Oxford, 2012.
- [19] Richard Durrett. Stochastic calculus: a practical introduction. CRC press, 2018.
- [20] Marco Fuhrman, Federica Masiero, and Gianmario Tessitore. Stochastic equations with delay: Optimal control via BSDEs and regular solutions of Hamilton-Jacobi-Bellman equations. SIAM Journal on Control and Optimization, 48(7):4624–4651, 2010.
- [21] Igor Vladimirovich Girsanov. On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theory of Probability & Its Applications, 5(3):285–301, 1960.
- [22] Paul Glasserman. Monte Carlo methods in financial engineering, volume 53. Springer, 2004.
- [23] Paul Glasserman and Jingyi Li. Importance sampling for portfolio credit risk. Management science, 51(11):1643–1656, 2005.
- [24] Peter W Glynn and Donald L Iglehart. Importance sampling for stochastic simulations. Management science, 35(11):1367–1392, 1989.
- [25] Neil J Gordon, David J Salmond, and Adrian FM Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In IEE proceedings F (radar and signal processing), volume 140, pages 107–113. IET, 1993.
- [26] Pieralberto Guarniero, Adam M Johansen, and Anthony Lee. The iterated auxiliary particle filter. Journal of the American Statistical Association, 112(520):1636–1647, 2017.
- [27] Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. Reinforcement learning with deep energy-based policies. In International conference on machine learning, pages 1352–1361. PMLR, 2017.
- [28] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
- [29] Carsten Hartmann and Lorenz Richter. Nonasymptotic bounds for suboptimal importance sampling. SIAM/ASA Journal on Uncertainty Quantification, 12(2):309–346, 2024.
- [30] Carsten Hartmann, Lorenz Richter, Christof Schütte, and Wei Zhang. Variational characterization of free energy: Theory and algorithms. Entropy, 19(11):626, 2017.
- [31] Carsten Hartmann and Christof Schütte. Efficient rare event simulation by optimal nonequilibrium forcing. Journal of Statistical Mechanics: Theory and Experiment, 2012(11):P11004, 2012.
- [32] Carsten Hartmann, Christof Schütte, and Wei Zhang. Model reduction algorithms for optimal control and importance sampling of diffusions. Nonlinearity, 29(8):2298, 2016.
- [33] Jeremy Heng, Adrian N Bishop, George Deligiannidis, and Arnaud Doucet. Controlled sequential Monte Carlo. The Annals of Statistics, 48(5):2904–2929, 2020.
- [34] Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer. DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.
- [35] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. 1960.
- [36] Omar Kebiri, Lara Neureither, and Carsten Hartmann. Adaptive importance sampling with forward-backward stochastic differential equations. In Stochastic Dynamics Out of Equilibrium: Institut Henri Poincaré, Paris, France, 2017, pages 265–281. Springer, 2019.
- [37] Genshiro Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of computational and graphical statistics, 5(1):1–25, 1996.
- [38] Teun Kloek and Herman K Van Dijk. Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica: Journal of the Econometric Society, pages 1–19, 1978.
- [39] Augustine Kong, Jun S Liu, and Wing Hung Wong. Sequential imputations and Bayesian missing data problems. Journal of the American statistical association, 89(425):278–288, 1994.
- [40] Dieterich Lawson, Allan Raventós, Andrew Warrington, and Scott Linderman. Sixo: Smoothing inference with twisted objectives. Advances in Neural Information Processing Systems, 35:38844–38858, 2022.
- [41] Dieterich Lawson, George Tucker, Christian A Naesseth, Chris Maddison, Ryan P Adams, and Yee Whye Teh. Twisted variational sequential Monte Carlo. In Third workshop on Bayesian Deep Learning (NeurIPS), 2018.
- [42] Lei Li, Yuelin Wang, and Yuliang Wang. Propagation of chaos in path spaces via information theory. arXiv preprint arXiv:2312.00339, 2023.
- [43] Lei Li and Yuliang Wang. A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics. arXiv preprint arXiv:2207.09304, 2022.
- [44] Han Cheng Lie. On a strongly convex approximation of a stochastic optimal control problem for importance sampling of metastable diffusions. PhD thesis, 2016.
- [45] Jun S Liu and Rong Chen. Blind deconvolution via sequential imputations. Journal of the american statistical association, 90(430):567–576, 1995.
- [46] Jun S Liu and Rong Chen. Sequential Monte Carlo methods for dynamic systems. Journal of the American statistical association, 93(443):1032–1044, 1998.
- [47] Edward N Lorenz. Predictability: A problem partly solved. In Proc. Seminar on predictability, volume 1. Reading, 1996.
- [48] Wenlong Mou, Nicolas Flammarion, Martin J Wainwright, and Peter L Bartlett. Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity. Bernoulli, 28(3):1577–1601, 2022.
- [49] Wenlong Mou and Yuhua Zhu. On Bellman equations for continuous-time policy evaluation I: discretization and approximation. arXiv preprint arXiv:2407.05966, 2024.
- [50] Lawrence M Murray, Sumeetpal S Singh, and Anthony Lee. Anytime Monte Carlo. Data-Centric Engineering, 2:e7, 2021.
- [51] M Netto, L Gimeno, and M Mendes. On the optimal and suboptimal nonlinear filtering problem for discrete-time systems. IEEE Transactions on Automatic Control, 23(6):1062–1067, 1978.
- [52] Michael K Pitt and Neil Shephard. Filtering via simulation: Auxiliary particle filters. Journal of the American statistical association, 94(446):590–599, 1999.
- [53] Enric Ribera Borrell, Jannes Quer, Lorenz Richter, and Christof Schütte. Improving control based importance sampling strategies for metastable diffusions via adapted metadynamics. SIAM Journal on Scientific Computing, 46(2):S298–S323, 2024.
- [54] Lorenz Richter. Solving high-dimensional PDEs, approximation of path space measures and importance sampling of diffusions. PhD thesis, BTU Cottbus-Senftenberg, 2021.
- [55] Simo Särkkä and Lennart Svensson. Bayesian filtering and smoothing, volume 17. Cambridge university press, 2023.
- [56] John Schulman, Xi Chen, and Pieter Abbeel. Equivalence between policy gradients and soft Q-learning. arXiv preprint arXiv:1704.06440, 2017.
- [57] Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33:3008–3021, 2020.
- [58] Richard S Sutton. Learning to predict by the methods of temporal differences. Machine learning, 3:9–44, 1988.
- [59] Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems, 12, 1999.
- [60] Guangchen Wang, Zhen Wu, Jie Xiong, et al. An introduction to optimal control of FBSDE with incomplete information. Springer, 2018.
- [61] Haoran Wang, Thaleia Zariphopoulou, and Xun Yu Zhou. Reinforcement learning in continuous time and space: A stochastic control approach. Journal of Machine Learning Research, 21(198):1–34, 2020.
- [62] Mike West. Mixture models, Monte Carlo, Bayesian updating, and dynamic models. Computing Science and Statistics, pages 325–325, 1993.
- [63] Nick Whiteley and Anthony Lee. Twisted particle filters. The Annals of Statistics, 42(1):115–141, 2014.
- [64] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8:229–256, 1992.
- [65] Kevin Yang and Dan Klein. FUDGE: Controlled text generation with future discriminators. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3511–3535, 2021.
- [66] Wei Zhang, Han Wang, Carsten Hartmann, Marcus Weber, and Christof Schütte. Applications of the cross-entropy method to importance sampling and optimal control of diffusions. SIAM Journal on Scientific Computing, 36(6):A2654–A2672, 2014.
- [67] Stephen Zhao, Rob Brekelmans, Alireza Makhzani, and Roger Grosse. Probabilistic inference in language models via twisted sequential Monte Carlo. arXiv preprint arXiv:2404.17546, 2024.
- [68] Mo Zhou and Jianfeng Lu. Solving time-continuous stochastic optimal control problems: Algorithm design and convergence analysis of actor-critic flow. arXiv preprint arXiv:2402.17208, 2024.
- [69] Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593, 2019.
- [70] Andy Zou, Zifan Wang, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023.