A Two-fold Randomization Framework for Impulse Control Problems

Haoyang Cao; Yuchao Dong; Zhouhao Yang

arxiv: 2509.12018 · v7 · pith:537MKNUQnew · submitted 2025-09-15 · 🧮 math.OC

Pith reviewed 2026-05-18 16:07 UTC · model grok-4.3

classification 🧮 math.OC

keywords impulse control · randomization · HJB equation · reinforcement learning · verification theorem · convergence · fixed point operator · Poisson measure

The pith

Randomized impulse control problems converge to the classical problem as the randomization parameter vanishes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a randomization framework for impulse control problems in which the solution is the fixed point of a compound operator built from a regularized nonlocal operator and a regularized stopping operator. This characterization leads to a semi-linear HJB equation, and an equivalent scheme based on a Poisson compound measure yields a verification theorem. The authors prove existence by iteration and show that the randomized problem converges to the classical impulse control problem as the randomization parameter λ goes to zero. This convergence, together with the local regularity of the value function, supports using the framework to build reinforcement learning algorithms that approximate the original solutions.

Core claim

By introducing a two-fold randomization scheme, the impulse control problem is reformulated as the fixed point of a compound operator consisting of a regularized nonlocal operator and a regularized stopping operator. This yields a semi-linear Hamilton-Jacobi-Bellman equation. An equivalent scheme using a Poisson compound measure establishes a verification theorem that implies uniqueness, while an iterative approach proves existence. As the randomization parameter λ tends to zero, the randomized problem converges to its classical counterpart, providing a robust approximation that supports an offline reinforcement learning algorithm with geometric convergence.

What carries the argument

The compound operator formed by the regularized nonlocal operator and the regularized stopping operator, whose fixed point characterizes the solution to the randomized problem.
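
For orientation, the classical problem that the randomized scheme approximates is usually written as a quasi-variational inequality. The display below is a sketch in generic notation for a reward-maximizing problem: the generator $\mathcal{L}$, discount rate $\rho$, running reward $f$, impulse cost $K$, admissible impulse set $\Xi$, and reference probability measure $\mu$ are placeholder symbols assumed here. The soft-max form of $\mathcal{M}_\lambda$ is one common entropy-type regularization, shown purely as an illustration and not as the paper's own definition.

```latex
% Classical intervention (nonlocal) operator and quasi-variational inequality
\[
\mathcal{M}v(x) = \sup_{\xi \in \Xi} \bigl[ v(x+\xi) - K(\xi) \bigr],
\qquad
\min\bigl\{ \rho v - \mathcal{L}v - f,\; v - \mathcal{M}v \bigr\} = 0 .
\]
% Hypothetical soft-max regularization (illustrative assumption, not the paper's operator)
\[
\mathcal{M}_\lambda v(x) = \lambda \log \int_{\Xi}
  \exp\!\Bigl( \frac{v(x+\xi) - K(\xi)}{\lambda} \Bigr)\, \mu(d\xi) .
\]
```

Under this illustrative choice, with $\Xi$ compact, $\mu$ of full support, and a continuous integrand, $\mathcal{M}_\lambda v(x) \le \mathcal{M}v(x)$ and $\mathcal{M}_\lambda v(x) \to \mathcal{M}v(x)$ as $\lambda \to 0$, which is the kind of limit the convergence claim concerns.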

If this is right

The value function is locally of class C^{2,α}: its second derivatives are locally Hölder continuous of order α.
The offline RL algorithm derived from the iterative proof converges geometrically to the randomized solution.
The learned randomized solution approximates the classical impulse control solution with high accuracy.
Sensitivity to the volatility parameter σ reveals the exploration-exploitation tradeoff in the algorithm.
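
The geometric-convergence claim can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it iterates a hypothetical discrete-time soft (log-sum-exp) Bellman operator with two actions per state, "continue" or "apply an impulse to a fixed target state at a fixed cost". All names and parameter values (`soft_bellman`, `gamma`, `lam`, `cost`, `target`) are assumptions made for illustration. Because log-sum-exp is nonexpansive in the sup norm, this operator is a `gamma`-contraction, so successive gaps shrink by at least a factor `gamma` per step.

```python
import numpy as np

def soft_bellman(v, reward, P, gamma, lam, cost, target):
    """One application of a toy soft (log-sum-exp) Bellman operator.

    Two actions per state: continue (reward plus discounted expectation)
    or apply an impulse that moves to `target` at a fixed `cost`.
    """
    q_continue = reward + gamma * (P @ v)
    q_impulse = np.full_like(v, -cost + gamma * v[target])
    q = np.stack([q_continue, q_impulse])
    m = q.max(axis=0)
    # Numerically stable lam * log(sum(exp(q / lam))) over the two actions
    return m + lam * np.log(np.exp((q - m) / lam).sum(axis=0))

def iterate(n_iter=30, n=20, gamma=0.9, lam=0.1, cost=0.5, seed=0):
    """Run Picard iteration from v = 0 and record sup-norm gaps."""
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    reward = -np.linspace(-1.0, 1.0, n) ** 2   # hypothetical running reward
    target = n // 2                            # hypothetical post-impulse state
    v = np.zeros(n)
    gaps = []
    for _ in range(n_iter):
        v_new = soft_bellman(v, reward, P, gamma, lam, cost, target)
        gaps.append(np.max(np.abs(v_new - v)))
        v = v_new
    return v, np.array(gaps)

v, gaps = iterate()
ratios = gaps[1:] / gaps[:-1]   # each ratio is bounded by gamma
print("largest observed contraction ratio:", ratios.max())
```

The discount factor plays the role of the contraction modulus here; in the paper the geometric rate comes from the iterative existence argument, whose constants are not reproduced in this summary.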

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach may extend to other types of stochastic control problems involving jumps or impulses.
Similar randomization could provide numerical methods for problems where direct classical solutions are intractable.
Combining with other RL techniques might improve scalability to high-dimensional state spaces.

Load-bearing premise

The compound operator admits a fixed point and the Poisson compound measure scheme correctly supports the verification theorem.

What would settle it

Numerical experiments in which the difference between the value function of the randomized problem and a known classical solution decreases to zero as λ is reduced.
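
A miniature version of such a check can be run on a single operator rather than on the full control problem. The sketch below assumes a hypothetical soft-max regularization with uniform weights over a finite grid of impulses; the payoff, the cost coefficients, and the function name `soft_max` are invented for illustration and are not taken from the paper. It measures the gap between the hard maximum and its regularized version as λ shrinks, alongside the elementary bound λ log n.

```python
import numpy as np

def soft_max(values, lam):
    """Return lam * log(mean(exp(values / lam))), computed stably."""
    m = values.max()
    return m + lam * np.log(np.mean(np.exp((values - m) / lam)))

# Hypothetical toy data: value after jumping from x by xi, minus a
# fixed-plus-proportional impulse cost.
x = 1.5
xi = np.linspace(-3.0, 3.0, 121)
payoff = -(x + xi) ** 2 - (0.2 + 0.1 * np.abs(xi))

hard = payoff.max()                        # classical (unregularized) maximum
lams = np.array([1.0, 0.3, 0.1, 0.03, 0.01])
errors = np.array([hard - soft_max(payoff, lam) for lam in lams])

for lam, err in zip(lams, errors):
    bound = lam * np.log(xi.size)
    print(f"lambda={lam:<5} gap={err:.6f}  bound={bound:.6f}")
```

A full test of the paper's claim would replace the single maximum with the value function of the randomized problem and compare it against a classical benchmark for each λ.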

Figures

Figures reproduced from arXiv: 2509.12018 by Haoyang Cao, Yuchao Dong, Zhouhao Yang.

**Figure 2.** Sensitivity analysis with respect to volatility

**Figure 3.** Sensitivity analysis with respect to volatility

Original abstract

We propose and analyze a randomization scheme for a general class of impulse control problems. The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator. This approach allows us to derive a semi-linear Hamilton-Jacobi-Bellman (HJB) equation. Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem that implies the uniqueness of the solution. Via an iterative approach, we prove the existence of the solution. The existence-and-uniqueness result ensures the randomized problem is well-defined. We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter $\pmb \lambda$ vanishes. This convergence, combined with the value function's $C^{2,\alpha}_{loc}$ regularity, confirms our framework provides a robust approximation and a foundation for developing learning algorithms. Under this framework, we propose an offline reinforcement learning (RL) algorithm. Its policy improvement step is naturally derived from the iterative approach from the existence proof, which enjoys a geometric convergence rate. We implement a model-free version of the algorithm and numerically demonstrate its effectiveness using a widely-studied example. The results show that our RL algorithm can learn the randomized solution, which accurately approximates its classical counterpart. A sensitivity analysis with respect to the volatility parameter $\sigma$ in the state process effectively demonstrates the exploration-exploitation tradeoff.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a randomization framework for impulse control that directly yields an RL algorithm with geometric convergence, though the limit uniqueness step could use tighter estimates.

The paper's core contribution is a two-fold randomization for impulse control problems that yields both a semi-linear HJB equation and a natural offline RL algorithm derived directly from the existence iteration. The authors set up the randomized problem via a compound operator that combines a regularized nonlocal operator with a regularized stopping operator, which lets them write down the semi-linear HJB equation. They then switch to an equivalent Poisson compound measure scheme to prove a verification theorem, which gives uniqueness. Existence follows from iterating the operator. They show that the value converges to the classical impulse control value as the randomization parameter λ goes to zero, and they use the local Hölder regularity of the value function to justify the limit. From the same iteration they extract a policy improvement step that gives geometric convergence for the RL algorithm. The model-free version is tested on a standard example, with some sensitivity checks on volatility.

The approach looks technically coherent. The convergence argument relies on the regularity result, which is standard in these problems, so that part seems solid enough. The RL part is a nice bonus because the improvement step comes for free from the proof.

One place that could use more work is the justification for passing uniqueness through the limit. The Poisson measure equivalence is used for the verification at fixed λ, and if the measures do not converge uniformly, there might be a gap in showing that the limit satisfies the original variational inequality uniquely. The paper probably addresses this with the regularity, but a referee might ask for an explicit uniform estimate or a direct argument in the limit.

This paper is aimed at researchers in stochastic optimal control who are interested in bringing reinforcement learning into impulse problems. Someone already working on HJB equations for control with jumps or impulses will see the most value. It has enough new technical content and a working algorithm to warrant sending it out for peer review. I would recommend sending it to a journal in applied probability or control theory.

Referee Report

2 major / 2 minor

Summary. The paper proposes a two-fold randomization framework for general impulse control problems. The randomized problem is characterized as the fixed point of a compound operator formed by a regularized nonlocal operator and a regularized stopping operator, yielding a semi-linear HJB equation. Existence is proved via iteration, while uniqueness follows from a verification theorem obtained through an equivalent Poisson compound measure randomization scheme. The value function of the randomized problem is shown to converge to that of the classical impulse control problem as the randomization parameter λ vanishes; combined with C^{2,α}_loc regularity this is used to justify the approximation. An offline RL algorithm is derived from the iterative existence proof (with geometric convergence) and demonstrated numerically on a standard example, including sensitivity analysis with respect to volatility.

Significance. If the convergence and well-posedness results hold, the framework supplies a theoretically grounded regularization that enables model-free RL for impulse control while recovering the classical solution in the limit. The explicit geometric rate for the policy-improvement iteration and the numerical illustration of the exploration-exploitation trade-off via σ-sensitivity are concrete strengths that could support further algorithmic development in stochastic control.

major comments (2)

[Abstract] Abstract (paragraph on characterization and verification): the verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit even if pointwise convergence of value functions holds.
[Abstract] The convergence statement (final paragraph of abstract) invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This is load-bearing for the claim that the framework provides a robust approximation.

minor comments (2)

Notation for the compound operator and the two regularization parameters should be introduced with explicit definitions before the fixed-point argument is stated.
The numerical section would benefit from a table comparing the learned value function against a classical benchmark (e.g., finite-difference solution) for several λ values, rather than qualitative plots alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which help clarify key aspects of the convergence and uniqueness arguments. We appreciate the positive assessment of the framework's potential for model-free RL in impulse control. Below we respond point by point to the major comments, indicating revisions where appropriate to strengthen the presentation.

Referee: [Abstract] Abstract (paragraph on characterization and verification): the verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit even if pointwise convergence of value functions holds.

Authors: For each fixed λ the two randomization schemes are equivalent, so the verification theorem yields uniqueness of the randomized value function directly. Convergence of value functions as λ → 0 is established separately by direct comparison and the C^{2,α}_loc regularity. To make the passage to the limit fully rigorous and address the transfer of uniqueness, we will add uniform-in-λ bounds on the Poisson intensity together with weak convergence of the compound jump measures to the admissible impulse measures (in the sense of the original control set). These estimates will be placed in Section 4 and an appendix; they confirm that any limit point satisfies the classical variational inequality and inherits uniqueness from the standard verification theorem for the unregularized problem. revision: yes
Referee: [Abstract] The convergence statement (final paragraph of abstract) invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This is load-bearing for the claim that the framework provides a robust approximation.

Authors: The C^{2,α}_loc regularity is obtained from the semi-linear HJB equation satisfied by the randomized value function and holds uniformly on compact sets for λ small enough. Combined with pointwise convergence of the value functions, this already allows passage to the limit inside the equation. Nevertheless, we acknowledge that explicit uniform control on the regularized nonlocal term strengthens the argument. In the revision we will insert a lemma providing such estimates, showing that the difference between the regularized nonlocal operator and the classical impulse operator vanishes uniformly on compact sets as λ → 0. This guarantees that the limit function satisfies the classical variational inequality in the viscosity (and, under the regularity, classical) sense, thereby confirming the robust approximation property. revision: yes
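
The kind of estimate promised in this response can be stated schematically. The display below is a sketch of its general shape only: $\mathcal{M}$ denotes the classical intervention operator, $\mathcal{M}_\lambda$ its regularized counterpart, $v_\lambda$ the randomized value function, and $C$ a generic compact set, all placeholder notation assumed here rather than taken from the paper. The second display records an elementary bound that holds in the special, hypothetical case where the regularization is a soft-max with uniform weights over a finite set of $n$ impulse payoffs $g_1, \dots, g_n$.

```latex
% Schematic form of a uniform-on-compacts estimate
\[
\lim_{\lambda \to 0} \; \sup_{x \in C}
  \bigl| \mathcal{M}_\lambda v_\lambda(x) - \mathcal{M} v_\lambda(x) \bigr| = 0
\quad \text{for every compact } C .
\]
% Illustrative special case: uniform-weight soft-max over n payoffs
\[
0 \;\le\; \max_{1 \le i \le n} g_i
  - \lambda \log\Bigl( \frac{1}{n} \sum_{i=1}^{n} e^{g_i/\lambda} \Bigr)
  \;\le\; \lambda \log n .
\]
```

In the finite case the bound is uniform in the payoffs, which is the flavor of control the referee asks for; whether an analogous bound holds for the paper's operators depends on definitions not reproduced here.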

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper independently defines the randomized impulse control problem through regularization of the nonlocal and stopping operators, characterizes its solution as the fixed point of the resulting compound operator, establishes uniqueness via an equivalent Poisson compound measure scheme that yields a verification theorem, proves existence by iteration, and separately demonstrates convergence of the value function to the classical impulse control problem as the randomization parameter λ vanishes (combined with C^{2,α}_loc regularity). None of these steps reduces the central claims to a fitted input renamed as prediction, a self-definitional loop, or a load-bearing self-citation whose content is unverified outside the present work. The framework is constructed to provide an approximation whose well-posedness and limiting behavior are established directly from the stated assumptions and iterative arguments.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard diffusion assumptions for the state process and the well-posedness of the regularized operators; the randomization parameter λ is introduced as a tunable regularizer rather than fitted to data.

free parameters (1)

randomization parameter λ
Controls the strength of regularization in the nonlocal and stopping operators; vanishes to recover the classical problem.

axioms (1)

domain assumption: The state process is a diffusion satisfying standard regularity conditions that allow the HJB derivation and C^{2,α}_loc regularity.
Invoked to justify the semi-linear HJB equation and convergence result.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator... Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear

We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter λ vanishes. This convergence, combined with the value function’s C^{2,α}_loc regularity

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.