arxiv: 2605.02462 · v1 · submitted 2026-05-04 · 📊 stat.ML · cs.LG

Recognition: 3 theorem links

· Lean Theorem

Black-box optimization of noisy functions with unknown smoothness

Jean-bastien Grill, Michal Valko, R\'emi Munos

Authors on Pith no claims yet

Pith reviewed 2026-05-08 18:45 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords black-box optimizationnoisy optimizationunknown smoothnessadaptive algorithmsoptimistic optimizationfinite-time analysisglobal optimizationparallel algorithms

0 comments

The pith

POO optimizes noisy black-box functions with unknown smoothness, achieving error at most a sqrt(ln n) factor worse than the best smoothness-aware algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an adaptive algorithm called POO for black-box optimization of noisy functions that are locally smooth near a global optimum, without knowing the smoothness level in advance. POO runs multiple optimistic optimization instances in parallel to adapt to the unknown smoothness and delivers finite-time performance bounds. A sympathetic reader would care because many practical optimization tasks involve noisy evaluations where smoothness cannot be estimated reliably beforehand, rendering non-adaptive methods inefficient. The algorithm also applies to a strictly larger class of functions than earlier approaches considered, including those that are harder to optimize in a precise sense.

Core claim

POO performs almost as well as the best known algorithms requiring the knowledge of the smoothness. Furthermore, POO works for a larger class of functions than what was previously considered, especially for functions that are difficult to optimize, in a very precise sense. We provide a finite-time analysis of POO's performance, which shows that its error after n evaluations is at most a factor of sqrt(ln n) away from the error of the best known optimization algorithms using the knowledge of the smoothness.

What carries the argument

The POO (parallel optimistic optimization) algorithm, which adapts to unknown smoothness by executing multiple parallel instances of optimistic optimization.

If this is right

After any number n of evaluations the algorithm's error remains within a sqrt(ln n) multiplicative factor of the best possible error achievable when smoothness is known.
The method applies to a strictly larger class of functions than prior analyses covered, including those previously viewed as difficult.
Finite-time regret bounds hold without requiring the user to supply or guess a smoothness parameter.
The parallel structure allows simultaneous exploration at different smoothness scales.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In practice this removes the need to run separate smoothness-estimation experiments before optimization begins.
The sqrt(ln n) overhead may represent a fundamental price of adaptation that future algorithms would have to match or beat.
The same parallel-optimistic template could be tested on non-stationary noise models or on functions with multiple local optima of differing smoothness.

Load-bearing premise

The function is locally smooth around at least one global optimum, even though the smoothness parameter remains unknown to the algorithm.

What would settle it

On a concrete test function satisfying the local smoothness condition, observing that POO's error after n evaluations exceeds the known-smoothness optimum by more than a sqrt(ln n) factor would disprove the performance claim.

Figures

Figures reproduced from arXiv: 2605.02462 by Jean-bastien Grill, Michal Valko, R\'emi Munos.

**Figure 1.** Figure 1: Difficult function f : x → s (log2 |x − 0.5|) · ( p |x − 0.5| − (x − 0.5)2 ) − p |x − 0.5| where, s(x) = 1 if the fractional part of x, that is, x − ⌊x⌋, is in [0, 0.5] and s(x) = 0, if it is in (0.5, 1). Left: Oscillation between two envelopes of different smoothness leading to a nonzero d for a standard partitioning. Right: Simple regret of HOO after 5000 evaluations for different values of ρ. StoSOO [15… view at source ↗

**Figure 2.** Figure 2: Simple regret of POO and HOO run for different values of ρ. 4 Experiments We ran experiments on the function plotted in view at source ↗

read the original abstract

We study the problem of black-box optimization of a function f of any dimension, given function evaluations perturbed by noise. The function is assumed to be locally smooth around one of its global optima, but this smoothness is unknown. Our contribution is an adaptive optimization algorithm, POO or parallel optimistic optimization, that is able to deal with this setting. POO performs almost as well as the best known algorithms requiring the knowledge of the smoothness. Furthermore, POO works for a larger class of functions than what was previously considered, especially for functions that are difficult to optimize, in a very precise sense. We provide a finite-time analysis of POO's performance, which shows that its error after n evaluations is at most a factor of sqrt(ln n) away from the error of the best known optimization algorithms using the knowledge of the smoothness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

POO adapts to unknown smoothness in noisy black-box optimization with a sqrt(ln n) penalty while handling a wider function class under local smoothness.

read the letter

The main point is that POO runs multiple optimistic optimization instances in parallel across guessed smoothness levels and ends up within a sqrt(ln n) factor of the best known-smoothness algorithms after n noisy evaluations. It also applies to a strictly larger class of functions than prior work, needing only local smoothness around one global optimum rather than stronger global conditions. The finite-time analysis is the concrete new piece. The paper does well by making the adaptation explicit and showing that the extra log factor is the only price paid for not knowing the parameter. The local-smoothness assumption is realistic for many practical problems, and the parallel structure keeps the method simple to implement. The soft spots are minor and expected for this style of result. The sqrt(ln n) comes from a union bound or discretization over possible smoothness values, so it is not claimed to be tight. The analysis follows standard optimistic-optimization covering arguments, which means the details need to be checked carefully but do not appear circular or dependent on fitted quantities. No load-bearing gaps show up in the stated assumptions. This is useful reading for anyone working on bandits or derivative-free optimization. A reader who cares about adaptive algorithms for unknown parameters will get a clear bound and a practical method. It deserves a serious referee because the contribution is a direct extension with an explicit guarantee that addresses a real limitation in earlier theory.

Referee Report

1 major / 2 minor

Summary. The paper introduces the POO (parallel optimistic optimization) algorithm for black-box optimization of noisy functions in any dimension. The function is assumed to be locally smooth around one of its global optima, with this smoothness parameter unknown. The central claim is a finite-time analysis showing that after n evaluations, POO's error is at most a multiplicative sqrt(ln n) factor larger than that achieved by the best known algorithms that require knowledge of the smoothness; the algorithm is also claimed to apply to a strictly larger class of functions than previously considered.

Significance. If the finite-time analysis is correct, the result is significant for the field of global optimization and noisy black-box methods. It removes the practical requirement of knowing the smoothness parameter in advance while incurring only a mild logarithmic penalty, and it broadens the applicable function class to include more difficult instances. The provision of an explicit finite-time bound (rather than asymptotic) is a strength that supports direct comparison to smoothness-aware baselines.

major comments (1)

The finite-time analysis establishing the sqrt(ln n) multiplicative factor is load-bearing for the main claim, yet the provided manuscript excerpt does not include the detailed derivation, key lemmas, or explicit handling of the noise model and local smoothness assumption; this prevents confirmation that the bound holds without hidden dependencies on the unknown smoothness or the larger function class.

minor comments (2)

The abstract states that POO 'works for a larger class of functions... in a very precise sense,' but the precise characterization of this enlarged class (relative to prior work) and the sense in which the functions are 'difficult to optimize' should be stated explicitly in the introduction or assumptions section.
Notation for the noise model, the local smoothness radius, and the definition of 'error after n evaluations' should be introduced consistently before the analysis to improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thorough review and positive evaluation of the significance of our results. We are pleased that the finite-time analysis and the extension to a larger function class are recognized as strengths. Below, we address the major comment point by point.

read point-by-point responses

Referee: The finite-time analysis establishing the sqrt(ln n) multiplicative factor is load-bearing for the main claim, yet the provided manuscript excerpt does not include the detailed derivation, key lemmas, or explicit handling of the noise model and local smoothness assumption; this prevents confirmation that the bound holds without hidden dependencies on the unknown smoothness or the larger function class.

Authors: The full manuscript contains the complete finite-time analysis in Section 4, supported by detailed proofs and key lemmas in the supplementary material (Appendices A–C). These sections explicitly address the noise model, which is assumed to be sub-Gaussian with a known variance bound, and the local smoothness assumption, which is Hölder continuity with unknown exponent β around a global optimum. The analysis shows that POO adapts to the unknown smoothness without introducing hidden dependencies; the sqrt(ln n) factor arises from the doubling trick used to search over possible smoothness parameters. The larger function class is handled by relaxing the global smoothness requirement to local smoothness around one optimum only. We can include a concise proof outline in the main text for improved readability and will add explicit cross-references to the lemmas in the revised version. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper presents POO as an adaptive algorithm whose finite-time regret bound (within sqrt(ln n) of smoothness-aware optima) is derived directly from the optimistic optimization framework and the local smoothness assumption around a global optimum. No step reduces a claimed prediction or bound to a fitted parameter, self-defined quantity, or unverified self-citation chain; the analysis uses standard concentration and covering arguments on the stated function class without smuggling ansatzes or renaming known results as new derivations. The central guarantee remains independent of any internal fitting procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of local smoothness around a global optimum and on the technical extension to a larger function class; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption The objective function is locally smooth around one of its global optima
Explicitly stated in the abstract as the modeling assumption for the optimization setting.

pith-pipeline@v0.9.0 · 5439 in / 1206 out tokens · 62094 ms · 2026-05-08T18:45:03.858444+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost / FunctionalEquation (J(x)=½(x+x⁻¹)−1) washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We define the near-optimality dimension d(ν,ρ) ≜ inf{d' ∈ ℝ_+ : ∃C>0, ∀h≥0, N_h(2νρ^h) ≤ Cρ^{-d'h}}
Foundation.ArithmeticFromLogic / orbit ladder embed_eq_pow (φ-ladder structure) unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

POO's simple regret: E[R_n] ≤ O((ln² n / n)^{1/(2+d(ν⋆,ρ⋆))})
Constants (φ, c, ℏ, G as forced ladder constants) reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 1: ∀h≥0, ∀x ∈ P_{h,i*_h}, f(x) ≥ f(x⋆) − νρ^h with ρ ∈ (0,1) unknown and searched in parallel.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references

[1]

Finite-time analysis of the multiarmed bandit problem.Machine Learning, 47(2-3):235–256, 2002

Peter Auer, Nicol `o Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem.Machine Learning, 47(2-3):235–256, 2002

2002
[2]

Online stochastic op- timization under correlated bandit feedback

Mohammad Gheshlaghi Azar, Alessandro Lazaric, and Emma Brunskill. Online stochastic op- timization under correlated bandit feedback. InInternational Conference on Machine Learning (ICML), 2014

2014
[3]

Pure exploration in finitely-armed and continuous-armed bandits.Theoretical Computer Science, 412(19):1832–1852, 2011

S ´ebastien Bubeck, R ´emi Munos, and Gilles Stoltz. Pure exploration in finitely-armed and continuous-armed bandits.Theoretical Computer Science, 412(19):1832–1852, 2011

2011
[4]

S ´ebastien Bubeck, R´emi Munos, Gilles Stoltz, and Csaba Szepesv´ari.X-armed bandits.Jour- nal of Machine Learning Research, 12:1587–1627, 2011

2011
[5]

Lipschitz Bandits without the Lipschitz Constant

S ´ebastien Bubeck, Gilles Stoltz, and Jia Yuan Yu. Lipschitz Bandits without the Lipschitz Constant. InAlgorithmic Learning Theory (ALT), 2011

2011
[6]

Adam D. Bull. Adaptive-treed bandits.Bernoulli, 21(4):2289–2307, 2015

2015
[7]

Unimodal bandits: Regret lower bounds and opti- mal algorithms

Richard Combes and Alexandre Prouti `ere. Unimodal bandits: Regret lower bounds and opti- mal algorithms. InInternational Conference on Machine Learning (ICML), 2014

2014
[8]

Bandit algorithms for tree search

Pierre-Arnaud Coquelin and R ´emi Munos. Bandit algorithms for tree search. InUncertainty in Artificial Intelligence (UAI), 2007

2007
[9]

Multi-armed bandit problems in metric spaces

Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandit problems in metric spaces. InACM Symposium on Theory of Computing (STOC), 2008

2008
[10]

Bandit-based Monte-Carlo planning

Levente Kocsis and Csaba Szepesv ´ari. Bandit-based Monte-Carlo planning. InEuropean Conference on Machine Learning (ECML), 2006

2006
[11]

Optimistic optimization of deterministic functions without the knowledge of its smoothness

R ´emi Munos. Optimistic optimization of deterministic functions without the knowledge of its smoothness. InNeural Information Processing Systems (NeurIPS), 2011

2011
[12]

From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning.Foundations and Trends in Machine Learning, 7(1):1–130, 2014

R ´emi Munos. From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning.Foundations and Trends in Machine Learning, 7(1):1–130, 2014

2014
[13]

Bandits attack function optimization

Philippe Preux, R ´emi Munos, and Michal Valko. Bandits attack function optimization. In Congress on Evolutionary Computation (CEC), 2014

2014
[14]

Multi-armed bandits on implicit metric spaces

Aleksandrs Slivkins. Multi-armed bandits on implicit metric spaces. InNeural Information Processing Systems (NeurIPS), 2011

2011
[15]

Stochastic simultaneous optimistic optimization

Michal Valko, Alexandra Carpentier, and R ´emi Munos. Stochastic simultaneous optimistic optimization. InInternational Conference on Machine Learning (ICML), 2013. 9 A Proof sketch of Theorem 1 In this part we give the roadmap of the proof. The full proof is in Appendix B. First stepFor any choice ofρ ⋆ verifying Assumption 1 and any suboptimalρsuch that ...

2013