An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions
Pith reviewed 2026-05-10 02:43 UTC · model grok-4.3
The pith
The PALM-Mean algorithm delivers valid lower bounds and ε-global convergence for optimizing Gaussian process posterior mean functions via hybrid relaxations in spatial branch-and-bound.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose PALM-Mean, a piecewise-analytic lower-bounding framework embedded in reduced-space spatial branch-and-bound. At each node, kernel terms that are locally important are replaced by a sign-aware piecewise-linear relaxation in an appropriate scalar distance variable, while the remaining terms are bounded analytically in closed form. We show this hybrid approach yields a valid lower bound for the posterior mean, while limiting the size of the branch-and-bound subproblems. We establish validity of the node lower bounds and ε-global convergence of the resulting algorithm.
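For orientation (this is the standard GP identity the paper builds on, not its exact formulation): the posterior mean is a finite weighted sum of kernel evaluations, m(x) = Σᵢ αᵢ k(x, xᵢ), with the weights α fixed after training. A minimal sketch with an RBF kernel, all names illustrative:

```python
import numpy as np

def rbf(d2, ell=1.0):
    # RBF kernel written as a function of the squared distance
    # d2 = ||x - x_i||^2; convex and decreasing in d2
    return np.exp(-d2 / (2.0 * ell**2))

def posterior_mean(x, X_train, alpha, ell=1.0):
    # m(x) = sum_i alpha_i * k(||x - x_i||^2); alpha is fixed after training
    d2 = np.sum((X_train - x) ** 2, axis=1)
    return float(rbf(d2, ell) @ alpha)
```

Writing each term through the scalar d2 is what makes a relaxation in a single distance variable per training point possible in the first place.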
What carries the argument
PALM-Mean hybrid lower-bounding scheme, which partitions kernel terms into sign-aware piecewise-linear relaxations in a scalar distance variable for locally important terms and analytic bounds for the remainder.
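One way such a sign-aware relaxation can work (an illustrative sketch, not the paper's construction): for an RBF kernel, k is convex and decreasing in the squared distance d2, so tangent lines underestimate it and the secant over [d_lo, d_hi] overestimates it; the sign of the weight αᵢ selects which side gives a valid lower bound on αᵢ·k:

```python
import numpy as np

def kernel(d2, ell=1.0):
    # RBF kernel in the squared distance d2: convex, decreasing
    return np.exp(-d2 / (2.0 * ell**2))

def sign_aware_term_lb(alpha_i, d2, d_lo, d_hi, ell=1.0):
    """Lower bound on alpha_i * kernel(d2) for d2 in [d_lo, d_hi].

    Illustrative one-piece version: tangents underestimate a convex
    function, secants overestimate it; the sign of alpha_i decides
    which side is needed for a valid lower bound on the weighted term.
    """
    if alpha_i >= 0.0:
        # underestimator of k: tangent at the interval midpoint
        d_m = 0.5 * (d_lo + d_hi)
        k_m = kernel(d_m, ell)
        slope = -k_m / (2.0 * ell**2)  # derivative of exp(-d2/(2 ell^2))
        return alpha_i * (k_m + slope * (d2 - d_m))
    # overestimator of k: secant through the interval endpoints
    t = (d2 - d_lo) / max(d_hi - d_lo, 1e-15)
    return alpha_i * ((1.0 - t) * kernel(d_lo, ell) + t * kernel(d_hi, ell))
```

A piecewise version uses several tangent or secant pieces on a subdivision of [d_lo, d_hi], tightening the bound at the cost of more constraints per node.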
If this is right
- The node lower bounds remain valid throughout the search tree.
- The algorithm converges to an ε-global optimum of the posterior mean function.
- The approach improves scalability with respect to the number of training data points compared to general solvers.
- Exact global optimization becomes practical for larger Gaussian process models.
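The "analytic bounds in closed form" for the remaining terms plausibly reduce to interval reasoning of the following kind (a sketch under the RBF assumption, not the paper's exact bounds): over a box, the exact range of the squared distance to a training point is computable coordinate-wise, and monotonicity of the kernel in that distance then gives each term's exact range:

```python
import numpy as np

def d2_range(lo, hi, xi):
    # exact range of ||x - xi||^2 over the box [lo, hi] (componentwise)
    near = np.clip(xi, lo, hi)                  # closest point of the box
    far = np.where(xi - lo > hi - xi, lo, hi)   # farthest corner
    return float(np.sum((near - xi) ** 2)), float(np.sum((far - xi) ** 2))

def analytic_term_lb(alpha_i, lo, hi, xi, ell=1.0):
    # the kernel is decreasing in d2, so its min sits at d2_max and its
    # max at d2_min; the sign of alpha_i picks the relevant extreme
    d2_min, d2_max = d2_range(lo, hi, xi)
    k = lambda d2: np.exp(-d2 / (2.0 * ell**2))
    return alpha_i * (k(d2_max) if alpha_i >= 0.0 else k(d2_min))
```

Because these per-term bounds are the exact range of each term, they collapse to the true value as the box shrinks to a point, which is the property the convergence argument needs.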
Where Pith is reading between the lines
- Similar term-partitioning ideas could apply to global optimization of other additive nonconvex functions common in machine learning.
- The reduced-space formulation may help in problems with many input dimensions by focusing branching on relevant variables.
- Integration with warm-starting or local refinement steps could further improve practical performance on real applications.
Load-bearing premise
Kernel terms can be partitioned into locally important ones that admit tight sign-aware piecewise-linear relaxations in a scalar distance variable while the remainder can be bounded analytically without destroying the overall lower-bound property.
What would settle it
A case where the proposed relaxation produces a lower bound that is strictly greater than the true posterior mean value at some point in the domain.
Original abstract
We study the deterministic global optimization of trained Gaussian process posterior mean functions over hyperrectangular domains. Although the posterior mean function has a compact closed-form representation, its global optimization is challenging because it remains nonlinear and nonconvex. Existing exact deterministic approaches become increasingly difficult to scale as the number of training data points grows, leading to approximation-based methods that improve tractability by optimizing a modified (inexact) objective. In this work, we propose PALM-Mean, a piecewise-analytic lower-bounding framework embedded in reduced-space spatial branch-and-bound. At each node, kernel terms that are locally important are replaced by a sign-aware piecewise-linear relaxation in an appropriate scalar distance variable, while the remaining terms are bounded analytically in closed form. We show this hybrid approach yields a valid lower bound for the posterior mean, while limiting the size of the branch-and-bound subproblems. We establish validity of the node lower bounds and $\varepsilon$-global convergence of the resulting algorithm. Computational results on synthetic benchmarks and real-world application problems show that PALM-Mean improves scalability relative to representative general-purpose deterministic global solvers, particularly as the number of training data points increases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PALM-Mean, a reduced-space spatial branch-and-bound algorithm for deterministic global optimization of trained Gaussian process posterior mean functions over hyperrectangles. At each node the kernel terms are partitioned into locally important ones, which receive a sign-aware piecewise-linear relaxation in a scalar distance variable, and the remainder, which receive closed-form analytical bounds; the resulting hybrid underestimator is embedded in spatial B&B. The authors claim to prove validity of the node lower bounds and ε-global convergence of the algorithm, and report computational results on synthetic benchmarks and real-world problems showing improved scalability relative to general-purpose deterministic solvers as the number of training points grows.
Significance. If the validity and convergence claims are rigorously established, the work supplies a structurally exploiting deterministic global optimizer for GP posterior means that scales better than black-box solvers while retaining exactness guarantees. The hybrid bounding strategy that limits subproblem size while preserving a consistent underestimator is a concrete technical contribution, and the reported experiments on both synthetic and real instances provide useful evidence of practical improvement.
major comments (1)
- [Convergence theorem / proof of ε-global convergence] The ε-global convergence argument (stated in the abstract and presumably detailed in the convergence theorem) requires that the hybrid lower bound gap vanishes as node diameter tends to zero. While the sign-aware PWL relaxations on the locally important kernels are consistent by construction, the closed-form analytical bounds on the remaining kernel terms are described only as “closed form” without an explicit demonstration that they become exact (or that their gap tends to zero) when the current hyperrectangle shrinks to a point. If the dynamic partitioning into “locally important” terms does not guarantee that every kernel eventually receives a consistent relaxation, or if the analytical bounds retain a positive gap independent of domain size, the standard B&B convergence theorem does not apply. Please supply the missing limit argument or the precise assumption that ensures consistency.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We are pleased that the referee recognizes the technical contribution of the hybrid bounding strategy and the practical improvements shown in the experiments. Below we respond to the single major comment.
Point-by-point responses
Referee: The ε-global convergence argument (stated in the abstract and presumably detailed in the convergence theorem) requires that the hybrid lower bound gap vanishes as node diameter tends to zero. While the sign-aware PWL relaxations on the locally important kernels are consistent by construction, the closed-form analytical bounds on the remaining kernel terms are described only as “closed form” without an explicit demonstration that they become exact (or that their gap tends to zero) when the current hyperrectangle shrinks to a point. If the dynamic partitioning into “locally important” terms does not guarantee that every kernel eventually receives a consistent relaxation, or if the analytical bounds retain a positive gap independent of domain size, the standard B&B convergence theorem does not apply. Please supply the missing limit argument or the precise assumption that ensures consistency.
Authors: We agree that an explicit consistency argument for the closed-form analytical bounds is required for a fully rigorous presentation. The manuscript establishes validity of the node lower bounds and states ε-global convergence, but the limit behavior of the analytical bounds on the non-locally-important terms is only implicit. In the revised version we will insert a short lemma immediately before the convergence theorem that proves the gap of these bounds tends to zero as the node diameter approaches zero. The proof relies on the continuity of the kernel functions together with the fact that the closed-form bounds are derived from the exact range of each kernel term over the current hyperrectangle (which contracts to a point). We will also add a clarifying remark on the dynamic partitioning criterion, showing that it does not interfere with overall consistency because the analytical bounds themselves are consistent for any partition. These additions will make the application of the standard spatial B&B convergence theorem fully explicit.
revision: yes
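The consistency behavior argued for in the response can be probed numerically (a sketch with assumed RBF structure and purely illustrative names): using only term-wise exact-range lower bounds, the gap between the node bound and a sampled minimum of the posterior mean should vanish as the box width shrinks:

```python
import numpy as np

def node_gap(width, n_train=8, seed=1):
    # gap between the sum of term-wise closed-form lower bounds and a
    # sampled minimum of m(x) over a box of the given width at the origin
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_train, 2))
    alpha = rng.normal(size=n_train)
    lo, hi = -0.5 * width * np.ones(2), 0.5 * width * np.ones(2)
    k = lambda d2: np.exp(-d2 / 2.0)
    lb = 0.0
    for xi, a in zip(X, alpha):
        near = np.clip(xi, lo, hi)                 # closest box point
        far = np.where(xi - lo > hi - xi, lo, hi)  # farthest corner
        d2 = np.sum((near - xi) ** 2) if a < 0 else np.sum((far - xi) ** 2)
        lb += a * k(d2)                            # term-wise exact minimum
    xs = lo + rng.random((4096, 2)) * (hi - lo)
    m = k(np.sum((xs[:, None, :] - X[None]) ** 2, axis=-1)) @ alpha
    return float(m.min() - lb)
```

The gap is nonnegative by validity; the lemma promised in the rebuttal amounts to showing it tends to zero, which the exact-range construction delivers because each summand's bound is tight on a degenerate box.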
Circularity Check
No circularity: validity and convergence follow from standard relaxation properties and B&B theory
full rationale
The paper constructs a hybrid lower bound by partitioning kernel terms into locally important ones (relaxed via sign-aware PWL in a scalar distance variable) and remainder terms (bounded analytically in closed form). Validity is shown by verifying each piece is a valid underestimator, and ε-global convergence follows from the standard spatial B&B argument once the lower bound is consistent (gap vanishes as node diameter → 0). No step reduces by definition to a fitted quantity, self-citation chain, or renamed input; the central claims rest on explicit analytic properties of the GP posterior mean and established B&B convergence theorems that are independent of the specific hybrid construction. The derivation is therefore self-contained against external benchmarks.
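For reference, the "standard spatial B&B argument" the rationale appeals to rests on two conditions on the node bound (notation mine, not the paper's), for node boxes $B$:

```latex
% Validity: the node bound never exceeds the true minimum on the box
\beta(B) \;\le\; \min_{x \in B} m(x) \qquad \text{for every node box } B,
% Consistency: the bounding gap vanishes with the box diameter
\min_{x \in B} m(x) \;-\; \beta(B) \;\longrightarrow\; 0
\qquad \text{as } \operatorname{diam}(B) \to 0 .
```

Together with exhaustive branching, these conditions yield termination at an ε-global point; the referee's objection is precisely that the second condition is asserted but not proved for the analytically bounded terms.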
Axiom & Free-Parameter Ledger
axioms (2)
- standard math The posterior mean is a finite sum of kernel evaluations and is therefore continuous and differentiable on the compact domain.
- domain assumption A sign-aware piecewise-linear function constructed from the scalar distance variable provides a valid lower bound for each selected kernel term over a sub-rectangle.