Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications
Pith reviewed 2026-05-09 16:07 UTC · model grok-4.3
The pith
A novel Gaussian Process kernel encodes permutation invariance to improve Bayesian optimization for well placement in carbon capture projects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The main contribution is a novel Gaussian Process kernel (GP-Perm) that encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations, and can be combined with standard kernels for additional vector-valued inputs. This is motivated by the permutation symmetries arising in well placement under group control in CCS simulators.
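The paper's own divergence is not reproduced in this review, so any concrete formula here is an assumption. One standard way to build a kernel of this shape places the maximum mean discrepancy (MMD) between the empirical measures of the two sets inside an exponential; since MMD is a Hilbertian metric, exp(-MMD²) is positive semi-definite by Schoenberg's theorem. A minimal sketch (the choice of MMD, the base RBF kernel, and the lengthscale are all illustrative, not the authors' GP-Perm definition):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Base RBF kernel between point collections of shape (n, d) and (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2))

def mmd2(X, Y, ell=1.0):
    """Squared (biased) MMD between the empirical measures of sets X and Y."""
    return rbf(X, X, ell).mean() + rbf(Y, Y, ell).mean() - 2.0 * rbf(X, Y, ell).mean()

def set_kernel(X, Y, ell=1.0, scale=1.0):
    """Permutation-invariant set kernel: exp(-MMD^2 / scale).
    Invariance holds because MMD depends only on the empirical measures,
    which ignore the ordering of points within each set."""
    return np.exp(-mmd2(X, Y, ell) / scale)
```

Reordering the points of either set leaves the kernel value unchanged, which is exactly the property the group-control well placement problem calls for.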
What carries the argument
A GP-Perm kernel that induces permutation invariance via a stable divergence between empirical representations of sets.
If this is right
- The GP-Perm kernel enables more sample-efficient surrogate modeling in Bayesian Optimization by avoiding redundant learning across permuted equivalents of the same configuration.
- It supports hybrid inputs by combination with standard kernels for vector-valued features.
- Evaluation on the Johansen formation CCS case demonstrates potential performance gains in realistic well placement optimization.
- Provides an alternative to deep learning approaches like Deep Sets for inducing invariance in GPs.
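The Deep Sets alternative mentioned above rests on the result that any permutation-invariant function of a set can be written as rho(sum_i phi(x_i)). A toy sketch of such an embedding with fixed weights (a real DKL-DS model would learn W_phi and W_rho jointly with the GP hyperparameters; the shapes and tanh activations here are assumptions):

```python
import numpy as np

def deep_sets_embed(X, W_phi, W_rho):
    """Toy Deep Sets embedding: rho(sum_i phi(x_i)).
    Sum pooling over set elements makes the output order-independent."""
    h = np.tanh(X @ W_phi)          # per-element features phi(x_i)
    pooled = h.sum(axis=0)          # permutation-invariant pooling
    return np.tanh(pooled @ W_rho)  # set-level embedding rho(.)
```

Any standard vector kernel applied to such embeddings is then automatically permutation invariant, which is the route the DKL-DS baseline takes; GP-Perm instead builds the invariance into the kernel directly.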
Where Pith is reading between the lines
- This method could apply to other optimization tasks involving unordered collections, such as sensor placement or portfolio selection.
- The divergence-based comparison might offer better generalization than learned embeddings when data is scarce.
- Accounting for permutation symmetry shrinks the effective search space: all orderings of the same injector and producer sets collapse to a single point, a reduction by a factor equal to the product of the orders of the symmetric groups.
Load-bearing premise
The high-fidelity CCS simulator's group-control mode produces genuine permutation symmetries that standard GP kernels cannot exploit effectively.
What would settle it
Running the optimization on the Johansen formation and finding no improvement in convergence speed or final performance with GP-Perm over standard kernels would falsify the practical benefit.
original abstract
Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inputs with permutation symmetries, as the most common kernels employed are better suited for vector inputs rather than unordered sets of items. Motivated by this issue, we turn to permutation invariant Bayesian Optimization for well placement in Carbon Capture and Storage projects. The high fidelity black box simulator is instructed to operate wells under group control, giving rise to permutation symmetries within injector and producer groups that cannot be exploited with standard GP kernels. In this work, our main contribution is a novel Gaussian Process kernel (GP-Perm) that encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations, and can be combined with standard kernels for additional vector-valued inputs. As a learned invariant baseline, we also consider a Deep Kernel Learning model (DKL-DS) using the Deep Sets architecture to learn a permutation-invariant embedding. We evaluate the proposed methodology across 8 use cases, comprising seven synthetic benchmarks and one realistic CCS case study (Johansen formation).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a novel Gaussian Process kernel (GP-Perm) for Bayesian optimization that encodes permutation invariance by comparing unordered sets of inputs through a stable divergence between their induced empirical representations. This kernel can be combined with standard kernels for additional vector-valued features. The approach is motivated by well-placement optimization in carbon capture and storage (CCS), where group-control modes in high-fidelity simulators induce permutation symmetries among injector and producer wells that standard kernels cannot exploit. The method is compared to a Deep Kernel Learning baseline using Deep Sets (DKL-DS) and evaluated on seven synthetic benchmarks plus one realistic case study on the Johansen formation.
Significance. If the GP-Perm kernel is positive semi-definite and yields measurable gains in sample efficiency by exploiting the induced symmetries, the work would provide a practical extension of BO to set-structured inputs in engineering applications. The grounding in a realistic CCS simulator is a strength that moves beyond purely synthetic tests common in the literature.
major comments (2)
- [GP-Perm kernel definition] In the section defining the GP-Perm kernel: the construction relies on a divergence between empirical representations of sets, yet no derivation, theorem, or reference is supplied to establish that the resulting function is positive semi-definite. Standard divergences are not kernels, and simple transformations such as exponentiation do not automatically guarantee the PSD property required for a valid GP covariance. This is load-bearing for the central modeling claim; without it the surrogate cannot be used in Bayesian optimization. Please supply an explicit feature-map argument, invocation of a relevant theorem (e.g., Schoenberg), or systematic empirical verification that all Gram matrices remain PSD across the reported experiments.
- [Experiments] Section 5 (Experiments) and associated tables/figures: the abstract states that the method was evaluated on eight use cases including the Johansen CCS study, but the manuscript must include quantitative results (regret curves, final objective values, wall-clock times) together with statistical details and direct comparisons against standard kernels (e.g., RBF) and the DKL-DS baseline. Absence of these data prevents verification of the claimed performance advantage arising from permutation invariance.
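The empirical fallback requested in the first comment is mechanical to implement: assemble the Gram matrix of the candidate kernel over all observed inputs and inspect its smallest eigenvalue. A sketch, using an illustrative stand-in kernel on set means rather than GP-Perm itself:

```python
import numpy as np

def min_gram_eigenvalue(kernel, inputs):
    """Build the Gram matrix K[i, j] = kernel(inputs[i], inputs[j]) and
    return its smallest eigenvalue; a value >= -tol for a small tolerance
    indicates the matrix is PSD up to floating-point error."""
    n = len(inputs)
    K = np.array([[kernel(inputs[i], inputs[j]) for j in range(n)]
                  for i in range(n)])
    return float(np.linalg.eigvalsh(K).min())

# Stand-in kernel on sets: RBF between set means. This is an assumption
# for demonstration, not the paper's GP-Perm construction.
def toy_set_kernel(X, Y):
    diff = X.mean(axis=0) - Y.mean(axis=0)
    return np.exp(-0.5 * diff @ diff)
```

Reporting this minimum eigenvalue for every Gram matrix encountered during optimization is the kind of systematic check the comment asks for, though it is evidence only for the evaluated inputs, not a proof of validity.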
minor comments (3)
- [Kernel construction] Clarify the precise divergence measure employed and the mechanism that renders it 'stable'; an explicit formula or pseudocode would remove ambiguity.
- [Related work] Add citations to existing literature on set kernels, permutation-invariant GPs, and Deep Sets to better situate the novelty of GP-Perm.
- [Benchmarks] In the synthetic benchmark descriptions, explicitly state the input dimensionality, how permutation symmetry is generated, and the number of independent runs used for statistical reporting.
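Given the paper's citations to Sinkhorn distances and Sinkhorn divergences, one plausible reading of the "stable" divergence is a debiased entropic optimal-transport quantity. The following sketch is an assumed illustration of that family, not the manuscript's actual definition (uniform weights, squared-Euclidean cost, and a fixed regularization eps are all choices made here for concreteness):

```python
import numpy as np

def ot_eps(X, Y, eps=0.5, iters=200):
    """Entropic OT cost <P, C> between uniform empirical measures on X and Y,
    computed with Sinkhorn iterations on the Gibbs kernel exp(-C/eps)."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-Euclidean cost
    K = np.exp(-C / eps)
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return float((P * C).sum())

def sinkhorn_divergence(X, Y, eps=0.5):
    """Debiased Sinkhorn divergence: zero when X == Y, and invariant to
    reordering the points within either set."""
    return ot_eps(X, Y, eps) - 0.5 * ot_eps(X, X, eps) - 0.5 * ot_eps(Y, Y, eps)
```

Spelling out the divergence at this level of detail, plus the debiasing step that makes it well behaved as the sets approach each other, is what the minor comment is requesting.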
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the theoretical grounding and experimental presentation.
point-by-point responses
-
Referee: In the section defining the GP-Perm kernel: the construction relies on a divergence between empirical representations of sets, yet no derivation, theorem, or reference is supplied to establish that the resulting function is positive semi-definite. Standard divergences are not kernels, and simple transformations such as exponentiation do not automatically guarantee the PSD property required for a valid GP covariance. This is load-bearing for the central modeling claim; without it the surrogate cannot be used in Bayesian optimization. Please supply an explicit feature-map argument, invocation of a relevant theorem (e.g., Schoenberg), or systematic empirical verification that all Gram matrices remain PSD across the reported experiments.
Authors: We acknowledge that the submitted manuscript does not contain an explicit derivation, theorem, or reference proving that the GP-Perm kernel is positive semi-definite. This is a substantive point. In the revision we will add an appendix that performs systematic empirical verification: for every Gram matrix arising in the eight reported experiments we will report the minimum eigenvalue (within floating-point precision) and confirm non-negativity. We will also attempt to supply a feature-map argument or invoke Schoenberg’s theorem on the specific form of the set divergence; if a complete theoretical guarantee cannot be derived in time for the revision, the empirical checks will be presented as practical evidence that the kernel is usable for the GP surrogate in the evaluated settings. revision: yes
-
Referee: Section 5 (Experiments) and associated tables/figures: the abstract states that the method was evaluated on eight use cases including the Johansen CCS study, but the manuscript must include quantitative results (regret curves, final objective values, wall-clock times) together with statistical details and direct comparisons against standard kernels (e.g., RBF) and the DKL-DS baseline. Absence of these data prevents verification of the claimed performance advantage arising from permutation invariance.
Authors: We apologize that the quantitative results were not presented with the requested level of detail. Although the manuscript describes evaluations across the eight use cases, we will expand Section 5 to include: (i) regret curves for all seven synthetic benchmarks and the Johansen formation case, (ii) tables of final objective values with means and standard deviations over multiple independent runs, (iii) wall-clock timings for kernel evaluation and BO iterations, and (iv) direct side-by-side comparisons against both the standard RBF kernel and the DKL-DS baseline. Statistical significance (e.g., Wilcoxon signed-rank tests) will be added to support claims of improvement due to permutation invariance. revision: yes
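A paired Wilcoxon signed-rank comparison of the kind promised here takes only a few lines with scipy; the per-seed final objective values below are invented placeholders, not results from the paper:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-seed final objective values (higher is better) for the
# two kernels on the same benchmark, matched seed by seed.
gp_perm = np.array([1.92, 2.01, 1.88, 2.10, 1.95, 2.05, 1.90, 2.00])
rbf_kern = np.array([1.70, 1.85, 1.60, 1.95, 1.78, 1.87, 1.71, 1.80])

# One-sided paired test: does GP-Perm consistently come out ahead of the
# standard RBF kernel across independent runs?
stat, p_value = wilcoxon(gp_perm, rbf_kern, alternative="greater")
```

Because the test is paired and non-parametric, it matches the proposed experimental design (same seeds, no normality assumption) better than an unpaired t-test would.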
Circularity Check
No significant circularity in derivation chain
full rationale
The paper defines GP-Perm as a novel kernel construction that compares sets via a stable divergence on empirical representations, presented as an independent modeling choice rather than derived from or reduced to fitted parameters, self-referential equations, or prior self-citations. No load-bearing step equates a claimed prediction or invariance property to its own inputs by construction. The evaluation on synthetic benchmarks and the Johansen CCS case is framed as empirical validation of the independent construction, with DKL-DS introduced as a separate baseline. This satisfies the default expectation of a self-contained contribution without the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Gaussian Processes remain suitable surrogate models once a permutation-invariant kernel is substituted
invented entities (1)
- GP-Perm kernel (no independent evidence)
Reference graph
Works this paper leans on
- [1] M. Bui, G. D. Puxty, M. Gazzani, S. M. Soltani, C. Pozo, The role of carbon capture and storage (CCS) technologies in a net-zero carbon future (2021).
- [2] I. Ismail, V. Gaganis, Carbon capture, utilization, and storage in saline aquifers: Subsurface policies, development plans, well control strategies and optimization approaches—a review, Clean Technologies 5 (2023) 609–637.
- [3] P. I. Frazier, A tutorial on Bayesian optimization, arXiv preprint arXiv:1807.02811 (2018).
- [4] C. K. Williams, C. E. Rasmussen, Gaussian Processes for Machine Learning, volume 2, MIT Press, Cambridge, MA, 2006.
- [5] T. Brown, A. Cioba, I. Bogunovic, Sample-efficient Bayesian optimisation using known invariances, Advances in Neural Information Processing Systems 37 (2024) 47931–47965.
- [6] J. Kim, M. McCourt, T. You, S. Kim, S. Choi, Bayesian optimization with approximate set kernels, Machine Learning 110 (2021) 857–879.
- [7] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, A. J. Smola, Deep Sets, Advances in Neural Information Processing Systems 30 (2017).
- [8] A. G. Wilson, Z. Hu, R. Salakhutdinov, E. P. Xing, Deep kernel learning, in: Artificial Intelligence and Statistics, PMLR, 2016, pp. 370–378.
- [9] P. Buathong, D. Ginsbourger, T. Krityakierne, Kernels over sets of finite sets using RKHS embeddings, with application to Bayesian (combinatorial) optimization, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2020, pp. 2731–2741.
- [10] H. Moss, D. Leslie, D. Beck, J. Gonzalez, P. Rayson, BOSS: Bayesian optimization over string spaces, Advances in Neural Information Processing Systems 33 (2020) 15476–15486.
- [11] C. Oh, J. Tomczak, E. Gavves, M. Welling, Combinatorial Bayesian optimization using the graph Cartesian product, Advances in Neural Information Processing Systems 32 (2019).
- [12] S. P. Fotias, I. Ismail, V. Gaganis, Optimization of well placement in carbon capture and storage (CCS): Bayesian optimization framework under permutation invariance, Applied Sciences 14 (2024) 3528.
- [13] R. Garnett, M. A. Osborne, S. J. Roberts, Bayesian optimization for sensor set selection, in: Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, 2010, pp. 209–219.
- [14] K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, E. P. Xing, Neural architecture search with Bayesian optimisation and optimal transport, Advances in Neural Information Processing Systems 31 (2018).
- [15] M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems 26 (2013).
- [16] J. Lee, Y. Lee, J. Kim, A. Kosiorek, S. Choi, Y. W. Teh, Set Transformer: A framework for attention-based permutation-invariant neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 3744–3753.
- [17] M. Kimura, R. Shimizu, Y. Hirakawa, R. Goto, Y. Saito, On permutation-invariant neural networks, arXiv preprint arXiv:2403.17410 (2024).
- [18] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola, A kernel two-sample test, The Journal of Machine Learning Research 13 (2012) 723–773.
- [19] F. Bachoc, L. Béthune, A. Gonzalez-Sanz, J.-M. Loubes, Gaussian processes on distributions based on regularized optimal transport, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2023, pp. 4986–5010.
- [20] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouvé, G. Peyré, Interpolating between optimal transport and MMD using Sinkhorn divergences, in: The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, 2019, pp. 2681–2690.
- [21] S. Ament, S. Daulton, D. Eriksson, M. Balandat, E. Bakshy, Unexpected improvements to expected improvement for Bayesian optimization, Advances in Neural Information Processing Systems 36 (2023) 20577–20612.
- [22] O. Andersen, G. Tangen, P. Ringrose, S. E. Greenberg, CO2 data share: a platform for sharing CO2 storage reference datasets from demonstration projects, in: 14th Greenhouse Gas Control Technologies Conference, Melbourne, 2018, pp. 21–26.
- [23] A. F. Rasmussen, T. H. Sandve, K. Bao, A. Lauser, J. Hove, B. Skaflestad, R. Klöfkorn, M. Blatt, A. B. Rustad, O. Sævareid, et al., The Open Porous Media flow reservoir simulator, Computers & Mathematics with Applications 81 (2021) 159–185.
- [24] P. S. Bergmo, E. Lindeberg, F. Riis, W. T. Johansen, Exploring geological storage sites for CO2 from Norwegian gas power plants: Johansen formation, Energy Procedia 1 (2009) 2945–2952.
- [25] P. S. Bergmo, A.-A. Grimstad, E. Lindeberg, F. Riis, W. T. Johansen, Exploring geological storage sites for CO2 from Norwegian gas power plants: Utsira South, Energy Procedia 1 (2009) 2953–2959.