pith. machine review for the scientific record.

arxiv: 2605.02409 · v1 · submitted 2026-05-04 · 💻 cs.LG

Recognition: unknown

Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 16:07 UTC · model grok-4.3

classification 💻 cs.LG
keywords Bayesian optimization · Gaussian processes · permutation invariance · carbon capture and storage · well placement optimization · deep kernel learning

The pith

A novel Gaussian Process kernel encodes permutation invariance to improve Bayesian optimization for well placement in carbon capture projects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to make Bayesian Optimization more efficient when optimizing well placements in Carbon Capture and Storage, where the simulator's group control mode makes the order of wells within injector or producer groups irrelevant. Standard kernels treat these as ordered vectors, leading to redundant learning on equivalent inputs. The authors develop GP-Perm, a kernel that compares sets of wells using a stable divergence between their empirical representations, allowing the model to respect the true symmetries. This is tested against synthetic benchmarks and a realistic Johansen formation case, alongside a Deep Sets-based baseline. If successful, it means fewer expensive simulator runs are needed to find good configurations.
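The excerpt does not give GP-Perm's exact formula, so the following is only a minimal sketch of the general idea, assuming the "stable divergence between empirical representations" is something MMD-like: the kernel exponentiates a negative squared divergence between the two well sets, so reordering wells inside a set cannot change the kernel value. The function names and the RBF base kernel are illustrative choices, not the paper's construction.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """RBF base kernel between two point sets: returns the |a| x |b| Gram block."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def mmd2(X, Y, ls=1.0):
    """Biased (V-statistic) squared MMD between the empirical distributions of two sets."""
    return rbf(X, X, ls).mean() + rbf(Y, Y, ls).mean() - 2 * rbf(X, Y, ls).mean()

def k_set(X, Y, gamma=1.0, ls=1.0):
    """Permutation-invariant set kernel: exponentiated negative squared divergence."""
    return np.exp(-gamma * mmd2(X, Y, ls))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))   # 4 wells, (x, y) coordinates each
Y = rng.normal(size=(4, 2))
perm = rng.permutation(4)
# Reordering the wells inside a set leaves the kernel value unchanged
assert np.isclose(k_set(X[perm], Y), k_set(X, Y))
```

Because MMD embeds distributions into a Hilbert space, exp(-gamma * MMD^2) is a Gaussian kernel in that feature space, which is one standard route to a valid covariance here.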

Core claim

The main contribution is a novel Gaussian Process kernel (GP-Perm) that encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations, and can be combined with standard kernels for additional vector-valued inputs. This is motivated by the permutation symmetries arising in well placement under group control in CCS simulators.

What carries the argument

GP-Perm kernel that induces permutation invariance via stable divergence between empirical representations of sets.

If this is right

  • The GP-Perm kernel enables more efficient surrogate modeling in Bayesian Optimization by avoiding learning on permuted equivalents.
  • It supports hybrid inputs by combination with standard kernels for vector-valued features.
  • Evaluation on the Johansen formation CCS case demonstrates potential performance gains in realistic well placement optimization.
  • Provides an alternative to deep learning approaches like Deep Sets for inducing invariance in GPs.
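The hybrid-input point above has a simple standard realization: multiply a permutation-invariant set kernel by an ordinary vector kernel, since a product of PSD kernels is PSD. The kernels below (a mean-embedding set kernel, an RBF on auxiliary features) are illustrative stand-ins, not the paper's exact construction.

```python
import numpy as np

def k_set_mean(X, Y, ls=1.0):
    """Mean-embedding set kernel: average pairwise RBF, invariant to well ordering."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return float(np.exp(-d2 / (2 * ls ** 2)).mean())

def k_vec(u, v, ls=1.0):
    """Standard RBF kernel on auxiliary vector features (e.g., injection rates)."""
    return float(np.exp(-np.sum((u - v) ** 2) / (2 * ls ** 2)))

def k_hybrid(X, u, Y, v):
    """Product of a PSD set kernel and a PSD vector kernel is itself PSD."""
    return k_set_mean(X, Y) * k_vec(u, v)

rng = np.random.default_rng(1)
X, Y = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
u, v = rng.normal(size=2), rng.normal(size=2)
# Permuting wells inside a set leaves the hybrid kernel value unchanged
assert np.isclose(k_hybrid(X[rng.permutation(3)], u, Y, v), k_hybrid(X, u, Y, v))
```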

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could apply to other optimization tasks involving unordered collections, such as sensor placement or portfolio selection.
  • The divergence-based comparison might offer better generalization than learned embeddings when data is scarce.
  • Accounting for permutation symmetry reduces the effective search space by a factor equal to the order of the symmetric groups involved (n! per group of n interchangeable wells), even though the ambient input dimensionality is unchanged.
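A quick worked example of that last point, with hypothetical group sizes not taken from the paper: a configuration with 4 interchangeable injectors and 3 interchangeable producers has 4! · 3! = 144 orderings that a standard kernel treats as distinct inputs but a permutation-invariant surrogate collapses into one.

```python
from math import factorial

def equivalent_orderings(group_sizes):
    """Relabelings of one configuration under group-wise permutation symmetry."""
    out = 1
    for n in group_sizes:
        out *= factorial(n)
    return out

print(equivalent_orderings([4, 3]))  # 4! * 3! = 144
```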

Load-bearing premise

The high-fidelity CCS simulator's group-control mode produces genuine permutation symmetries that standard GP kernels cannot exploit effectively.

What would settle it

Running the optimization on the Johansen formation and finding no improvement in convergence speed or final performance with GP-Perm over standard kernels would falsify the practical benefit.

Figures

Figures reproduced from arXiv: 2605.02409 by Sofianos Panagiotis Fotias, Vassilis Gaganis.

Figure 1: Permutation invariance in physical configuration: a standard GP sees …
Figure 2: GP-Perm kernel construction. The left and right panels show two inputs, each …
Figure 3: DKL-DS architecture. An MLP embeds the auxiliary vector inputs, while Deep …
Figure 4: Synthetic benchmarks: best-so-far objective versus BO iteration (mean …)
Figure 5: BO results on (left) CCS-like two-set synthetic benchmark and (right) Johansen …
read the original abstract

Bayesian Optimization is an iterative method, tailored to optimizing expensive black box objective functions. Surrogate models like Gaussian Processes, which are the gold standard in Bayesian Optimization, can be inefficient for inputs with permutation symmetries, as the most common kernels employed are better suited for vector inputs rather than unordered sets of items. Motivated by this issue, we turn to permutation invariant Bayesian Optimization for well placement in Carbon Capture and Storage projects. The high fidelity black box simulator is instructed to operate wells under group control, giving rise to permutation symmetries within injector and producer groups that cannot be exploited with standard GP kernels. In this work, our main contribution is a novel Gaussian Process kernel (GP-Perm) that encodes permutation invariance by comparing sets through a stable divergence between their induced empirical representations, and can be combined with standard kernels for additional vector-valued inputs. As a learned invariant baseline, we also consider a Deep Kernel Learning model (DKL-DS) using the Deep Sets architecture to learn a permutation-invariant embedding. We evaluate the proposed methodology across 8 use cases, comprising seven synthetic benchmarks and one realistic CCS case study (Johansen formation)

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes a novel Gaussian Process kernel (GP-Perm) for Bayesian optimization that encodes permutation invariance by comparing unordered sets of inputs through a stable divergence between their induced empirical representations. This kernel can be combined with standard kernels for additional vector-valued features. The approach is motivated by well-placement optimization in carbon capture and storage (CCS), where group-control modes in high-fidelity simulators induce permutation symmetries among injector and producer wells that standard kernels cannot exploit. The method is compared to a Deep Kernel Learning baseline using Deep Sets (DKL-DS) and evaluated on seven synthetic benchmarks plus one realistic case study on the Johansen formation.

Significance. If the GP-Perm kernel is positive semi-definite and yields measurable gains in sample efficiency by exploiting the induced symmetries, the work would provide a practical extension of BO to set-structured inputs in engineering applications. The grounding in a realistic CCS simulator is a strength that moves beyond purely synthetic tests common in the literature.

major comments (2)
  1. [GP-Perm kernel definition] In the section defining the GP-Perm kernel: the construction relies on a divergence between empirical representations of sets, yet no derivation, theorem, or reference is supplied to establish that the resulting function is positive semi-definite. Standard divergences are not kernels, and simple transformations such as exponentiation do not automatically guarantee the PSD property required for a valid GP covariance. This is load-bearing for the central modeling claim; without it the surrogate cannot be used in Bayesian optimization. Please supply an explicit feature-map argument, invocation of a relevant theorem (e.g., Schoenberg), or systematic empirical verification that all Gram matrices remain PSD across the reported experiments.
  2. [Experiments] Section 5 (Experiments) and associated tables/figures: the abstract states that the method was evaluated on eight use cases including the Johansen CCS study, but the manuscript must include quantitative results (regret curves, final objective values, wall-clock times) together with statistical details and direct comparisons against standard kernels (e.g., RBF) and the DKL-DS baseline. Absence of these data prevents verification of the claimed performance advantage arising from permutation invariance.
minor comments (3)
  1. [Kernel construction] Clarify the precise divergence measure employed and the mechanism that renders it 'stable'; an explicit formula or pseudocode would remove ambiguity.
  2. [Related work] Add citations to existing literature on set kernels, permutation-invariant GPs, and Deep Sets to better situate the novelty of GP-Perm.
  3. [Benchmarks] In the synthetic benchmark descriptions, explicitly state the input dimensionality, how permutation symmetry is generated, and the number of independent runs used for statistical reporting.
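The empirical verification requested in major comment 1 is mechanical: assemble a Gram matrix over candidate configurations and check its minimum eigenvalue. The set kernel below is a stand-in (exponentiated negative squared MMD), since the manuscript's exact divergence is not reproduced here; the check itself applies to any candidate kernel.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def k_set(X, Y, gamma=1.0):
    """Stand-in set kernel: exp(-gamma * MMD^2) between two well configurations."""
    m2 = rbf(X, X).mean() + rbf(Y, Y).mean() - 2 * rbf(X, Y).mean()
    return np.exp(-gamma * m2)

rng = np.random.default_rng(42)
sets = [rng.normal(size=(5, 2)) for _ in range(30)]        # 30 candidate configurations
K = np.array([[k_set(A, B) for B in sets] for A in sets])  # Gram matrix
min_eig = np.linalg.eigvalsh(K).min()                      # eigenvalues of a symmetric matrix
assert min_eig > -1e-8, f"Gram matrix not PSD: min eigenvalue {min_eig}"
```

Passing this check on sampled configurations is practical evidence, not a proof; a feature-map or Schoenberg-type argument is still the stronger answer to the comment.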

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the theoretical grounding and experimental presentation.

read point-by-point responses
  1. Referee: In the section defining the GP-Perm kernel: the construction relies on a divergence between empirical representations of sets, yet no derivation, theorem, or reference is supplied to establish that the resulting function is positive semi-definite. Standard divergences are not kernels, and simple transformations such as exponentiation do not automatically guarantee the PSD property required for a valid GP covariance. This is load-bearing for the central modeling claim; without it the surrogate cannot be used in Bayesian optimization. Please supply an explicit feature-map argument, invocation of a relevant theorem (e.g., Schoenberg), or systematic empirical verification that all Gram matrices remain PSD across the reported experiments.

    Authors: We acknowledge that the submitted manuscript does not contain an explicit derivation, theorem, or reference proving that the GP-Perm kernel is positive semi-definite. This is a substantive point. In the revision we will add an appendix that performs systematic empirical verification: for every Gram matrix arising in the eight reported experiments we will report the minimum eigenvalue (within floating-point precision) and confirm non-negativity. We will also attempt to supply a feature-map argument or invoke Schoenberg’s theorem on the specific form of the set divergence; if a complete theoretical guarantee cannot be derived in time for the revision, the empirical checks will be presented as practical evidence that the kernel is usable for the GP surrogate in the evaluated settings. revision: yes

  2. Referee: Section 5 (Experiments) and associated tables/figures: the abstract states that the method was evaluated on eight use cases including the Johansen CCS study, but the manuscript must include quantitative results (regret curves, final objective values, wall-clock times) together with statistical details and direct comparisons against standard kernels (e.g., RBF) and the DKL-DS baseline. Absence of these data prevents verification of the claimed performance advantage arising from permutation invariance.

    Authors: We apologize that the quantitative results were not presented with the requested level of detail. Although the manuscript describes evaluations across the eight use cases, we will expand Section 5 to include: (i) regret curves for all seven synthetic benchmarks and the Johansen formation case, (ii) tables of final objective values with means and standard deviations over multiple independent runs, (iii) wall-clock timings for kernel evaluation and BO iterations, and (iv) direct side-by-side comparisons against both the standard RBF kernel and the DKL-DS baseline. Statistical significance (e.g., Wilcoxon signed-rank tests) will be added to support claims of improvement due to permutation invariance. revision: yes
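The reporting the authors promise (best-so-far regret curves summarized over runs, plus a paired Wilcoxon test on final values) can be sketched as follows. The per-run objective traces here are random placeholders, not the paper's data, and the method names are labels only.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Placeholder per-iteration objective values (minimization): 10 runs x 50 BO iterations
runs_perm = rng.normal(1.0, 0.3, size=(10, 50)) - 0.01 * np.arange(50)
runs_rbf = rng.normal(1.2, 0.3, size=(10, 50)) - 0.008 * np.arange(50)

# Best-so-far curves: monotone nonincreasing along the iteration axis
best_perm = np.minimum.accumulate(runs_perm, axis=1)
best_rbf = np.minimum.accumulate(runs_rbf, axis=1)

mean_curve = best_perm.mean(axis=0)  # regret-curve summary across runs
std_curve = best_perm.std(axis=0)

# Paired test on final best values, matched run by run
stat, p = wilcoxon(best_perm[:, -1], best_rbf[:, -1])
print(f"final best (placeholder GP-Perm): {mean_curve[-1]:.3f}, Wilcoxon p={p:.3g}")
```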

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines GP-Perm as a novel kernel construction that compares sets via a stable divergence on empirical representations, presented as an independent modeling choice rather than derived from or reduced to fitted parameters, self-referential equations, or prior self-citations. No load-bearing step equates a claimed prediction or invariance property to its own inputs by construction. The evaluation on synthetic benchmarks and the Johansen CCS case is framed as empirical validation of the independent construction, with DKL-DS introduced as a separate baseline. This satisfies the default expectation of a self-contained contribution without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that permutation symmetries exist in the CCS simulator output and that a divergence-based set kernel can capture them usefully; no free parameters are named in the abstract, and the only invented element is the kernel itself.

axioms (1)
  • domain assumption Gaussian Processes remain suitable surrogate models once a permutation-invariant kernel is substituted
    Standard modeling choice in Bayesian optimization literature invoked without further justification.
invented entities (1)
  • GP-Perm kernel no independent evidence
    purpose: To induce permutation invariance by comparing empirical set representations via stable divergence
    Newly defined construction whose performance is asserted to improve optimization on the target tasks.

pith-pipeline@v0.9.0 · 5504 in / 1226 out tokens · 36322 ms · 2026-05-09T16:07:28.669385+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1] M. Bui, G. D. Puxty, M. Gazzani, S. M. Soltani, C. Pozo, The role of carbon capture and storage (CCS) technologies in a net-zero carbon future (2021)
  2. [2] I. Ismail, V. Gaganis, Carbon capture, utilization, and storage in saline aquifers: Subsurface policies, development plans, well control strategies and optimization approaches—a review, Clean Technologies 5 (2023) 609–637
  3. [3] P. I. Frazier, A tutorial on Bayesian optimization, arXiv preprint arXiv:1807.02811 (2018)
  4. [4] C. K. Williams, C. E. Rasmussen, Gaussian Processes for Machine Learning, volume 2, MIT Press, Cambridge, MA, 2006
  5. [5] T. Brown, A. Cioba, I. Bogunovic, Sample-efficient Bayesian optimisation using known invariances, Advances in Neural Information Processing Systems 37 (2024) 47931–47965
  6. [6] J. Kim, M. McCourt, T. You, S. Kim, S. Choi, Bayesian optimization with approximate set kernels, Machine Learning 110 (2021) 857–879
  7. [7] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, A. J. Smola, Deep sets, Advances in Neural Information Processing Systems 30 (2017)
  8. [8] A. G. Wilson, Z. Hu, R. Salakhutdinov, E. P. Xing, Deep kernel learning, in: Artificial Intelligence and Statistics, PMLR, 2016, pp. 370–378
  9. [9] P. Buathong, D. Ginsbourger, T. Krityakierne, Kernels over sets of finite sets using RKHS embeddings, with application to Bayesian (combinatorial) optimization, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2020, pp. 2731–2741
  10. [10] H. Moss, D. Leslie, D. Beck, J. Gonzalez, P. Rayson, BOSS: Bayesian optimization over string spaces, Advances in Neural Information Processing Systems 33 (2020) 15476–15486
  11. [11] C. Oh, J. Tomczak, E. Gavves, M. Welling, Combinatorial Bayesian optimization using the graph Cartesian product, Advances in Neural Information Processing Systems 32 (2019)
  12. [12] S. P. Fotias, I. Ismail, V. Gaganis, Optimization of well placement in carbon capture and storage (CCS): Bayesian optimization framework under permutation invariance, Applied Sciences 14 (2024) 3528
  13. [13] R. Garnett, M. A. Osborne, S. J. Roberts, Bayesian optimization for sensor set selection, in: Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, 2010, pp. 209–219
  14. [14] K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, E. P. Xing, Neural architecture search with Bayesian optimisation and optimal transport, Advances in Neural Information Processing Systems 31 (2018)
  15. [15] M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems 26 (2013)
  16. [16] J. Lee, Y. Lee, J. Kim, A. Kosiorek, S. Choi, Y. W. Teh, Set Transformer: A framework for attention-based permutation-invariant neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 3744–3753
  17. [17] M. Kimura, R. Shimizu, Y. Hirakawa, R. Goto, Y. Saito, On permutation-invariant neural networks, arXiv preprint arXiv:2403.17410 (2024)
  18. [18] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, A. Smola, A kernel two-sample test, Journal of Machine Learning Research 13 (2012) 723–773
  19. [19] F. Bachoc, L. Béthune, A. Gonzalez-Sanz, J.-M. Loubes, Gaussian processes on distributions based on regularized optimal transport, in: International Conference on Artificial Intelligence and Statistics, PMLR, 2023, pp. 4986–5010
  20. [20] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouvé, G. Peyré, Interpolating between optimal transport and MMD using Sinkhorn divergences, in: The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, 2019, pp. 2681–2690
  21. [21] S. Ament, S. Daulton, D. Eriksson, M. Balandat, E. Bakshy, Unexpected improvements to expected improvement for Bayesian optimization, Advances in Neural Information Processing Systems 36 (2023) 20577–20612
  22. [22] O. Andersen, G. Tangen, P. Ringrose, S. E. Greenberg, CO2 data share: a platform for sharing CO2 storage reference datasets from demonstration projects, in: 14th Greenhouse Gas Control Technologies Conference, Melbourne, 2018, pp. 21–26
  23. [23] A. F. Rasmussen, T. H. Sandve, K. Bao, A. Lauser, J. Hove, B. Skaflestad, R. Klöfkorn, M. Blatt, A. B. Rustad, O. Sævareid, et al., The Open Porous Media flow reservoir simulator, Computers & Mathematics with Applications 81 (2021) 159–185
  24. [24] P. S. Bergmo, E. Lindeberg, F. Riis, W. T. Johansen, Exploring geological storage sites for CO2 from Norwegian gas power plants: Johansen formation, Energy Procedia 1 (2009) 2945–2952
  25. [25] P. S. Bergmo, A.-A. Grimstad, E. Lindeberg, F. Riis, W. T. Johansen, Exploring geological storage sites for CO2 from Norwegian gas power plants: Utsira south, Energy Procedia 1 (2009) 2953–2959