Statistical Tapers for Correlation-Based Localization in Ensemble Data Assimilation

Alexandre A. Emerick; Vinicius Luiz Santos Silva

arxiv: 2605.29922 · v1 · pith:HO34VO2Tnew · submitted 2026-05-28 · 📊 stat.ME

Pith reviewed 2026-06-29 06:07 UTC · model grok-4.3

keywords: ensemble data assimilation · correlation-based localization · statistical tapers · spurious correlations · ensemble variance · subsurface modeling · power-law taper · logistic taper

The pith

Correlation-based tapers derived from model-data reliability can replace distance-based localization in ensemble data assimilation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates localization in ensemble data assimilation when finite ensembles create noisy covariances that produce spurious updates. Instead of relying on spatial distance, which often fails to reflect actual parameter-data links in flow-driven or nonlinear systems, the authors treat localization as shrinkage in correlation space. They introduce three tapers computed from the statistical reliability of estimated correlations: a generalized power-law, a logistic function from a spike-and-slab prior, and a discrepancy-based taper. Tests on synthetic reservoir problems with scalar and grid parameters show these tapers suppress spurious correlations while sometimes preserving more posterior ensemble variance than distance-based methods, with the logistic taper strongest on variance and smoother tapers better on data match. This matters because it provides a statistically grounded option precisely where traditional distance criteria are unavailable or misleading.

Core claim

The central claim is that tapering coefficients computed from the statistical reliability of estimated model-data correlations suppress spurious correlations while preserving meaningful parameter-data relationships; in the synthetic tests the power-law and logistic tapers retained more posterior ensemble variance than distance-based localization while still achieving acceptable data-match quality, with the logistic taper giving the strongest variance preservation.

What carries the argument

Three statistical tapers (generalized power-law motivated by mean-square-error correction, logistic derived from Bayesian spike-and-slab, and discrepancy-based inspired by Morozov's principle) that act as shrinkage operators on estimated correlations according to their reliability.
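
To make the idea of reliability-based shrinkage concrete, here is a minimal sketch of the simplest case quoted in the appendix excerpt further down this page, the MSE-optimal factor r* = ρ² / (ρ² + σ²). It is not the paper's generalized power-law taper, whose exact form (with t0 and β) is not reproduced here. The function names are hypothetical, the sample correlation is plugged in for the unknown true ρ, and the standard-error formula σ ≈ (1 − ρ²)/√(Ne − 1) is an assumption, since the paper's own expression for σ (its Section 4.2.3) is not shown on this page.

```python
import math


def correlation_std(rho: float, n_ens: int) -> float:
    """Approximate standard error of a sample correlation.

    ASSUMPTION: uses (1 - rho^2) / sqrt(n_ens - 1); the paper's own
    expression for sigma is not available on this page.
    """
    if n_ens < 2:
        raise ValueError("n_ens must be at least 2")
    return (1.0 - rho * rho) / math.sqrt(n_ens - 1)


def mse_taper(rho: float, n_ens: int) -> float:
    """Shrinkage factor r = rho^2 / (rho^2 + sigma^2), a value in [0, 1]."""
    sigma = correlation_std(rho, n_ens)
    return rho * rho / (rho * rho + sigma * sigma)


# Larger ensembles give smaller sigma, so the same correlation is kept more.
for n_ens in (50, 100, 1000):
    row = [round(mse_taper(r, n_ens), 3) for r in (0.05, 0.1, 0.3, 0.6)]
    print(n_ens, row)
```

In terms of the standardized correlation t = ρ/σ, the same factor reads t² / (t² + 1), which is why a weak correlation is damped heavily in a small ensemble and much less in a large one.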

If this is right

Correlation-based localization applies directly to cases where spatial distance does not align with parameter-data relationships, such as non-local parameters or prior conditioning effects.
Power-law and logistic tapers can retain higher posterior ensemble variance than distance-based localization while keeping data-match quality acceptable.
The logistic taper favors variance preservation; smoother tapers favor data-match quality.
The approach works for both scalar parameters and grid-based parameters with non-trivial correlation patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The reliability-based shrinkage could be tested in ensemble methods outside reservoir applications, such as atmospheric or ocean data assimilation where correlations are strongly flow-dependent.
Hybrid schemes that blend correlation reliability with distance when both are available might reduce sensitivity to the choice of taper.
Because the tapers are derived from ensemble statistics, they could adapt automatically across assimilation cycles without manual retuning.

Load-bearing premise

That the statistical reliability of estimated correlations will consistently separate spurious from meaningful relationships across the range of tested synthetic problems without additional case-specific tuning.

What would settle it

A controlled synthetic experiment in which the reliability metric either fails to taper known spurious correlations or over-tapers known meaningful ones, producing visibly worse posterior variance or data match than distance-based localization.
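
A toy version of such a controlled experiment can be sketched as follows. It draws pairs of independent Gaussian samples, so every true correlation is zero and any nonzero sample correlation is spurious by construction. The function names are hypothetical, and this only illustrates the sampling noise that a reliability metric has to cope with; it is not one of the paper's reservoir test cases.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spurious_correlations(n_ens, n_pairs, seed=0):
    """Sample correlations between independent variables (truth is zero)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_ens)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_ens)]
        out.append(pearson(x, y))
    return out


def rms(values):
    """Root-mean-square magnitude of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Compare the observed noise level with the 1/sqrt(Ne) rule of thumb.
for n_ens in (25, 100, 400):
    rho = spurious_correlations(n_ens, 500)
    print(n_ens, round(rms(rho), 3), round(1.0 / math.sqrt(n_ens), 3))
```

A taper that leaves many of these known-zero correlations unattenuated at a given ensemble size would be failing the first half of the test described above.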

Figures

Figures reproduced from arXiv: 2605.29922 by Alexandre A. Emerick, Vinicius Luiz Santos Silva.

**Figure 1.** Power taper as a function of the correlation coefficient for different combinations of the parameters t0 and β, and ensemble size. The ensemble size enters the taper computation through σ, which is used to define the standardized correlation. (An expression for computing σ is presented later in Section 4.2.3.) As a result, the taper automatically adapts to the ensemble size, with larger ensembles requiring …

**Figure 2.** Logistic taper as a function of the correlation coefficient for different combinations of the coefficients t0, γ, and the ensemble size Ne. … significant correlations. In practice, t0 is the key parameter of the taper function, as it directly governs the balance between suppressing spurious correlations and preserving meaningful ones. Although t0 can be motivated using the Student-t statistic (see Appendix A.…

**Figure 3.** Figure 3 illustrates the resulting taper for different values of the parameter η and for varying ensemble sizes. The taper has a single parameter, η, which acts as a hard threshold in t: correlations with standardized magnitude below η are fully suppressed, while larger values are progressively retained. As expected, the need for tapering decreases as the ensemble size increases, leading to less aggressive attenuat…

**Figure 4.** Correlation-based taper functions considered in the test cases (Ne = 100). $r(\tilde{\rho}) = f_{\mathrm{GC}}\left(\frac{1 - |\tilde{\rho}|}{1 - \theta}\right)$, (43) where $f_{\mathrm{GC}}(\cdot)$ denotes the Gaspari-Cohn correlation function. Luo and Bhakta (2020) discussed strategies to define the scale parameter θ as a function of the noise level in $\tilde{\rho}$, which is expected to be on the order of $1/\sqrt{N_e}$. Although Luo and Bhakta (2020) refer to θ as a threshold, it appears tha…
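
The Gaspari-Cohn-based correlation taper quoted in this caption, r = f_GC((1 − |ρ|)/(1 − θ)), can be sketched in a few lines. The piecewise polynomial below is the standard fifth-order Gaspari-Cohn function with support on [0, 2]; whether the paper and Luo and Bhakta (2020) use exactly this scaling of the argument is an assumption, and the names `gaspari_cohn` and `cgc_taper` are hypothetical.

```python
def gaspari_cohn(z: float) -> float:
    """Standard fifth-order Gaspari-Cohn function; zero for |z| >= 2."""
    z = abs(z)
    if z <= 1.0:
        return (-0.25 * z**5 + 0.5 * z**4 + 0.625 * z**3
                - (5.0 / 3.0) * z**2 + 1.0)
    if z <= 2.0:
        return (z**5 / 12.0 - 0.5 * z**4 + 0.625 * z**3
                + (5.0 / 3.0) * z**2 - 5.0 * z + 4.0 - 2.0 / (3.0 * z))
    return 0.0


def cgc_taper(rho: float, theta: float) -> float:
    """Correlation-based taper r = f_GC((1 - |rho|) / (1 - theta)).

    ASSUMPTION: theta lies in [0, 1) and f_GC uses unit length scale.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return gaspari_cohn((1.0 - abs(rho)) / (1.0 - theta))


# theta = 0.1 mimics a noise level of about 1/sqrt(Ne) for Ne = 100.
for rho in (0.0, 0.1, 0.3, 0.6, 1.0):
    print(rho, round(cgc_taper(rho, theta=0.1), 3))
```

With this form and θ = 0.1, the argument at ρ = 0 is about 1.11, which is inside the support of f_GC, so the taper stays strictly positive even for zero correlation.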

**Figure 5.** Average data-mismatch objective function, normalized variance and mean offset for different localization methods. Each case represents the average over 10 runs with different initial ensembles, except for the case labeled “Large ensemble,” which corresponds to data assimilation results obtained with an ensemble of size Ne = 20,000 without localization. The dashed blue line in the boxplot highlights an obje…

**Figure 6.** Histogram of the taper values for different localization methods. Test case 1. … suppressed and those that are largely retained. This behavior indicates that these tapers are more effective in discriminating between meaningful and spurious correlations. The CGC taper presents the most distinct distribution among the cases analyzed. In particular, it does not produce taper values equal to zero for the choice…

**Figure 7.** Average data-mismatch objective function, normalized variance and mean offset for different localization methods. Each case represents the average over 10 runs with different initial ensembles, except for the case labeled “Large ensemble,” which corresponds to data assimilation results obtained with an ensemble of size Ne = 5,000 without localization. The dashed blue line in the boxplot highlights an objec…

**Figure 8.** Histogram of the taper values for different localization methods. Test case 2. … is that it assigns small, or even zero, localization coefficients to regions far from the well, even when meaningful correlations exist. This limitation is evidenced by the results labeled 5000 Ref(Corr). We also observe from …

**Figure 9.** Localization values for horizontal log-permeability in a middle layer. Each row corresponds to an observed data point from a different well (orange star). 5000 Ref(loc) refers to localization computed using the reference ensemble with 5,000 realizations. 100 Corr and 5000 Ref(Corr) represent the correlation coefficients computed using ensembles of 100 and 5,000 members, respectively. Test case 2. … reference range…

**Figure 10.** Average data-mismatch objective function and normalized variance for different localization methods. Each case represents the average over 10 runs with different initial ensembles, except for the case labeled “Large ensemble,” which corresponds to data assimilation results obtained with an ensemble of size Ne = 10,000 without localization. The dashed blue line in the boxplot highlights an objective functi…

**Figure 11.** Taper values computed for the water-cut data of well 21 (green star) at the final time step. Test case 3. … in relevant underestimation of the posterior variance

**Figure 12.** Normalized variance of log-permeability after data assimilation with different tapers. Test case 3. 5.4 Test Case 4: Large Number of Model Parameters. A potential limitation of correlation-based tapers is related to the dimensionality of the problem, both in terms of the number of model parameters and the number of data points. Two related issues may arise. First, as the problem dimension increases, the qu…

**Figure 13.** Average data-mismatch objective function and normalized variance as functions of ensemble size for the different localization methods. Test case 3. … difficult to estimate with finite ensembles, we expect the ability of correlation-based tapers to discriminate between meaningful and spurious correlations to be reduced. These observations suggest that the dimensions Nm and Nd, together with the ensemble size…

**Figure 14.** Reservoir models used in test case 4. For this test problem, in addition to distance-based localization, we considered the power-law and logistic tapers using two strategies for defining t0. Besides adopting a fixed value of t0 = 2, we also evaluated an adaptive strategy in which t0 is determined from the current distribution of t values. Specifically, t0 was selected as the 90th percentile of the t v…
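
The adaptive choice of t0 described in this caption can be sketched as a percentile over standardized correlations. The caption is truncated, so the details below are assumptions: t is taken as |ρ|/σ, σ uses the approximation (1 − ρ²)/√(Ne − 1), and the percentile is computed with linear interpolation over one pool of correlations. All function names are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def standardized(rho, n_ens):
    """Standardized correlation t = |rho| / sigma (ASSUMED sigma formula)."""
    sigma = (1.0 - rho * rho) / math.sqrt(n_ens - 1)
    return math.inf if sigma == 0.0 else abs(rho) / sigma


def adaptive_t0(correlations, n_ens, q=90.0):
    """Pick t0 as the q-th percentile of the current t values."""
    return percentile([standardized(r, n_ens) for r in correlations], q)


sample = [0.02, -0.05, 0.11, 0.30, -0.08, 0.01, 0.65, -0.15, 0.04, 0.09]
print(round(adaptive_t0(sample, n_ens=100), 3))
```

Compared with the fixed value t0 = 2, a percentile rule ties the threshold to how strong the current correlations are, at the cost of always letting a fixed fraction of them pass.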

**Figure 15.** Figure 15 shows the evolution over time of the 90th percentile of the standardized correlation for water-cut data, denoted by tp90, for seven wells uniformly distributed throughout the reservoir, considering models with 1, 10, and 50 layers. This figure shows that tp90 values range approximately from 1.5 to 3.5. Moreover, we observe only a slight reduction in the values of tp90 as the number of model parameters in…

**Figure 16.** Average data-mismatch objective function (first row) and normalized variance (second row) for different localization methods, considering the cases with 1, 10, and 50 layers. Test case 4. … data point. This metric provides an estimate of the effective number of model parameters updated, on average, by each observation. Since rij ∈ [0, 1], each taper coefficient can be interpreted as a fractional contributio…
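
The "effective number of model parameters updated" mentioned in this caption can be illustrated with a small sketch. The defining formula is cut off on this page, so the version below is an assumption: the taper coefficients r_ij are summed over parameters i for each data point j, then averaged over data points. The function name `effective_updates` is hypothetical.

```python
def effective_updates(taper):
    """Average over data points of the column sums of the taper matrix.

    taper[i][j] is the coefficient in [0, 1] linking model parameter i
    (rows) to data point j (columns). ASSUMED definition, see lead-in.
    """
    if not taper or not taper[0]:
        raise ValueError("taper must be a non-empty matrix")
    n_data = len(taper[0])
    col_sums = [sum(row[j] for row in taper) for j in range(n_data)]
    return sum(col_sums) / n_data


# Three parameters, two data points: the first datum updates two parameters
# fully, the second updates two parameters at half weight.
r = [[1.0, 0.0],
     [1.0, 0.5],
     [0.0, 0.5]]
print(effective_updates(r))
```

Under this reading, a coefficient of 0.5 counts as half a parameter, which matches the caption's description of each r_ij as a fractional contribution.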

Original abstract

Localization is essential in ensemble-based data assimilation because finite ensembles produce noisy covariance estimates, causing spurious updates and excessive loss of ensemble variance. In subsurface applications, localization is usually based on spatial distance, but this criterion can be hard to justify when parameter-data relationships are controlled by flow dynamics, nonlinear operators, non-local parameters, or prior conditioning effects. This work investigates correlation-based localization as an alternative strategy in which tapering coefficients are computed from the statistical reliability of estimated model-data correlations. We interpret localization as a shrinkage problem in correlation space and propose three tapers: a generalized power-law taper motivated by mean-square-error correction, a logistic taper derived from a Bayesian spike-and-slab formulation, and a discrepancy-based taper inspired by Morozov's principle. The tapers are evaluated using synthetic reservoir data assimilation problems involving scalar and grid-based parameters, localized flow responses, non-trivial correlation patterns, and increasing model dimension. The results show that correlation-based localization can suppress spurious correlations while preserving meaningful parameter-data relationships. In several cases, the proposed power-law and logistic tapers retained more posterior ensemble variance than distance-based localization while maintaining acceptable data-match quality. The logistic taper provided the strongest variance preservation, whereas smoother tapers favored better data matches. Overall, the results indicate that correlation-based localization is a statistically motivated alternative to distance-based localization, especially when spatial distance is unavailable or misleading.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives three new ways to set correlation-based tapers for localization in ensemble DA, motivated by MSE correction, spike-and-slab, and Morozov discrepancy, and shows they can preserve more variance than distance-based tapers on some reservoir synthetics.

The core contribution is the three explicit taper constructions: a generalized power-law drawn from MSE shrinkage ideas, a logistic form from a Bayesian spike-and-slab prior, and a discrepancy taper based on Morozov's principle. These are positioned as a direct alternative when spatial distance is not a reliable proxy for correlation strength, which is common in flow-driven or conditioned subsurface problems.

The work does a clean job of framing localization as shrinkage in correlation space and of testing the tapers on synthetic cases that include both scalar parameters and grid fields with varying correlation structures. In those tests the power-law and logistic versions sometimes retain more posterior variance than standard distance localization while keeping acceptable data match, which is a practical point worth checking.

The main limitation is the lack of quantitative detail in the reported results: no error bars, no tabulated RMSE or variance ratios, and no description of how many ensemble members were used or exactly how the reliability scores were computed. The stress-test concern about circularity is real: the reliability estimates that drive the tapers are derived from the same finite ensemble that produces the noisy correlations, so any sampling artifact that creates spurious correlations can also distort the reliability measure. The synthetics have known truth by construction, which may make the separation look cleaner than it would on real data where the true pattern is unknown.

This is a specialized methods paper aimed at people already working on ensemble data assimilation for reservoir or subsurface models. A reader who needs alternatives to distance-based localization will find concrete formulations to implement and test. The ideas are grounded enough and the motivation is honest enough that the paper should go to peer review rather than desk rejection, though the experiments will need tighter quantification and checks against the circularity issue.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes three correlation-based tapers (generalized power-law motivated by MSE correction, logistic from Bayesian spike-and-slab, and discrepancy-based inspired by Morozov's principle) for localization in ensemble data assimilation. Tapering coefficients are derived from the statistical reliability of estimated model-data correlations rather than spatial distance. The tapers are evaluated on synthetic reservoir problems involving scalar/grid parameters and varying correlation patterns, with the claim that the power-law and logistic tapers can suppress spurious correlations while retaining more posterior ensemble variance than distance-based localization, and that the logistic taper provides the strongest variance preservation.

Significance. If the central claims hold after addressing verification gaps, the work supplies a statistically motivated alternative to distance-based localization for subsurface DA applications where spatial distance is a poor proxy due to flow dynamics or non-local effects. The focus on ensemble variance preservation is relevant for uncertainty quantification. The absence of quantitative benchmarks and independent validation of the reliability measure, however, limits the strength of the current evidence.

major comments (3)

The tapering coefficients rely on reliability estimates (p-values, bootstrap, etc.) computed from the same finite ensemble used to obtain the raw correlations. Because sampling noise that produces spurious correlations will also affect the reliability scores, the separation of spurious from meaningful correlations is not guaranteed to be informative when the true correlation structure is unknown.
The abstract and evaluation sections provide no error bars, exact quantitative comparisons between tapers and distance-based baselines, or details on data exclusion. This prevents verification of the claim that the proposed tapers retained more posterior ensemble variance while maintaining acceptable data-match quality.
The synthetic tests are constructed with known ground-truth correlation structures. This setup permits implicit selection of regimes where the reliability measure succeeds and does not test performance when the true correlation pattern must be discovered without external knowledge.

minor comments (2)

Define the precise functional forms, hyperparameters, and implementation details of the three tapers (including how the reliability measure is converted to a taper coefficient) to enable reproducibility.
Report the ensemble sizes used for correlation estimation and localization, and discuss their relation to the reliability estimation procedure.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below, indicating planned revisions where appropriate.

Referee: The tapering coefficients rely on reliability estimates (p-values, bootstrap, etc.) computed from the same finite ensemble used to obtain the raw correlations. Because sampling noise that produces spurious correlations will also affect the reliability scores, the separation of spurious from meaningful correlations is not guaranteed to be informative when the true correlation structure is unknown.

Authors: We agree that reliability estimates are subject to the same sampling noise as the correlations themselves, so the separation cannot be guaranteed to be fully informative in all regimes. The tapers are nevertheless constructed as shrinkage operators that downweight low-reliability estimates, and the synthetic results show empirical gains over distance-based localization. We will add a limitations paragraph in the methods section acknowledging this dependence on ensemble-derived statistics. revision: partial
Referee: The abstract and evaluation sections provide no error bars, exact quantitative comparisons between tapers and distance-based baselines, or details on data exclusion. This prevents verification of the claim that the proposed tapers retained more posterior ensemble variance while maintaining acceptable data-match quality.

Authors: The referee is correct; the current manuscript lacks error bars, precise numerical comparisons, and explicit data-exclusion details. We will revise the abstract and results sections to report error bars from repeated ensemble realizations, exact variance and data-mismatch values, and clarify any data handling steps. revision: yes
Referee: The synthetic tests are constructed with known ground-truth correlation structures. This setup permits implicit selection of regimes where the reliability measure succeeds and does not test performance when the true correlation pattern must be discovered without external knowledge.

Authors: The experiments use known ground truth only for post-hoc evaluation; the tapers themselves are computed exclusively from the ensemble without access to the true correlations. The test suite already includes multiple correlation patterns (localized flow, non-local effects) to probe robustness. We will expand the discussion to emphasize applicability when no ground truth is available. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

The paper derives tapers from independent statistical principles (MSE correction, Bayesian spike-and-slab, Morozov's discrepancy) and evaluates them on separate synthetic test cases. No equations reduce to their own inputs by construction, no load-bearing self-citations, and no fitted parameters renamed as predictions. Reliability estimates from the ensemble are a methodological input rather than a tautological reduction of the claimed results. This is the common honest outcome for a methods paper with external motivation and independent validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; full text unavailable so ledger entries are limited to explicitly stated assumptions in the provided text.

axioms (1)

domain assumption: Finite ensembles produce noisy covariance estimates, causing spurious updates and excessive loss of ensemble variance.
Stated directly in the opening sentence of the abstract as the core motivation for localization.

Reference graph

Works this paper leans on

38 extracted references · 36 canonical work pages

[1]

Anderson, J

DOI: 10.2118/117274-PA. Anderson, J. L. Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter.Physica D: Nonlinear Phenomena, 230(1–2):99–111,

work page doi:10.2118/117274-pa
[2]

Anderson, J

DOI: 10.1016/j.physd.2006.02.011. Anderson, J. L. Localization and sampling error correction in ensemble Kalman filter data assim- ilation.Monthly Weather Review, 140:2359–2371,

work page doi:10.1016/j.physd.2006.02.011 2006
[3]

Bishop, C

DOI: 10.1175/MWR-D-11-00013.1. Bishop, C. H. and Hodyss, D. Flow-adaptive moderation of spurious ensemble correlations and its use in ensemble-based data assimilation.Quarterly Journal of the Royal Meteorological Society, 133(629):2029–2044,

work page doi:10.1175/mwr-d-11-00013.1 2029
[4]

DOI: 10.1002/qj.169. Chen, Y. and Oliver, D. S. Cross-covariance and localization for EnKF in multiphase flow data assimilation.Computational Geosciences, 14(4):579–601,

work page doi:10.1002/qj.169
[5]

Deutsch, C

DOI: 10.1007/s10596-009-9174-6. Deutsch, C. V.Geostatistical reservoir modeling. Oxford University Press,

work page doi:10.1007/s10596-009-9174-6
[6]

Emerick, A

DOI: 10.1007/s10596-010-9198-y. Emerick, A. A. and Reynolds, A. C. Ensemble smoother with multiple data assimilation.Computers & Geosciences, 55:3–15,

work page doi:10.1007/s10596-010-9198-y
[7]

Evensen, G., Raanes, P

DOI: 10.1016/j.cageo.2012.03.011. Evensen, G., Raanes, P. N., Stordal, A. S., and Hove, J. Efficient implementation of an iterative ensemble smoother for data assimilation and reservoir history matching.Frontiers in Applied Mathematics and Statistics, 3,

work page doi:10.1016/j.cageo.2012.03.011 2012
[8]

Evensen, G., Oliver, D

DOI: 10.3389/fams.2019.00047. Evensen, G., Oliver, D. S., and Hanea, R. G.Ensemble History Matching: Conditioning Reservoir Models on Dynamic Data. Springer Cham,

work page doi:10.3389/fams.2019.00047 2019
[9]

A Piecewise Rotation of the Circle, IPR Maps and Their Connection with Translation Surfaces

ISBN 978-3-031-99157-8. DOI: 10.1007/978- 3-031-99155-4. Fisher, R. A. On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1,

work page doi:10.1007/978-
[10]

Flowerdew, J

DOI: 10.1144/petgeo.7.S.S87. Flowerdew, J. Towards a theory of optimal localisation.Tellus A: Dynamic Meteorology and Oceanography, 67(1),

work page doi:10.1144/petgeo.7.s.s87
[11]

Furrer, R

DOI: 10.3402/tellusa.v67.25257. Furrer, R. and Bengtsson, T. Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants.Journal of Multivariate Analysis, 98(2):227–255,

work page doi:10.3402/tellusa.v67.25257
[12]

Gaspari, G

DOI: 10.1016/j.jmva.2006.08.003. Gaspari, G. and Cohn, S. E. Construction of correlation functions in two and three di- mensions.Quarterly Journal of the Royal Meteorological Society, 125(554):723–757,

work page doi:10.1016/j.jmva.2006.08.003 2006
[13]

Gnambs, T

DOI: 10.1002/qj.49712555417. Gnambs, T. A brief note on the standard error of the pearson correlation.Collabra: Psychology, 9(1),

work page doi:10.1002/qj.49712555417
[14]

Houtekamer, P

DOI: 10.1525/collabra.87615. Houtekamer, P. L. and Mitchell, H. L. A sequential ensemble Kalman filter for atmospheric data assimilation.Monthly Weather Review, 129(1):123–137,

work page doi:10.1525/collabra.87615
[16]

Lacerda, J

DOI: 10.1016/j.petrol.2018.08.056. Lacerda, J. M., Emerick, A. A., and Pires, A. P. Using a machine learning proxy for localization in ensemble data assimilation.Computational Geosciences, 25:931–944,

work page doi:10.1016/j.petrol.2018.08.056 2018
[17]

DOI: 10.1007/s10596- 020-10031-0. Lee, Y. Sampling error correction in ensemble kalman inversion.arXiv:2105.11341,

work page doi:10.1007/s10596-
[18]

Lorenc, A

DOI: 10.48550/arXiv.2105.11341. Lorenc, A. C. The potential of the ensemble Kalman filter for NWP—a comparison with 4D-Var.Quarterly Journal of the Royal Meteorological Society, 129(595):3183–3203,

work page doi:10.48550/arxiv.2105.11341
[19]

DOI: 10.1256/qj.02.132. Luo, X. and Bhakta, T. Automatic and adaptive localization for ensemble-based history matching. Journal of Petroleum Science and Engineering, 184,

work page doi:10.1256/qj.02.132
[20]

Luo, X., Bhakta, T., and Nævdal, G

DOI: 10.1016/j.petrol.2019.106559. Luo, X., Bhakta, T., and Nævdal, G. Correlation-based adaptive localization with applica- tions to ensemble-based 4D-seismic history matching.SPE Journal, 23(2):396–427,

work page doi:10.1016/j.petrol.2019.106559 2019
[21]

M´ en´ etrier, B., Montmerle, T., Michel, Y., and Berre, L

DOI: 10.2118/185936-PA. M´ en´ etrier, B., Montmerle, T., Michel, Y., and Berre, L. Linear filtering of sample covariances for ensemble-based data assimilation. part i: Optimality criteria and application to variance filtering and covariance localization.Monthly Weather Review, 143(5),

work page doi:10.2118/185936-pa
[22]

Mitchell, T

DOI: 10.1175/MWR-D-14- 00157.1. Mitchell, T. J. and Beauchamp, J. J. Bayesian variable selection in linear regression.Journal of the American Statistical Association, 83(404),

work page doi:10.1175/mwr-d-14-
[23]

Morozov, V.Methods for Solving Incorrectly Posed Problems

DOI: 10.2307/2290129. Morozov, V.Methods for Solving Incorrectly Posed Problems. Springer-Verlag New York,

work page doi:10.2307/2290129
[24]

O’Hara, R

DOI: 10.1007/978-1-4612-5280-1. O’Hara, R. B. and Sillanpaa, M. J. A review of Bayesian variable selection methods: what, how and which.Bayesian Analysis, 4(1),

work page doi:10.1007/978-1-4612-5280-1
[25]

Oliver, D

DOI: 10.1214/09-BA403. Oliver, D. S. and Alfonzo, M. Calibration of imperfect models to biased observations.Computa- tional Geosciences, 22:145–161,

work page doi:10.1214/09-ba403
[26]

33 Oliver, D

DOI: 10.1007/s10596-017-9678-4. 33 Oliver, D. S., Reynolds, A. C., and Liu, N.Inverse Theory for Petroleum Reservoir Charac- terization and History Matching. Cambridge University Press, Cambridge, UK,

work page doi:10.1007/s10596-017-9678-4
[27]

DOI: 10.1017/CBO9780511535642

ISBN 978-0511535642. DOI: 10.1017/CBO9780511535642. Raanes, P. N., Stordal, A. S., and Evensen, G. Revising the stochastic iterative ensemble smoother. Nonlinear Processes in Geophysics, 26(3),

work page doi:10.1017/cbo9780511535642
[28]

Ranazzi, P

DOI: 10.5194/npg-2019-10. Ranazzi, P. H., Luo, X., and Sampaio, M. A. Improving pseudo-optimal Kalman-gain localization using the random shuffle method.Journal of Petroleum Science and Engineering, 215,

work page doi:10.5194/npg-2019-10 2019
[29]

Ranazzi, P

DOI: 10.1016/j.petrol.2022.110589. Ranazzi, P. H., Luo, X., and Sampaio, M. A. Covariance scaling: Theory, extension, and ap- plications to ensemble-based history matching.Computational Geosciences, on line,

work page doi:10.1016/j.petrol.2022.110589 2022
[30]

DOI: 10.1007/s10596-026-10413-w. Rice, J. A.Mathematical Statistics and Data Analysis. Thomson Brooks/Cole, third edition,

work page doi:10.1007/s10596-026-10413-w
[31]

Sakov, P

DOI: 10.1007/s10596-010-9202-6. Sakov, P. and Oke, P. R. Implications of the form of the ensemble transformation in the ensemble square root filters.Monthly Weather Review, 136:1042–1053,

work page doi:10.1007/s10596-010-9202-6
[32]

Silva, V

DOI: 10.1175/2007MWR2021.1. Silva, V. L. S., Seabra, G. S., and Emerick, A. A. Machine learning to enhance the covariance esti- mations of non-local model parameters in ensemble based-data assimilation. InProceedings of the SPE Reservoir Simulation Conference, number SPE-223908-MS, 2025a. DOI: 10.2118/223908- MS. Silva, V. L. S., Seabra, G. S., and Emeric...

work page doi:10.1175/2007mwr2021.1
[33]

Tippett, M

DOI: 10.2307/2331802. Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M., and Whitaker, J. S. Ensem- ble square-root filters.Monthly Weather Review, 131:1485–1490,

work page doi:10.2307/2331802
[34]

Vishny, D., Morzfeld, M., Gwirtz, K., Bach, E., Dunbar, O

DOI: 10.1175/1520- 0493(2003)131<1485:ESRF>2.0.CO;2. Vishny, D., Morzfeld, M., Gwirtz, K., Bach, E., Dunbar, O. R. A., and Hodyss, D. High-dimensional covariance estimation from a small number of samples.Journal of Advances in Modeling Earth Systems, 16(9),

work page doi:10.1175/1520- 2003
[35]

Vossepoel, F

DOI: 10.1029/2024MS004417. Vossepoel, F. C., Evensen, G., and van Leeuwen, P. J. Adaptive correlation- and distance-based localization for iterative ensemble smoothers in a coupled nonlinear multiscale model.Monthly Weather Review, 153(11),

work page doi:10.1029/2024ms004417
[36]

Zhang, Y

DOI: 10.1175/MWR-D-24-0269.1. Zhang, Y. and Oliver, D. S. Improving the ensemble estimate of the Kalman gain by bootstrap sampling.Mathematical Geosciences, 42:327–345,

work page doi:10.1175/mwr-d-24-0269.1
[37]

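A Appendix

A.1 Derivation of the MSE taper

To derive Eq. 7, we start from the MSE

$$
J(r) = \mathrm{E}\left[(r\tilde{\rho} - \rho)^2\right] \tag{50}
$$

$$
= \mathrm{E}\left[r^2\tilde{\rho}^2 - 2r\tilde{\rho}\rho + \rho^2\right] = r^2\,\mathrm{E}\left[\tilde{\rho}^2\right] - 2r\,\mathrm{E}[\tilde{\rho}]\,\rho + \rho^2 .
$$

Then, we compute the derivative with respect to $r$ and set it to zero, which leads to

$$
r^\star = \frac{\rho\,\mathrm{E}[\tilde{\rho}]}{\mathrm{E}\left[\tilde{\rho}^2\right]} . \tag{51}
$$

Assuming $\tilde{\rho}$ is approximately unbiased for moderate ensemble sizes, $\mathrm{E}[\tilde{\rho}] \approx \rho$, so that $\mathrm{E}[\tilde{\rho}^2] = \mathrm{var}[\tilde{\rho}] + \mathrm{E}[\tilde{\rho}]^2 \approx \rho^2 + \mathrm{var}[\tilde{\rho}]$, the optimal taper can be written as

$$
r^\star = \frac{\rho^2}{\rho^2 + \mathrm{var}[\tilde{\rho}]} = \frac{\rho^2}{\rho^2 + \sigma^2} . \tag{52}
$$

A.2 Spike-and-Slab Distribution

The spike-and-slab distribution is a mixture distribution commonly used ... “spike”) or drawn from a continuous distribution allowing nonzero values (the “slab” ...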
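As a numerical check of the MSE taper derived in A.1, the following minimal sketch evaluates Eq. 52 and compares it with a brute-force minimization of J(r) from Eq. 50, under the same unbiasedness assumption E[ρ̃] ≈ ρ. The function names `mse_taper` and `mse`, the example values ρ = 0.3 and σ = 0.2, and the grid search are illustrative assumptions and do not come from the paper.

```python
def mse_taper(rho, sigma):
    """Eq. 52: r* = rho^2 / (rho^2 + sigma^2), with sigma^2 = var[rho_tilde]."""
    denom = rho * rho + sigma * sigma
    if denom == 0.0:
        # Assumption for this sketch: with no signal and no noise, taper to zero.
        return 0.0
    return rho * rho / denom


def mse(r, rho, sigma):
    """J(r) from Eq. 50, assuming E[rho_tilde] = rho and var[rho_tilde] = sigma^2."""
    second_moment = rho * rho + sigma * sigma  # E[rho_tilde^2]
    return r * r * second_moment - 2.0 * r * rho * rho + rho * rho


# Hypothetical example values, chosen only for illustration.
rho, sigma = 0.3, 0.2
r_star = mse_taper(rho, sigma)

# Brute-force check: the closed form should agree with the grid minimizer
# of J(r) on [0, 1] up to the grid spacing.
grid = [i / 1000.0 for i in range(1001)]
r_grid = min(grid, key=lambda r: mse(r, rho, sigma))

print(f"closed form r* = {r_star:.4f}, grid minimizer = {r_grid:.3f}")
```

In this form the taper depends only on the ratio of ρ² to σ², so a correlation that is small relative to its sampling uncertainty is shrunk strongly toward zero, while a well-resolved correlation is left nearly unchanged.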

The generalization to $\beta \neq 2$ therefore corresponds to assuming that the evidence in favor of inclusion grows as a power $t^{\beta}$.

Generalized logistic taper

Alternatively, assume that the log-Bayes factor grows proportionally to a power of $t$:

$$
\ln\left(\mathrm{BF}(t)\right) = c\left(t^{\gamma} - t_0^{\gamma}\right), \tag{107}
$$

where $\gamma > 0$, $c > 0$, and $t_0 > 0$. Exponentiating Eq. 107 gives

$$
\mathrm{BF}(t) = \exp\left(c\left(t^{\gamma} - t_0^{\gamma}\right)\right). \tag{108}
$$

Substi...
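The Bayes factor of Eqs. 107 and 108 can be sketched in a few lines. The step that follows Eq. 108 is truncated in this excerpt, so the conversion to a taper value below is an assumption: it uses the standard Bayesian identity that posterior odds equal prior odds times the Bayes factor, and it takes the taper to be the resulting posterior inclusion probability. The function names, the `prior_odds` parameter, and the example parameter values are hypothetical, and the definition of the statistic t is not shown in this excerpt.

```python
import math


def log_bayes_factor(t, c, gamma, t0):
    """Eq. 107: ln BF(t) = c * (t^gamma - t0^gamma)."""
    if t < 0 or c <= 0 or gamma <= 0 or t0 <= 0:
        raise ValueError("require t >= 0 and c, gamma, t0 > 0")
    return c * (t ** gamma - t0 ** gamma)


def bayes_factor(t, c, gamma, t0):
    """Eq. 108: BF(t) = exp(c * (t^gamma - t0^gamma))."""
    return math.exp(log_bayes_factor(t, c, gamma, t0))


def inclusion_probability(t, c, gamma, t0, prior_odds=1.0):
    """Assumed taper: posterior inclusion probability from BF(t).

    Uses posterior odds = prior_odds * BF(t) and p = odds / (1 + odds),
    evaluated as a logistic function of the log-odds to avoid overflow.
    """
    log_odds = math.log(prior_odds) + log_bayes_factor(t, c, gamma, t0)
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical parameter values, chosen only for illustration.
for t in (0.0, 1.0, 2.0, 3.0):
    p = inclusion_probability(t, c=1.0, gamma=2.0, t0=2.0)
    print(f"t = {t:.1f}  taper = {p:.4f}")
```

Under these assumptions t_0 acts as the threshold at which the evidence is neutral (BF = 1), and c and γ control how sharply the taper switches from near zero to near one around that threshold.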
