pith. machine review for the scientific record.

arxiv: 2605.06496 · v1 · submitted 2026-05-07 · 📊 stat.ME · stat.AP

Recognition: unknown

Bivariate Frank Copula: Some More Results on Point Estimation of the Association Parameter from a Bayesian Perspective and Revisiting the Goodness of Fit Tests with an Application to Model Groundwater Data from Dong Thap, Vietnam

Dung T. Nguyen, Nabendu Pal, Thi-Yen-Anh Pham

Pith reviewed 2026-05-08 07:19 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords Frank copula · Bayes estimator · Jeffreys prior · maximum likelihood estimator · goodness of fit test · association parameter · groundwater data

The pith

For small samples the Jeffreys prior Bayes estimator for the Frank copula association parameter has lower mean squared error than the maximum likelihood estimator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares point estimators for the dependence parameter of a bivariate Frank copula. Monte Carlo simulations show that the Bayes estimator constructed with the Jeffreys prior achieves uniformly lower mean squared error than both the maximum likelihood estimator and the Bayes estimator under a generalized flat prior when the sample size is at most 25. For moderate and large samples the three estimators perform comparably in bias and mean squared error. The work then fits the Frank copula to groundwater arsenic and hydrochemical measurements from Vietnam and supplies simulated critical values for several goodness-of-fit tests after documenting certain non-intuitive behaviors of the test statistics.
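For orientation, the standard one-parameter form of the bivariate Frank copula is written out below, together with its density and the classical links between θ and the two rank correlations. This page does not reproduce the paper's own equations, so the notation here is an assumption: ρ_K is taken to be Kendall's tau and ρ_S Spearman's rho (the labels used in the caption of Figure 1.1), and D_k denotes the Debye function.

```latex
C(u, v \mid \theta) = -\frac{1}{\theta}\,
  \ln\!\left(1 + \frac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\right),
  \qquad \theta \in \mathbb{R}\setminus\{0\},

c(u, v \mid \theta) =
  \frac{\theta\,(1-e^{-\theta})\,e^{-\theta(u+v)}}
       {\bigl[(1-e^{-\theta}) - (1-e^{-\theta u})(1-e^{-\theta v})\bigr]^{2}},

\rho_K(\theta) = 1 - \frac{4}{\theta}\bigl[1 - D_1(\theta)\bigr],
\qquad
\rho_S(\theta) = 1 - \frac{12}{\theta}\bigl[D_1(\theta) - D_2(\theta)\bigr],
\qquad
D_k(x) = \frac{k}{x^{k}} \int_0^{x} \frac{t^{k}}{e^{t}-1}\,dt .
```

The limit θ → 0 gives independence, C(u, v) = uv. Positive θ gives positive association and negative θ gives negative association.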

Core claim

Simulation studies demonstrate that the Bayes estimator under the Jeffreys prior for the dependence parameter of the bivariate Frank copula yields lower mean squared error than both the generalized flat prior Bayes estimator and the maximum likelihood estimator when the sample size is 25 or smaller. For moderate and large samples the three estimators exhibit comparable bias and mean squared error. Computational issues that can affect maximum likelihood estimation for very small samples are also noted. The Frank copula is applied to a Vietnamese groundwater dataset, and extensive simulated critical value tables are provided for the goodness-of-fit tests after examining their behavior.
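A simulation study of this kind needs a way to draw pairs from a Frank copula with a known θ. The sketch below uses conditional inversion, a textbook method; the page does not say which sampler the authors used, so this is an illustrative assumption. The function names `frank_cdf`, `sample_frank`, and `kendall_tau` are hypothetical.

```python
import math
import random


def frank_cdf(u, v, theta):
    """Frank copula C(u, v; theta) for theta != 0."""
    g = math.expm1(-theta)
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / g) / theta


def sample_frank(n, theta, rng):
    """Draw n pairs (u, v) by conditional inversion (theta != 0).

    Given u and an independent uniform w, solve dC/du = w for v.
    """
    g = math.expm1(-theta)
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = math.exp(-theta * u)
        v = -math.log1p(w * g / (w + (1.0 - w) * a)) / theta
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Sample Kendall's tau by direct pair counting, O(n^2)."""
    n, score = len(pairs), 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (s > 0) - (s < 0)
    return 2.0 * score / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(0)           # arbitrary seed, for repeatability only
    draws = sample_frank(400, 5.0, rng)
    print(kendall_tau(draws))        # should be clearly positive for theta = 5
```

Using `expm1` and `log1p` avoids some of the cancellation that the naive formula suffers from when θ is close to zero.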

What carries the argument

The Bayes estimator for the Frank copula association parameter under the Jeffreys prior, assessed by direct comparison of mean squared error in finite-sample Monte Carlo experiments.
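To make the machinery concrete, here is a rough numerical sketch of a Jeffreys-prior Bayes estimator. It rests on several assumptions that the page does not confirm: squared-error loss (so the estimator is the posterior mean), a prior proportional to the square root of the Fisher information, and a truncation of θ to a grid on [-15, 15]. The Fisher information is approximated by a coarse midpoint rule over (u, v), which is far cruder than the high-precision triple integration the authors describe. All function names are hypothetical.

```python
import math


def log_density(u, v, theta):
    """Log of the Frank copula density; 0.0 at the independence limit."""
    if abs(theta) < 1e-8:
        return 0.0
    g = math.expm1(-theta)
    d = g + math.expm1(-theta * u) * math.expm1(-theta * v)
    return math.log(-theta * g) - theta * (u + v) - 2.0 * math.log(abs(d))


def fisher_information(theta, m=24, h=1e-3):
    """Midpoint-rule estimate of E[(d/dtheta log c)^2] on an m x m grid."""
    total = 0.0
    for i in range(m):
        u = (i + 0.5) / m
        for j in range(m):
            v = (j + 0.5) / m
            up = log_density(u, v, theta + h)
            down = log_density(u, v, theta - h)
            score = (up - down) / (2.0 * h)      # central difference in theta
            total += score * score * math.exp(log_density(u, v, theta))
    return total / (m * m)


def jeffreys_weights(grid):
    """Unnormalised Jeffreys prior sqrt(I(theta)) at each grid point."""
    return [math.sqrt(fisher_information(t)) for t in grid]


def posterior_mean(data, grid, weights):
    """Posterior mean of theta on a uniform grid with the given prior weights."""
    logpost = [math.log(w) + sum(log_density(u, v, t) for u, v in data)
               for t, w in zip(grid, weights)]
    top = max(logpost)                            # subtract max to avoid underflow
    mass = [math.exp(lp - top) for lp in logpost]
    return sum(t * p for t, p in zip(grid, mass)) / sum(mass)


if __name__ == "__main__":
    grid = [0.5 * k for k in range(-30, 31)]      # assumed truncation to [-15, 15]
    weights = jeffreys_weights(grid)
    # Made-up, positively associated points used only to exercise the code.
    data = [(0.10, 0.20), (0.40, 0.35), (0.70, 0.80), (0.90, 0.85), (0.30, 0.25)]
    print(posterior_mean(data, grid, weights))
```

Passing a list of ones as `weights` gives a flat prior on the same truncated grid. That is a convenient comparison point, though it is not necessarily the paper's generalized flat prior.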

If this is right

  • For bivariate datasets of size 25 or less the Jeffreys prior Bayes estimator should be used to reduce error in estimating the strength of association.
  • Maximum likelihood computation for the Frank copula requires care when samples are very small to avoid numerical instability.
  • Goodness-of-fit testing for the Frank copula can rely on the supplied simulated critical value tables to make accept/reject decisions without re-simulating the null distribution.
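The second point above can be illustrated with a small sketch. The page does not describe the specific instability found in the R implementation, so the code below shows only one generic precaution: evaluate the log-likelihood with `expm1` to limit cancellation, maximise it over a bounded interval, and flag fits that end at a bound. The interval [-30, 30], the golden-section search, and the function names are all assumptions made for illustration. The search also assumes a single peak, which is not guaranteed.

```python
import math


def log_likelihood(data, theta):
    """Frank copula log-likelihood, using expm1 to limit cancellation."""
    if abs(theta) < 1e-8:
        return 0.0                     # independence limit: density is 1
    g = math.expm1(-theta)
    total = len(data) * math.log(-theta * g)
    for u, v in data:
        d = g + math.expm1(-theta * u) * math.expm1(-theta * v)
        total += -theta * (u + v) - 2.0 * math.log(abs(d))
    return total


def bounded_mle(data, lower=-30.0, upper=30.0, tol=1e-6):
    """Golden-section search for the maximiser on [lower, upper].

    Returns (theta_hat, hit_bound). hit_bound=True means the search ended
    at an end point, i.e. the likelihood may have no interior maximum.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = log_likelihood(data, c), log_likelihood(data, d)
    while b - a > tol:
        if fc > fd:                    # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = log_likelihood(data, c)
        else:                          # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = log_likelihood(data, d)
    theta_hat = 0.5 * (a + b)
    hit_bound = min(theta_hat - lower, upper - theta_hat) < 1e-3
    return theta_hat, hit_bound


if __name__ == "__main__":
    # Three made-up points lying exactly on the diagonal: the likelihood keeps
    # increasing in theta, so the search runs into the upper bound.
    print(bounded_mle([(0.2, 0.2), (0.5, 0.5), (0.8, 0.8)]))
```

Reporting the share of replications with `hit_bound` set is one simple way to separate optimizer failures from genuine estimator variability in a simulation table.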

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reported advantage for small samples indicates that Bayesian estimators with non-informative priors may be preferable in environmental studies where sample sizes are frequently limited.
  • The non-intuitive test statistic behavior documented in the paper suggests that simulation-based calibration remains essential even for well-studied copula families.
  • The critical value tables can be reused by analysts fitting Frank copulas to other paired environmental or hydrological measurements.
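As a sketch of what simulation-based calibration involves, the code below simulates a critical value for a Kolmogorov-Smirnov-type distance between the empirical and model distributions of W = C(U, V | θ). It is written in the spirit of the Genest et al. (2006) statistics but is not a reproduction of the paper's Sn or Tn: the pseudo-observation convention, the coarse grid MLE, and the empirical quantile rule are all assumptions, and the function names are hypothetical.

```python
import math
import random


def sample_frank(n, theta, rng):
    """Conditional-inversion sampler for the Frank copula (theta != 0)."""
    g = math.expm1(-theta)
    out = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = math.exp(-theta * u)
        out.append((u, -math.log1p(w * g / (w + (1.0 - w) * a)) / theta))
    return out


def log_likelihood(data, theta):
    """Frank copula log-likelihood (0.0 at the independence limit)."""
    if abs(theta) < 1e-8:
        return 0.0
    g = math.expm1(-theta)
    total = len(data) * math.log(-theta * g)
    for u, v in data:
        d = g + math.expm1(-theta * u) * math.expm1(-theta * v)
        total += -theta * (u + v) - 2.0 * math.log(abs(d))
    return total


def kendall_K(t, theta):
    """K(t, theta) = P(C(U, V) <= t) = t - phi(t) / phi'(t), for 0 < t <= 1."""
    if abs(theta) < 1e-8:
        return t - t * math.log(t)
    phi = -math.log(math.expm1(-theta * t) / math.expm1(-theta))
    dphi = theta / (-math.expm1(theta * t))    # theta / (1 - exp(theta * t))
    return t - phi / dphi


def ks_statistic(data, theta):
    """sqrt(n) * sup_t |K_n(t) - K(t, theta)| from pseudo-observations."""
    n = len(data)
    counts = [0] * (n + 1)
    for xi, yi in data:
        # Number of points weakly dominated by (xi, yi), itself included.
        counts[sum(1 for x, y in data if x <= xi and y <= yi)] += 1
    worst, below = 0.0, 0.0
    for k in range(1, n + 1):
        model = kendall_K(k / n, theta)
        at = below + counts[k] / n             # K_n at k/n; `below` is its left limit
        worst = max(worst, abs(below - model), abs(at - model))
        below = at
    return math.sqrt(n) * worst


def simulated_critical_value(theta0, n, reps, level=0.95, seed=0):
    """Empirical `level` quantile of the statistic, re-estimating theta each time."""
    rng = random.Random(seed)
    grid = [0.25 * k for k in range(-80, 81)]  # assumed search range [-20, 20]
    stats = []
    for _ in range(reps):
        data = sample_frank(n, theta0, rng)
        theta_hat = max(grid, key=lambda t: log_likelihood(data, t))
        stats.append(ks_statistic(data, theta_hat))
    stats.sort()
    return stats[min(reps - 1, int(level * reps))]


if __name__ == "__main__":
    print(simulated_critical_value(3.0, 25, 200))
```

Running `simulated_critical_value` over a grid of `theta0` and `n` values is the kind of loop that produces a critical value table. A serious version would need many more replications and a finer estimator than this sketch uses.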

Load-bearing premise

The paper assumes that the simulation designs are representative of real data-generating processes and that the groundwater observations truly follow a bivariate Frank copula distribution.

What would settle it

A fresh set of Monte Carlo replications at sample size 20, drawn from a Frank copula with a known parameter value, would falsify the central performance claim if the mean squared error of the Jeffreys prior estimator turned out not to be smaller than that of the maximum likelihood estimator.
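A check of this kind can be organised as in the following skeleton. It is a minimal sketch and not the authors' design: the true θ, the grid on [-15, 15], the replication count, and the seed are arbitrary choices. For brevity the Bayes estimator uses a flat prior on the grid as a stand-in; testing the actual claim would require swapping in Jeffreys prior weights. Function names are hypothetical.

```python
import math
import random


def sample_frank(n, theta, rng):
    """Conditional-inversion sampler for the Frank copula (theta != 0)."""
    g = math.expm1(-theta)
    out = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = math.exp(-theta * u)
        out.append((u, -math.log1p(w * g / (w + (1.0 - w) * a)) / theta))
    return out


def log_likelihood(data, theta):
    """Frank copula log-likelihood (0.0 at the independence limit)."""
    if abs(theta) < 1e-8:
        return 0.0
    g = math.expm1(-theta)
    total = len(data) * math.log(-theta * g)
    for u, v in data:
        d = g + math.expm1(-theta * u) * math.expm1(-theta * v)
        total += -theta * (u + v) - 2.0 * math.log(abs(d))
    return total


def estimates(data, grid):
    """Return (grid MLE, posterior mean under a flat prior on the grid)."""
    ll = [log_likelihood(data, t) for t in grid]
    top = max(ll)
    mle = grid[ll.index(top)]
    mass = [math.exp(x - top) for x in ll]
    post_mean = sum(t * p for t, p in zip(grid, mass)) / sum(mass)
    return mle, post_mean


def mse_comparison(theta, n, reps, grid, seed=0):
    """Monte Carlo MSE of both estimators at a known true theta."""
    rng = random.Random(seed)
    se_mle = se_bayes = 0.0
    for _ in range(reps):
        data = sample_frank(n, theta, rng)
        mle, post_mean = estimates(data, grid)
        se_mle += (mle - theta) ** 2
        se_bayes += (post_mean - theta) ** 2
    return se_mle / reps, se_bayes / reps


if __name__ == "__main__":
    grid = [0.25 * k for k in range(-60, 61)]   # assumed range [-15, 15]
    mse_mle, mse_bayes = mse_comparison(theta=3.0, n=20, reps=200, grid=grid)
    print("MSE (grid MLE):", mse_mle)
    print("MSE (flat-prior posterior mean):", mse_bayes)
```

Because both estimators are computed from the same simulated samples, the comparison is paired, which reduces Monte Carlo noise in the difference of the two mean squared errors.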

Figures

Figures reproduced from arXiv: 2605.06496 by Dung T. Nguyen, Nabendu Pal, Thi-Yen-Anh Pham.

Figure 1.1: Plots of ρK (solid line) and ρS (dashed line) as functions of θ.

Figure 2.1: The priors πFP(θ) and πJP(θ) plotted against the association parameter θ.
    In the next Section 3 we have presented the results of our simulation study to compare the two Bayes estimators against the MLE in terms of bias and MSE. Computation of the BJPE requires triple integrations, and this is done with utmost care to attain a high precision as discussed later. For this reason, each computed value of b…

Figure 4.1: Relative frequency histogram of Sn with n = 25 at θ = −3 and θ = +3.

Figure 4.2: Relative frequency histogram of Tn with n = 25 at θ = −3 and θ = +3.
    Remark 4.2. An interesting feature that has been observed in our comprehensive simulation study is that the critical values of the statistics Sn and Tn tend to decrease as the association parameter θ increases. This behavior can be explained by examining the distribution function K(t, θ) of the random variable W = C(U, V | θ) (see (1.1…

Figure 4.3: Plots of the critical values with θ ∈ (−25, 25) and n = 25, 50, 75, 100.
    Remark 4.3. So the big question is: ‘How to implement the GoF test for Frank Copula?’ Since θ is unknown, a lot depends on its estimated value θ̂, and in this work we have used the MLE of θ to play this role. As seen in Section 3 (comparison of the MLE against BFPE and BJPE) and in Pham et al. (2025) [17] (comparison of the MLE agains…
Original abstract

This work has two major parts. First, we extend the recent study of Pham et al. (2025) on point estimation of the association parameter of a bivariate Frank copula. We investigate two Bayes estimators under the generalized flat prior and the Jeffreys prior, and compare them with the maximum likelihood estimator (MLE). Simulation results show that, for small sample sizes (n ≤ 25), the Bayes estimator under the Jeffreys prior uniformly outperforms both the generalized flat prior estimator and the MLE in terms of mean squared error (MSE). For moderate and large sample sizes, all estimators have very similar performances in terms of bias and MSE. We also discuss computational issues in the R package implementation that may significantly affect the computation of the MLE for very small samples. In the second part, we apply the Frank copula to analyze the association between groundwater arsenic concentration and other hydrochemical variables using a recent dataset from Vietnam. We revisit the goodness-of-fit tests proposed by Genest et al. (2006), investigate several non-intuitive behaviors of the test statistics, and provide extensive simulated critical value tables. Our results complement and refine the computational findings reported in the earlier literature.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript extends prior work on estimating the association parameter θ of the bivariate Frank copula by deriving and comparing two Bayesian point estimators (under Jeffreys prior and generalized flat prior) against the MLE. Monte Carlo simulations are used to claim that the Jeffreys Bayes estimator uniformly achieves lower MSE for n ≤ 25, with comparable performance for larger n; computational issues with MLE in small samples are also discussed. The second part applies the Frank copula to groundwater arsenic and hydrochemical data from Vietnam, revisits Genest et al. (2006) GOF tests, examines non-intuitive behaviors of the test statistics, and supplies extensive simulated critical-value tables.

Significance. If the simulation results hold under representative designs, the small-sample MSE advantage provides actionable guidance for copula parameter estimation in data-scarce settings common to environmental statistics. The new critical-value tables and analysis of GOF test behavior refine existing computational tools and could improve reliability of copula-based inference for practitioners.

major comments (2)
  1. [Simulation study] Simulation study: the central claim that the Jeffreys Bayes estimator 'uniformly outperforms' both the generalized flat prior estimator and the MLE in MSE for all n ≤ 25 is load-bearing for the paper's first contribution. The manuscript must specify the grid of true θ values (including near 0 and the boundaries where the copula density flattens), whether marginal parameters are treated as known or estimated jointly, the number of Monte Carlo replications, and the exact data-generating process for the margins. Without these details the reported advantage risks being an artifact of the chosen simulation regime rather than a general property.
  2. [Estimator comparison] Comparison of estimators: the paper correctly flags numerical instability of the MLE for very small n, but the MSE tables should isolate the contribution of the Jeffreys prior from MLE failures (e.g., by reporting results under stabilized optimization or by excluding divergent MLE cases). If the advantage disappears under such controls, the attribution to the prior requires revision.
minor comments (3)
  1. [Abstract and simulation section] The abstract and simulation section should briefly state the range of θ, sample sizes, and replication count used to support the 'uniformly outperforms' phrasing.
  2. [GOF critical-value tables] For the GOF tables, indicate the exact grid of θ and n values over which critical values were simulated so readers can assess coverage.
  3. [Notation] Minor typographical inconsistencies in notation for the Frank copula density and prior densities should be harmonized.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive report. The comments identify important areas for improving the transparency and robustness of our simulation results and estimator comparisons. We address each point below and will incorporate the necessary revisions.

Point-by-point responses
  1. Referee: [Simulation study] Simulation study: the central claim that the Jeffreys Bayes estimator 'uniformly outperforms' both the generalized flat prior estimator and the MLE in MSE for all n ≤ 25 is load-bearing for the paper's first contribution. The manuscript must specify the grid of true θ values (including near 0 and the boundaries where the copula density flattens), whether marginal parameters are treated as known or estimated jointly, the number of Monte Carlo replications, and the exact data-generating process for the margins. Without these details the reported advantage risks being an artifact of the chosen simulation regime rather than a general property.

    Authors: We agree that complete specification of the simulation design is required to substantiate the central claim and ensure reproducibility. Although Section 3 of the manuscript outlines the Monte Carlo study, we acknowledge that certain details could be stated more explicitly. In the revised manuscript we will expand this section to include the precise grid of θ values (covering the interior, near-zero, and boundary regions), whether marginal parameters are held fixed or estimated jointly, the exact number of replications performed, and the full data-generating process for the margins. These additions will allow readers to confirm that the reported MSE advantage is not an artifact of the chosen regime. revision: yes

  2. Referee: [Estimator comparison] Comparison of estimators: the paper correctly flags numerical instability of the MLE for very small n, but the MSE tables should isolate the contribution of the Jeffreys prior from MLE failures (e.g., by reporting results under stabilized optimization or by excluding divergent MLE cases). If the advantage disappears under such controls, the attribution to the prior requires revision.

    Authors: We appreciate the referee's suggestion to separate the effect of the Jeffreys prior from MLE optimization failures. The manuscript already notes numerical instability of the MLE for very small samples. In the revision we will augment the results by reporting the proportion of non-convergent MLE cases, providing MSE comparisons both with and without those cases, and attempting stabilized optimization (e.g., improved starting values or constrained routines) where feasible. This will clarify the extent to which the observed advantage stems from the prior versus numerical stability. If the advantage is found to diminish under these controls, we will revise the interpretation and conclusions accordingly. revision: yes

Circularity Check

0 steps flagged

No circularity: performance metrics derived from external simulation truth and independent critical-value tables

Full rationale

The paper's core claims rest on Monte Carlo simulations that evaluate Bayes and MLE estimators against independently fixed true values of the Frank copula association parameter θ; MSE and bias are computed directly from these known truths rather than from any fitted quantity. Goodness-of-fit procedures invoke externally tabulated critical values from Genest et al. (2006) and new simulated tables, with no equations or definitions that reduce the reported test statistics or estimator rankings to quantities defined by the estimators themselves. Although the work extends Pham et al. (2025), this is a standard incremental extension with fresh simulation grids and real-data application; the cited prior work supplies context but does not supply the load-bearing numerical results or force the current conclusions by construction. No self-definitional loops, fitted-input renamings, or ansatz smuggling appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on the assumption that data are i.i.d. draws from a bivariate Frank copula and that the simulation designs adequately represent real-data behavior; no new entities are postulated.

free parameters (1)
  • association parameter θ
    The dependence parameter whose point estimate is the object of comparison; it is estimated rather than fixed ad hoc.
axioms (1)
  • domain assumption: Observations are independent and identically distributed from the bivariate Frank copula
    Invoked for both the simulation study and the groundwater application.

pith-pipeline@v0.9.0 · 5541 in / 1275 out tokens · 58133 ms · 2026-05-08T07:19:09.880514+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 12 canonical work pages

  1. [1]

    Berg, M., Stengel, C., Trang, P. T. K., Viet, P. H., Sampson, M. L., Leng, M., Samreth, S., and Fredericks, D. (2007). Magnitude of arsenic pollution in the Mekong and Red River Deltas-Cambodia and Vietnam. Science of the Total Environment, 372(2–3), 413–425. https://doi.org/10.1016/j.scitotenv.2006.09.010

  2. [2]

    Bouyé, E., Durrleman, V., Nikeghbali, A., Riboulet, G., and Roncalli, T. (2000). Copulas for Finance: A Reading Guide and Some Applications. Groupe de Recherche Opérationnelle, Crédit Lyonnais, Paris.

  3. [3]

    Cherubini, U., Luciano, E., and Vecchiato, W. (2004). Copula Methods in Finance. John Wiley & Sons, Chichester, UK.

  4. [4]

    Embrechts, P., McNeil, A. J., and Straumann, D. (2002). Correlation and dependency in risk management: Properties and pitfalls. In Dempster, M. (Ed.), Risk Management: Value at Risk and Beyond, pp. 176–223. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511615338

  5. [5]

    Genest, C. and MacKay, J. (1986). The Joy of Copulas: Bivariate Distributions with Uniform Marginals. The American Statistician, 40(4), 280–283. https://doi.org/10.1080/00031305.1986.1047541

  6. [6]

    Genest, C. (1987). Frank’s Family of Bivariate Distributions. Biometrika, 74, 549–555. https://doi.org/10.1093/biomet/74.3.549

  7. [7]

    Genest, C., Quessy, J.-F., and Rémillard, B. (2006). Goodness-of-fit Procedures for Copula Models Based on the Probability Integral Transformation. Scandinavian Journal of Statistics, 33, 337–366. https://doi.org/10.1111/j.1467-9469.2006.00470.x

  8. [8]

    Huisman, R., Koedijk, C. G., Cool, C. J., and Palm, F. C. (2001). Tail-index estimates in small samples. Journal of Business & Economic Statistics, 19(2), 245–254. https://doi.org/10.1198/073500101316970421

  9. [9]

    Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science and Cybernetics, 4(3), 227–241. https://doi.org/10.1109/TSSC.1968.300117

  10. [10]

    Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, 186(1007), 453–461. https://doi.org/10.1098/rspa.1946.0056

  11. [11]

    Junker, M., Szimayer, A., and Wagner, N. (2006). Nonlinear term structure dependence: Copula functions, empirics and risk implications. Journal of Banking & Finance, 30(4), 1171–1199. https://doi.org/10.1016/j.jbankfin.2005.03.003

  12. [12]

    Meneguzzo, D. and Vecchiato, W. (2004). Copula sensitivity in collateralized debt obligations and basket default swaps. The Journal of Futures Markets, 24(1), 37–70. https://doi.org/10.1002/fut.10107

  13. [13]

    Merola, R. B., Hien, T. T., Quyen, D. T. T., and Vengosh, A. (2015). Arsenic exposure to drinking water in the Mekong Delta. Science of the Total Environment, 511, 544–552. https://doi.org/10.1016/j.scitotenv.2014.12.091

  14. [14]

    Nguyen, P. K. (2008). Geochemical study of arsenic behavior in aquifer of the Mekong Delta, Vietnam. Ph.D. dissertation, Kyushu University, Japan. https://ere.mine.kyushu-u.ac.jp/old/sotsuron/pdfs/2008/kim.pdf

  15. [15]

    Pham, C. H. V. (2015). Studying the mechanisms of arsenic release in groundwater in An Phu district, An Giang province. Master’s thesis, University of Technology, Ho Chi Minh City, Vietnam.

  16. [16]

    Pham, C. H. V., Ho, T. N. H., Frustchi, M., Wang, Y., Bernier, R., and Vo, L. P. (2015). Spatial and temporal variation of arsenic occurrence and physiogeochemical influence on arsenic in groundwater in the Vietnamese Mekong Delta: A case study of An Phu district, An Giang province. Journal of Science and Technology (Vietnam Academy of Science and Technolo...

  17. [17]

    Pham, T. Y. A., Huynh, T. U., and Pal, N. (2025). Some results on point estimation of the association parameter of a bivariate Frank copula. Communications in Statistics - Simulation and Computation, 1–22. https://doi.org/10.1080/03610918.2025.2545611

  18. [18]

    Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8, 229–231.

Appendix A.1: Groundwater data from Dong Thap, Vietnam (Merola et al. (2015))

         South                                          North
    No.  Well ID  As (ppb)  Cl (ppm)  Eh (mV)  pH       Well ID  As (ppb)  Cl (ppm)  Eh (mV)  pH
    1    DT7      563.9     107.0     -126     6.78     TH16     0.4       173.6     157      6.14
    2    DT6      0.5       56.1      1...