ICCDesign: An R Package for the Design and Analysis of ICC-Based Reliability Studies with Continuous Responses
Pith reviewed 2026-06-28 13:20 UTC · model grok-4.3
The pith
The ICCDesign R package provides an integrated workflow for estimating, planning, and evaluating intraclass correlations in reliability studies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ICCDesign integrates four core functionalities for ICC-based reliability studies with continuous responses: point estimation and ANOVA-based confidence intervals for supported ICC forms following the McGraw and Wong framework with a four-step decision guide, sample size planning based on Zou's closed-form formulas, automated reliability evaluation using Koo and Li criteria, and an interactive Shiny web application.
What carries the argument
The ICCDesign package and its built-in four-step decision framework that guides selection of the appropriate ICC form under the McGraw and Wong framework.
Load-bearing premise
The built-in four-step decision framework correctly maps user study designs to the appropriate ICC form under the McGraw and Wong framework and the package implementations match the cited methods without coding errors.
What would settle it
Compare the package output for ICC point estimate and confidence interval on a standard dataset to results obtained from direct implementation of the McGraw and Wong ANOVA formulas or other established packages.
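A minimal sketch of such a cross-check follows. It computes the single-measure consistency and absolute-agreement ICCs directly from two-way ANOVA mean squares, using the standard McGraw and Wong (1996) expressions. It is independent of ICCDesign and does not call the package. The function name `icc_single` is hypothetical, and the ratings are the widely reproduced 6-subject, 4-judge example table from Shrout and Fleiss (1979). Confidence intervals are left out because they need F-distribution quantiles, which the Python standard library does not provide.

```python
def icc_single(ratings):
    """Return (ICC(C,1), ICC(A,1)) for a complete subjects x raters table."""
    n = len(ratings)        # number of subjects (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one rating per subject-rater cell)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    icc_c1 = (msr - mse) / (msr + (k - 1) * mse)
    icc_a1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc_c1, icc_a1


# Example table from Shrout and Fleiss (1979): 6 subjects rated by 4 judges
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc_c1, icc_a1 = icc_single(ratings)
print(round(icc_c1, 2), round(icc_a1, 2))  # prints: 0.71 0.29
```

These values agree with the ICC(3,1) = 0.71 and ICC(2,1) = 0.29 reported by Shrout and Fleiss (1979), so any package's point estimates for the same table can be checked against them.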
Original abstract
The intraclass correlation coefficient (ICC) is among the most widely used statistics in reliability research, playing a central role in medical measurement, psychological assessment, and behavioral science. However, practical application of ICC faces two major obstacles. First, ICC can be organized into multiple forms under the McGraw and Wong (1996) framework -- including six widely reported standard forms and four additional design combinations -- and researchers must select the appropriate form based on their study design, yet existing guidelines are not always operationalized in software interfaces. Second, available R tools are highly fragmented: sample size calculation, ICC estimation with confidence intervals, and reliability evaluation are distributed across separate packages, compelling researchers to switch between tools and increasing the risk of analytical errors. This paper introduces the ICCDesign package, designed specifically to provide an integrated workflow for ICC-based reliability studies with continuous responses, assuming one continuous rating per subject-rater cell. The package integrates four core functionalities: (1) point estimation, ANOVA-based confidence intervals, and implemented hypothesis tests for supported ICC design combinations following the McGraw and Wong (1996) framework, with a built-in four-step decision framework guiding users toward an appropriate ICC form; (2) sample size planning based on Zou's (2012) closed-form formulas, supporting two planning modes and an inverse assurance calculation; (3) automated reliability evaluation based on Koo and Li (2016) criteria, with an uncertainty notification when the confidence interval spans the 0.75 good-reliability threshold; and (4) an interactive Shiny web application covering the main analysis and planning functionalities. ICCDesign is available from GitHub at https://github.com/KlariZhang/ICCDesign.
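The precision-based planning step described in the abstract can be illustrated with a short sketch. The expression used here is the commonly quoted large-sample formula for the one-way random-effects ICC, in which the required number of subjects depends on the anticipated ICC, the number of raters, the target confidence-interval half-width, and the assurance probability. Whether ICCDesign uses exactly this expression, and how its two planning modes are parameterized, is an assumption. The function name `n_subjects_half_width` and its arguments are hypothetical, and results should be checked against Zou (2012) before use.

```python
from math import ceil
from statistics import NormalDist


def n_subjects_half_width(rho, k, half_width, alpha=0.05, assurance=0.5):
    """Subjects needed so a two-sided CI for the ICC has the given half-width.

    Assumed form (one-way design, large-sample variance):
        N = 1 + 2 (z_{alpha/2} + z_beta)^2 (1 - rho)^2 (1 + (k - 1) rho)^2
                / (k (k - 1) w^2)
    where z_beta is the normal quantile of the assurance probability.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(assurance)  # 0 when assurance = 0.5
    var_unit = 2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2 / (k * (k - 1))
    return ceil(1 + (z_alpha + z_beta) ** 2 * var_unit / half_width ** 2)


# Anticipated ICC 0.8, 3 raters, half-width 0.1, 95% CI
print(n_subjects_half_width(0.8, 3, 0.1))                  # 50% assurance
print(n_subjects_half_width(0.8, 3, 0.1, assurance=0.8))   # 80% assurance
```

Raising the assurance from 50% to 80% increases the required number of subjects, which is the trade-off an inverse assurance calculation would explore in the other direction.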
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the ICCDesign R package for ICC-based reliability studies with continuous responses. It claims to integrate (1) point estimation, ANOVA-based CIs, and hypothesis tests for McGraw & Wong (1996) ICC forms via a built-in four-step decision framework, (2) sample-size planning using Zou (2012) closed-form formulas in two modes plus inverse assurance, (3) automated reliability evaluation per Koo & Li (2016) criteria with uncertainty notification, and (4) a Shiny web application, addressing fragmentation across existing R tools.
Significance. If the implementations prove correct, the package would usefully consolidate sample-size planning, estimation, and evaluation into one workflow with usability aids, reducing switching errors for researchers in medical measurement and behavioral sciences. The decision framework and Shiny component add practical value. However, the complete absence of any validation, test cases, or numerical checks against the cited sources substantially lowers the assessed significance, as the contribution rests entirely on the unverified claim of faithful integration.
major comments (2)
- [Abstract] Abstract and overall manuscript: the central claim that the package correctly implements the McGraw & Wong (1996) forms via a four-step decision framework is unsupported, because the manuscript provides neither the decision logic, pseudocode, nor any worked examples showing how user designs are mapped to specific ICC forms.
- [Abstract] Abstract and overall manuscript: no section supplies validation, test cases, side-by-side numerical comparisons against Zou (2012) formulas, McGraw & Wong (1996) CIs, Koo & Li (2016) thresholds, or outputs from other packages; this directly undermines the claim that the integrated functionalities are correctly implemented.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive criticism. The comments correctly identify that the manuscript lacks explicit documentation of the decision framework and any form of validation or numerical checks. We address each point below and will revise the manuscript to incorporate the requested details.
Point-by-point responses
-
Referee: [Abstract] Abstract and overall manuscript: the central claim that the package correctly implements the McGraw & Wong (1996) forms via a four-step decision framework is unsupported, because the manuscript provides neither the decision logic, pseudocode, nor any worked examples showing how user designs are mapped to specific ICC forms.
Authors: We agree that the four-step decision framework is described only at a high level in the current manuscript. In the revised version we will add (i) the explicit decision logic in both text and pseudocode, (ii) a table mapping common study-design features (number of raters, fixed vs. random, etc.) to the six standard McGraw & Wong forms plus the four additional combinations, and (iii) two fully worked examples that trace a user-specified design through the four steps to the resulting ICC form, ANOVA model, and confidence-interval formula.
Revision: yes
-
Referee: [Abstract] Abstract and overall manuscript: no section supplies validation, test cases, side-by-side numerical comparisons against Zou (2012) formulas, McGraw & Wong (1996) CIs, Koo & Li (2016) thresholds, or outputs from other packages; this directly undermines the claim that the integrated functionalities are correctly implemented.
Authors: We acknowledge that the manuscript currently contains no validation material. The revised manuscript will include a new “Validation” section containing: (a) unit-test results for the Zou (2012) sample-size formulas against the original closed-form expressions, (b) side-by-side numerical comparisons of ICC point estimates and ANOVA-based CIs with the irr and psych packages for the same data sets, (c) verification that Koo & Li (2016) reliability labels are assigned correctly, including the uncertainty notification when a CI straddles 0.75, and (d) a small set of reproducible R code snippets that readers can run to reproduce the comparisons.
Revision: yes
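The label check in item (c) can be sketched in a few lines. The thresholds follow Koo and Li (2016): below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent. The handling of values exactly at 0.75, the function names `koo_li_label` and `evaluate`, and the returned dictionary layout are assumptions for illustration and are not taken from ICCDesign.

```python
def koo_li_label(icc):
    """Map an ICC value to a Koo and Li (2016) reliability label."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:      # assumption: exactly 0.75 counts as "good"
        return "good"
    return "excellent"


def evaluate(estimate, lower, upper, threshold=0.75):
    """Label the point estimate and flag a CI that straddles the threshold."""
    return {
        "label": koo_li_label(estimate),
        "ci_labels": (koo_li_label(lower), koo_li_label(upper)),
        # True when the interval contains the good-reliability threshold
        "uncertain": lower < threshold <= upper,
    }


result = evaluate(estimate=0.80, lower=0.68, upper=0.89)
print(result["label"], result["uncertain"])  # prints: good True
```

A test suite along these lines would also cover the boundary values 0.5, 0.75, and 0.9, since those are where an implementation is most likely to diverge from the published criteria.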
Circularity Check
No circularity: software wrapper around externally published methods
Full rationale
The paper introduces an R package that integrates four functionalities by wrapping previously published methods: McGraw and Wong (1996) ICC forms with a four-step decision framework, Zou (2012) sample-size formulas, and Koo and Li (2016) reliability criteria. No new derivations, predictions, fitted parameters, or first-principles results appear in the manuscript. The central claim is the provision of an integrated workflow and Shiny app; all load-bearing statistical content is imported from external citations whose validity is independent of the present work. No self-citation chains, ansatzes, or renamings reduce any claim to its own inputs by construction. This is the expected outcome for a software-description paper.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Brueckl, M. (2022). irrNA: Coefficients of Interrater Reliability – Generalized for Randomly Incomplete Datasets. R package version 0.2.2. https://CRAN.R-project.org/package=irrNA
2022
-
[2]
Gamer, M., Lemon, J., & Singh, I. F. P. (2019). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84.1. https://CRAN.R-project.org/package=irr
2019
-
[3]
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012
-
[4]
Liu, Z., Ma, R., Gao, C., & Zhang, Y. (2026). ICCDesign: An R Package for ICC-Based Reliability Studies. Version 0.1.0. https://github.com/KlariZhang/ICCDesign
2026
-
[5]
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46. https://doi.org/10.1037/1082-989X.1.1.30
-
R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
-
[6]
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. R package version 2.4.3. https://CRAN.R-project.org/package=psych
2024
-
[7]
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420
-
[8]
Wickham, H., Hester, J., Chang, W., & Bryan, J. (2022). devtools: Tools to Make Developing R Packages Easier. R package version 2.4.5. https://CRAN.R-project.org/package=devtools
2022
-
[9]
Wolak, M. E., Fairbairn, D. J., & Paulsen, Y. R. (2012). ICC.Sample.Size: Calculation of Sample Size and Power for ICC. R package version 1.0. https://CRAN.R-project.org/package=ICC.Sample.Size
2012
-
[10]
Zou, G. Y. (2012). Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Statistics in Medicine, 31(29), 3972–3981. https://doi.org/10.1002/sim.5466