pith. machine review for the scientific record.

arxiv: 2604.02992 · v1 · submitted 2026-04-03 · 📊 stat.OT · stat.AP · stat.ME

Recognition: no theorem link

Why is Regularization Underused? An Empirical Study on Trust and Adoption of Statistical Methods

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 18:46 UTC · model grok-4.3

classification 📊 stat.OT · stat.AP · stat.ME
keywords regularization methods · adoption intentions · technology acceptance · trust in statistics · social norms · survey experiment · statistical practice · ease of implementation

The pith

Survey of 606 analysts finds recommendations do not increase trust or intended use of regularization methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines why regularization techniques, which help reduce overfitting, remain underused in practice despite wide availability in software. The researchers conducted a large-scale survey of data analysts and embedded a randomized experiment to test whether written recommendations boost trust or adoption intentions; no such effect emerged. Instead, intentions to adopt the methods tracked closely with analysts' views on implementation ease, concrete benefits such as better bias control and interpretability, and perceived social norms among peers. The results point to the need for promotion efforts that address usability and community practice rather than formal endorsements alone.

Core claim

Drawing on a survey of 606 practitioners and a randomized experiment, the authors conclude that written recommendations of regularization methods have no discernible effect on trust or intended use. Adoption intentions instead depend primarily on analysts' views of implementation ease, practical advantages such as improved bias control or interpretability, and prevailing social norms.

What carries the argument

A survey instrument based on technology-acceptance frameworks, combined with an embedded randomized experiment, measured trust, acceptance, and contributing factors including perceived ease of use and social norms for regularization techniques.
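As a concrete illustration of that design, a minimal simulation sketch (construct labels tr and bi follow the paper's figures; the data, item counts, and scoring here are hypothetical): respondents are randomized to one of four recommendation arms and their Likert items are averaged into construct scores.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
arms = ["Control", "Expert", "Journal", "Peer"]
n = 606

# Hypothetical respondents: random assignment to a recommendation arm,
# plus three 5-point Likert items per construct (tr = trust, bi = intention).
df = pd.DataFrame({
    "arm": rng.choice(arms, size=n),
    **{f"tr_{i}": rng.integers(1, 6, size=n) for i in range(3)},
    **{f"bi_{i}": rng.integers(1, 6, size=n) for i in range(3)},
})
df["tr"] = df[[f"tr_{i}" for i in range(3)]].mean(axis=1)
df["bi"] = df[[f"bi_{i}" for i in range(3)]].mean(axis=1)

# Between-arm comparison of mean construct scores, as in Figure 4.
print(df.groupby("arm")[["tr", "bi"]].mean().round(2))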

If this is right

  • Adoption of statistical methods depends more on perceived ease of implementation and practical benefits than on formal recommendations.
  • Social norms within analyst communities act as a strong driver of intentions to use regularization.
  • Promotion of new methods should prioritize demonstrating usability and tangible advantages such as bias control.
  • Software interfaces that simplify regularization application could raise uptake more effectively than endorsements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If self-reports align with behavior, hands-on training focused on implementation ease would likely raise actual adoption rates.
  • The pattern may hold for other advanced statistical techniques beyond regularization.
  • Developers could test whether interface improvements directly increase observed usage in real analyses.

Load-bearing premise

Self-reported survey measures of trust and intended use accurately reflect real-world adoption behavior of statistical methods.

What would settle it

A field study that directly observes analysts' actual code or software logs to compare real regularization usage rates against their survey-reported intentions.
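A sketch of the measurement side of such a study, assuming access to a directory of analysts' scripts. The pattern list is illustrative rather than exhaustive (scikit-learn's Lasso, Ridge, and ElasticNet, penalized LogisticRegression, R's glmnet), and the function name is hypothetical.

import re
from pathlib import Path

# Common regularization calls in Python and R analysis scripts.
PATTERNS = re.compile(
    r"\b(Lasso|Ridge|ElasticNet|LogisticRegression\(.*penalty|cv\.glmnet|glmnet)\b"
)

def regularization_usage_rate(script_dir: str) -> float:
    """Fraction of analysis scripts that invoke a regularization method."""
    scripts = [p for p in Path(script_dir).rglob("*") if p.suffix in {".py", ".R", ".r"}]
    if not scripts:
        return 0.0
    hits = sum(bool(PATTERNS.search(p.read_text(errors="ignore"))) for p in scripts)
    return hits / len(scripts)

Comparing this observed rate against the same analysts' survey-reported intentions would quantify the intention-behavior gap directly.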

Figures

Figures reproduced from arXiv: 2604.02992 by Andreas Groll, Konstantin Emil Thiel, Magdalena Wischnewski, Markus Pauly, Marlène Baumeister, Nicole Krämer.

Figure 1. Distribution of the 5-point Likert items.
Figure 2. Empirical Kendall's τ correlation matrix of the investigated constructs, sorted in descending order of their correlation with bi.
Figure 3. Joint distribution of demographic variables with bi.
Figure 4. Scores for the constructs trust (tr), vigilance (vi), and behavioural intention (bi) per recommendation group: Control, Expert, Journal, and Peer.
Figure 5. 10-fold cross-validation Brier score in a cumulative logit model with bi as the outcome variable and the following predictors (main effects only): gender, age, ee, su, pe, at, ex, tr, si, and vi. Grey bars: standard error of the respective cross-validation average. Vertical dashed orange line: optimum. Vertical dashed blue line: largest λ where the score is within one standard error of the optimum.
Figure 6. Coefficient paths across lasso penalization in the first step. All variables are scaled to unit variance by dividing by their sample standard deviations. The vertical dashed blue line marks the λ value of the 1-SE Brier score from the cross-validation in Figure 5.
Figure 7. 10-fold cross-validation Brier score in a cumulative logit model with bi as the outcome variable and predictors ee, pe, at, ex, tr, si, and vi, plus all pairwise interactions between these variables. Grey bars: standard error of the respective cross-validation average. Vertical dashed orange line: optimum. Vertical dashed blue line: largest λ where the score is within one standard error of the optimum.
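Figures 5 through 7 tune a lasso-penalized model by 10-fold cross-validated Brier score with a one-standard-error rule. A minimal sketch of that selection logic, using scikit-learn with a binary stand-in outcome (the paper fits a cumulative logit for the ordinal bi, which scikit-learn does not provide):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import KFold

def one_se_lambda(X, y, lambdas, seed=0):
    """Largest penalty whose CV Brier score is within one SE of the optimum."""
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)
    means, ses = [], []
    for lam in lambdas:
        scores = []
        for train, test in kf.split(X):
            # L1-penalized logistic regression; C is the inverse penalty strength.
            model = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear")
            model.fit(X[train], y[train])
            scores.append(brier_score_loss(y[test], model.predict_proba(X[test])[:, 1]))
        means.append(np.mean(scores))
        ses.append(np.std(scores, ddof=1) / np.sqrt(len(scores)))
    best = int(np.argmin(means))
    # Figure 5's blue line: largest lambda within one SE of the optimum's mean score.
    return max(lam for lam, m in zip(lambdas, means) if m <= means[best] + ses[best])

The 1-SE choice trades a small loss in cross-validation score for a sparser, more stable model, which is the rationale behind the blue lines in Figures 5 and 7.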
read the original abstract

Statistical practice does not automatically follow methodological innovation. Regularization methods, widely advocated to reduce overfitting and stabilize inference, are readily available in modern software, but are not consistently used by data analysts. We investigate this implementation gap in a large-scale empirical study of trust in, and acceptance of, regularization techniques, based on $N = 606$ data analysts. Drawing on measurement frameworks from technology acceptance research, we survey practitioners and embed a randomized experiment to test whether written recommendation of regularization methods increases trust or intended use. We find no evidence of such an effect. Instead, adoption intentions are strongly associated with analysts' perceptions of ease of implementation and practical benefit, such as improved bias control or interpretability. Perceived social norms also emerge as a central driver. These results indicate that uptake of statistical methodology depends less on formal recommendations than on usability, perceived utility, and community practice.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript reports an empirical study of N=606 data analysts that combines a survey with an embedded randomized experiment. The central finding is that written recommendations for regularization methods produce no detectable increase in trust or intended use; instead, adoption intentions correlate strongly with perceived ease of implementation, practical benefits (e.g., bias control, interpretability), and social norms.

Significance. If the results hold after improved reporting and validation, the work provides evidence that uptake of statistical methods is driven more by usability and community practice than by formal recommendations, extending technology-acceptance frameworks to statistical methodology and offering practical guidance for methodologists seeking wider adoption.

major comments (3)
  1. [Methods] Methods section: the survey instrument, item wording for the trust/intended-use/ease/benefit/norms scales, response rate, and sampling frame are not described in sufficient detail to allow replication or assessment of measurement validity; the abstract and results refer to these constructs but provide no appendix or reference to validated instruments.
  2. [Results] Results section: the null finding for the randomized recommendation experiment is presented without effect sizes, confidence intervals, or a power analysis; without these quantities it is impossible to judge whether the study was powered to detect a practically meaningful recommendation effect.
  3. [Discussion] Discussion section: the claim that adoption intentions reflect real-world underuse of regularization rests entirely on self-reported scales; the manuscript contains no behavioral validation (actual code usage, follow-up task performance, or longitudinal tracking), which is load-bearing for the interpretation that perceived ease and norms, rather than recommendations, explain the implementation gap.
minor comments (2)
  1. [Abstract] Abstract: the reported N=606 should be accompanied by the achieved response rate and any exclusion criteria to give readers immediate context on sample representativeness.
  2. [Methods] Notation for the outcome scales is introduced without a clear table or appendix listing the exact Likert items or reliability coefficients (Cronbach’s α or similar).
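For the reliability coefficients requested above, Cronbach's α has a direct computation once the item-level data are tabulated; a minimal sketch (the DataFrame, one column per Likert item of a single construct, is hypothetical):

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)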

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for these constructive comments, which highlight important areas for improving clarity, statistical reporting, and interpretation. We have revised the manuscript to address each point and provide additional details below.

read point-by-point responses
  1. Referee: [Methods] Methods section: the survey instrument, item wording for the trust/intended-use/ease/benefit/norms scales, response rate, and sampling frame are not described in sufficient detail to allow replication or assessment of measurement validity; the abstract and results refer to these constructs but provide no appendix or reference to validated instruments.

    Authors: We agree that greater methodological transparency is required. The revised manuscript now includes a new Appendix A that reproduces the complete survey instrument, with verbatim item wording for all scales (trust, intended use, ease of implementation, practical benefits, and social norms). We have added the achieved response rate and a precise description of the sampling frame (recruitment through professional data-science forums, LinkedIn groups, and academic mailing lists). Items were adapted from established technology-acceptance instruments (Davis 1989; Venkatesh et al. 2003); we now cite these sources explicitly and note minor adaptations made for the statistical-methods context. revision: yes

  2. Referee: [Results] Results section: the null finding for the randomized recommendation experiment is presented without effect sizes, confidence intervals, or a power analysis; without these quantities it is impossible to judge whether the study was powered to detect a practically meaningful recommendation effect.

    Authors: We accept this criticism. The revised Results section now reports standardized effect sizes (Cohen’s d) and 95% confidence intervals for all between-condition contrasts. We have also added a post-hoc power analysis (using the observed variance and sample size) showing that the design had 80% power to detect effects as small as d = 0.23. These additions allow readers to evaluate the precision and practical significance of the null result; see the power-calculation sketch after these responses. revision: yes

  3. Referee: [Discussion] Discussion section: the claim that adoption intentions reflect real-world underuse of regularization rests entirely on self-reported scales; the manuscript contains no behavioral validation (actual code usage, follow-up task performance, or longitudinal tracking), which is load-bearing for the interpretation that perceived ease and norms, rather than recommendations, explain the implementation gap.

    Authors: We acknowledge that the study relies on self-reported intentions rather than direct behavioral measures. This is a recognized limitation of survey-based technology-acceptance research. In the revised Discussion we have added an explicit limitations paragraph that (a) notes the intention–behavior gap documented in the broader literature, (b) qualifies the strength of our causal claims about real-world underuse, and (c) proposes concrete future designs (e.g., analysis of public code repositories or embedded behavioral tasks) that could provide validation. We retain the core finding that perceived ease, utility, and norms are strongly associated with stated adoption intentions, which remain theoretically and practically relevant even if they are imperfect proxies for behavior. revision: partial
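On point 2, the claimed sensitivity can be reproduced with a standard two-sample power calculation. A sketch using statsmodels, under the assumption that the 606 respondents form two equal arms of 303 each (the four-arm design in the paper would imply different contrasts):

from statsmodels.stats.power import TTestIndPower

# Minimum detectable standardized effect (Cohen's d) at 80% power, alpha = 0.05,
# for a two-sided two-sample t-test with 303 respondents per arm.
mde = TTestIndPower().solve_power(
    effect_size=None, nobs1=303, ratio=1.0, alpha=0.05, power=0.80
)
print(f"minimum detectable d = {mde:.2f}")  # roughly 0.23 under these assumptions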

Circularity Check

0 steps flagged

No significant circularity: purely empirical survey and experiment

full rationale

The paper reports results from a survey (N=606) and a randomized experiment testing recommendation effects on trust and intended use of regularization. No mathematical derivations, equations, fitted parameters presented as predictions, or self-citation chains appear in the load-bearing claims. The central findings rest on standard statistical associations among self-reported scales; the conclusions do not feed back into the measurement inputs and do not reduce to self-definition or renaming, and the methodology can be judged against external standards of survey and experimental practice.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Relies on standard survey-research assumptions about self-report validity; no free parameters, new axioms, or invented entities.

axioms (1)
  • domain assumption: Self-reported survey responses validly capture perceptions, trust, and behavioral intentions.
    The reported associations and the null recommendation effect rest on this premise.

pith-pipeline@v0.9.0 · 5473 in / 985 out tokens · 60740 ms · 2026-05-13T18:46:21.363742+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

  1. [1]

     Al-Ateeq, B., Sawan, N., Al-Hajaya, K., Altarawneh, M., & Al-Makhadmeh, A. (2022). Big Data Analytics in Auditing and the Consequences for Audit Quality: A Study Using the Technology Acceptance Model (TAM). Corporate Governance and Organizational Behavior Review, 6(1), 64–78. Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springe...

  2. [2]

     McCraw, B. W. (2015). The Nature of Epistemic Trust. Social Epistemology, 29(4), 413–

  3. [3]

     McKnight, D. H., Choudhury, V., & Kacmar, C. (2002). Developing and Validating Trust Measures for E-Commerce: An Integrative Typology. Information Systems Research, 13(3), 334–359. https://doi.org/10.1287/isre.13.3.334.81 McNeish, D. M. (2015). Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Lon...

  4. [4]

     Sharpe, D. (2013). Why the Resistance to Statistical Innovations? Bridging the Communication Gap. Psychological Methods, 18(4),

  5. [5]

     Why Practitioners (Do Not) Use Regularizations? An Empirical Study of Trust and Statistical Methodology Acceptance

     StataCorp LLC. (2025). Stata Statistical Software: Release 19 (Software). College Station. Thiel, K. E., Baumeister, M., Krämer, N., Groll, A., Pauly, M., & Wischnewski, M. (2026). Supplementary Material of "Why Practitioners (Do Not) Use Regularizations? An Empirical Study of Trust and Statistical Methodology Acceptance". https://doi.org/10.17877/RCTRUST-...

  6. [6]

     Wanner, J., Herm, L.-V., Heinrich, K., & Janiesch, C. (2022). The Effect of Transparency and Trust on Intelligent System Acceptance: Evidence from a User-Based Study. Electronic Markets, 32(4), 2079–2102. https://doi.org/10.1007/s12525-022-00593-5 Wischnewski, M., Doebler, P., & Krämer, N. (2025, February). Development and validation of the Trust in AI...