Rethinking Factor Loading Thresholds: A Case for a Strict λ ≥ .70 Rule
Pith reviewed 2026-05-13 01:03 UTC · model grok-4.3
The pith
Only factor loadings of 0.70 or higher should be retained in final measurement models because lower values mean more error than explained variance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In confirmatory factor analysis, only indicators with standardized loadings of λ ≥ .70 (and thus λ² ≥ .50) should be retained in final measurement models. By the logic of average variance extracted (AVE) and communality, loadings below this threshold contain more error variance than explained variance, which undermines both construct validity and the stability of factor solutions.
What carries the argument
The λ ≥ .70 threshold rests on the requirement that squared loadings (communalities) reach at least .50, so that item-level standards match the AVE ≥ .50 criterion applied at the construct level.
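The arithmetic behind the premise uses nothing beyond standard psychometric definitions: an indicator's communality is λ², its error variance is 1 − λ², and requiring explained variance to be at least error variance yields the cutoff.

```latex
\lambda^2 \ge 1 - \lambda^2
\;\Longrightarrow\; \lambda^2 \ge 0.50
\;\Longrightarrow\; |\lambda| \ge \sqrt{0.50} \approx 0.707,
\qquad
\mathrm{AVE} = \frac{1}{k}\sum_{i=1}^{k} \lambda_i^2 .
```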
If this is right
- Indicators below .70 degrade measurement quality and reduce factor score determinacy.
- Retaining weak loadings undermines the stability of factor solutions and overall model fit.
- Adopting the stricter rule aligns item-level standards with established construct-level criteria for AVE.
- Final measurement models become more rigorous and interpretable when only strong indicators are kept.
Where Pith is reading between the lines
- Researchers may need to generate more items upfront or accept shorter scales when many candidates fall below the threshold.
- The rule could interact with sample size, where larger samples might tolerate slightly weaker loadings while still meeting other fit criteria.
- Alternative approaches such as item parceling or different estimation methods might become more common to avoid discarding substantive content.
Load-bearing premise
The average variance extracted and communality logic developed at the construct level can be applied directly as a universal cutoff at the individual item level without considering trade-offs in scale length, sample size, or item content.
What would settle it
A large-scale simulation or real-data reanalysis that shows measurement models with loadings between .50 and .70 still achieve acceptable reliability, factor score determinacy, and valid structural estimates would challenge the proposed cutoff.
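One concrete shape such a reanalysis could take, sketched in Python (the one-factor setup and the loading values are illustrative assumptions, not results from the paper): population-level AVE, composite reliability, and factor score determinacy can all be computed directly from the standardized loadings, so the claim about loadings in the .50–.70 band is checkable without fitting real data.

```python
import numpy as np

def measurement_quality(loadings):
    """Population-level quality metrics for a one-factor model with
    standardized loadings and uncorrelated errors."""
    lam = np.asarray(loadings, dtype=float)
    theta = 1.0 - lam**2                                   # error variances
    ave = np.mean(lam**2)                                  # average variance extracted
    cr = lam.sum()**2 / (lam.sum()**2 + theta.sum())       # composite reliability (omega)
    sigma = np.outer(lam, lam) + np.diag(theta)            # implied covariance matrix
    determinacy = np.sqrt(lam @ np.linalg.solve(sigma, lam))  # factor score determinacy
    return ave, cr, determinacy

# Six items all loading .60: each communality is only .36 (below the
# proposed cutoff), yet the composite-level metrics can look acceptable.
ave, cr, rho = measurement_quality([0.60] * 6)
```

Runs of this kind, swept over loading levels and scale lengths, are exactly the evidence that would either support or undercut the universal .70 rule.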
Original abstract
This paper challenges the prevailing practice of accepting standardized factor loadings as low as .50 in confirmatory factor analysis. Drawing on the logic of Average Variance Extracted (AVE) and communality, the author argues for a stricter item level threshold: only indicators with loadings of λ ≥ .70 (implying λ² ≥ .50) should be retained in final measurement models. The rationale is that indicators with λ < .70 contain more error than explained variance, undermining both construct validity and the stability of factor solutions. The paper reviews theoretical foundations, simulation evidence, and implications for structural equation modeling, showing that weak loadings degrade measurement quality, factor score determinacy, and model fit. Adopting a minimum λ ≥ .70 rule aligns item level standards with established construct level criteria and enhances the rigor and interpretability of latent variable models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues for adopting a strict threshold of standardized factor loadings λ ≥ 0.70 in confirmatory factor analysis (CFA) and structural equation modeling (SEM). Drawing on the definitions of communality (λ²) and Average Variance Extracted (AVE), it posits that indicators with λ < 0.70 have more error variance than explained variance, which compromises construct validity, factor score determinacy, and overall model fit. The author reviews theoretical foundations, cites simulation evidence, and recommends retaining only high-loading items to align item-level standards with construct-level criteria like AVE ≥ 0.50.
Significance. If the central recommendation holds, the paper could influence measurement practices by encouraging stricter standards, potentially leading to more reliable latent variable models. It correctly notes the arithmetic link between λ ≥ 0.70 and λ² ≥ 0.50 from communality definitions. However, its significance is limited by reliance on existing concepts without new derivations or extensive original simulations, and the universal applicability remains debatable given content validity concerns.
major comments (2)
- [Abstract and §3] The claim that weak loadings degrade factor score determinacy and model fit rests on reviewed simulation evidence, but the manuscript does not provide sufficient details on the simulation parameters (e.g., sample sizes, number of factors, or specific fit indices affected) to allow independent verification of the effect sizes or generalizability.
- [§4 (Implications)] The recommendation for a universal λ ≥ .70 cutoff does not quantify the trade-offs with scale length and content validity; for instance, in a multi-item scale, retaining one or two items with λ ≈ .60 can maintain overall AVE > .50, composite reliability, and predictive utility while preserving domain coverage, yet no analysis shows when the marginal gain in per-item signal outweighs the cost of item deletion.
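The referee's numerical claim is easy to verify with the standard AVE and composite-reliability formulas; a quick check (the five-item loading pattern is hypothetical, chosen to include two items at .60):

```python
import numpy as np

# Hypothetical five-item scale: three strong items plus two at .60.
loadings = np.array([0.80, 0.80, 0.75, 0.60, 0.60])

ave = np.mean(loadings**2)  # average variance extracted
# Composite reliability (omega): (Σλ)² / ((Σλ)² + Σ(1 − λ²))
cr = loadings.sum()**2 / (loadings.sum()**2 + (1 - loadings**2).sum())
# AVE stays above .50 even though two items fall below the .70 cutoff,
# which is the scenario the referee describes.
```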
minor comments (2)
- [Abstract] In the abstract, 'λsq' should be formatted as λ² for clarity and consistency with mathematical notation used elsewhere.
- [References] Ensure explicit citation of foundational AVE work (e.g., Fornell and Larcker, 1981) is present when discussing construct-level criteria.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We have revised the manuscript to address the concerns about providing more details on the reviewed simulation evidence and expanding the discussion of trade-offs with content validity and scale length. Our responses to the major comments are provided below.
Point-by-point responses
-
Referee: [Abstract and §3] The claim that weak loadings degrade factor score determinacy and model fit rests on reviewed simulation evidence, but the manuscript does not provide sufficient details on the simulation parameters (e.g., sample sizes, number of factors, or specific fit indices affected) to allow independent verification of the effect sizes or generalizability.
Authors: We appreciate the referee for noting this gap in transparency. The manuscript reviews and cites existing simulation studies from the literature to support claims regarding impacts on factor score determinacy and model fit. In the revised version, we have added a dedicated paragraph in §3 (and updated the abstract accordingly) that summarizes the key parameters of the primary cited simulations. This includes typical sample sizes (N ranging from 200 to 1000), number of factors (1 to 4), and specific fit indices affected (e.g., CFI, RMSEA, SRMR, and chi-square). A summary table has also been included to facilitate independent verification and assessment of generalizability. revision: yes
-
Referee: [§4 (Implications)] The recommendation for a universal λ ≥ .70 cutoff does not quantify the trade-offs with scale length and content validity; for instance, in a multi-item scale, retaining one or two items with λ ≈ .60 can maintain overall AVE > .50, composite reliability, and predictive utility while preserving domain coverage, yet no analysis shows when the marginal gain in per-item signal outweighs the cost of item deletion.
Authors: The referee correctly identifies that the original §4 did not fully address these practical trade-offs. Our core rationale remains that item-level communality (λ²) ≥ 0.50 ensures each indicator contributes more true variance than error, aligning with AVE standards and supporting stable factor solutions. In the revised manuscript, we have expanded §4 with a new subsection discussing content validity and scale length considerations. We acknowledge that in some short scales, retaining a few items with λ ≈ .60 may preserve overall AVE > .50 and domain coverage, but we explain that this comes at the cost of elevated per-item error variance, which can still undermine factor score determinacy. We provide qualitative guidance on when deletion may be warranted (e.g., when more than 20% of items fall below .70) and recommend sensitivity checks, while noting that a full quantitative optimization of marginal gains versus deletion costs would require additional empirical modeling beyond the scope of this paper. revision: partial
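The sensitivity check the authors recommend can be sketched concretely (the eight-item loading pattern is hypothetical; the formulas are the standard AVE and composite-reliability definitions): compute the composite metrics with and without the sub-.70 items and compare.

```python
import numpy as np

def ave_cr(loadings):
    """AVE and composite reliability from standardized loadings
    (one factor, uncorrelated errors)."""
    lam = np.asarray(loadings, dtype=float)
    ave = np.mean(lam**2)
    cr = lam.sum()**2 / (lam.sum()**2 + (1 - lam**2).sum())
    return ave, cr

full = [0.85, 0.80, 0.75, 0.72, 0.65, 0.60, 0.58, 0.55]
pruned = [l for l in full if l >= 0.70]  # apply the λ ≥ .70 rule

ave_f, cr_f = ave_cr(full)
ave_p, cr_p = ave_cr(pruned)
# Pruning lifts AVE above .50, but the shorter scale has slightly lower
# composite reliability: the scale-length trade-off at issue here.
```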
Circularity Check
No significant circularity; argument extends pre-existing AVE/communality definitions
Full rationale
The paper's core recommendation for a λ >= .70 item threshold is presented as a direct application of the standard definitions of Average Variance Extracted (AVE = average of squared loadings) and communality, which predate this work and are not redefined here. No equations reduce the proposed cutoff to a fitted parameter or self-referential prediction, no self-citations are invoked as load-bearing uniqueness theorems, and the derivation chain does not rename or smuggle in an ansatz from the authors' prior work. The claim remains an interpretive recommendation about applying construct-level criteria at the item level, with independent content regarding trade-offs in measurement quality.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Communality of an indicator equals the square of its standardized factor loading.
- [domain assumption] An item should explain at least as much variance as it leaves as error.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear — "AVE = (1/k) Σ λᵢ² … λ² ≥ .50 ⇒ |λ| ≥ √.50 ≈ .707 … indicators with λ below .70 contain more error than explained variance"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear — "communalities below about .40 are problematic … factors are most reliable when there are several items with loadings ≥ .60"
Reference graph
Works this paper leans on
- [1] Cheung, G. W., Cooper-Thomas, H. D., Lau, R. S., & Wang, L. C. (2024). Reporting reliability, convergent and discriminant validity with structural equation modeling: A review and best-practice recommendations. Asia Pacific Journal of Management, 41(2), 745–783.
- [2] Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50.
- [3] Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103(2), 265–275.
- [4] Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2018). Multivariate data analysis (8th ed.). Cengage.
- [5] Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
- [6] MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84–99.