Rethinking Factor Loading Thresholds: A Case for a Strict λ ≥ .70 Rule
Pith reviewed 2026-05-13 01:03 UTC · model grok-4.3
The pith
Only factor loadings of 0.70 or higher should be retained in final measurement models because lower values mean more error than explained variance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In confirmatory factor analysis, only indicators with standardized loadings of λ ≥ .70 (and thus λ² ≥ .50) should be retained in final measurement models. By the logic of average variance extracted (AVE) and communality, loadings below this threshold contain more error variance than explained variance, which undermines both construct validity and the stability of factor solutions.
What carries the argument
The λ ≥ .70 threshold rests on the requirement that squared loadings (communalities) reach at least .50, so that item-level standards match the AVE ≥ .50 criterion applied at the construct level.
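The arithmetic behind the premise uses nothing beyond standard psychometric definitions: an indicator's communality is λ², its error variance is 1 − λ², and requiring explained variance to be at least error variance yields the cutoff.

```latex
\lambda^2 \ge 1 - \lambda^2
\;\Longrightarrow\; \lambda^2 \ge 0.50
\;\Longrightarrow\; |\lambda| \ge \sqrt{0.50} \approx 0.707,
\qquad
\mathrm{AVE} = \frac{1}{k}\sum_{i=1}^{k} \lambda_i^2 .
```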
If this is right
- Indicators below .70 degrade measurement quality and reduce factor score determinacy.
- Retaining weak loadings undermines the stability of factor solutions and overall model fit.
- Adopting the stricter rule aligns item-level standards with established construct-level criteria for AVE.
- Final measurement models become more rigorous and interpretable when only strong indicators are kept.
Where Pith is reading between the lines
- Researchers may need to generate more items upfront or accept shorter scales when many candidates fall below the threshold.
- The rule could interact with sample size, where larger samples might tolerate slightly weaker loadings while still meeting other fit criteria.
- Alternative approaches such as item parceling or different estimation methods might become more common to avoid discarding substantive content.
Load-bearing premise
The average variance extracted and communality logic developed at the construct level can be applied directly as a universal cutoff at the individual item level without considering trade-offs in scale length, sample size, or item content.
What would settle it
A large-scale simulation or real-data reanalysis that shows measurement models with loadings between .50 and .70 still achieve acceptable reliability, factor score determinacy, and valid structural estimates would challenge the proposed cutoff.
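One concrete shape such a reanalysis could take, sketched in Python (the one-factor setup and the loading values are illustrative assumptions, not results from the paper): population-level AVE, composite reliability, and factor score determinacy can all be computed directly from the standardized loadings, so the claim about loadings in the .50–.70 band is checkable without fitting real data.

```python
import numpy as np

def measurement_quality(loadings):
    """Population-level quality metrics for a one-factor model with
    standardized loadings and uncorrelated errors."""
    lam = np.asarray(loadings, dtype=float)
    theta = 1.0 - lam**2                                   # error variances
    ave = np.mean(lam**2)                                  # average variance extracted
    cr = lam.sum()**2 / (lam.sum()**2 + theta.sum())       # composite reliability (omega)
    sigma = np.outer(lam, lam) + np.diag(theta)            # implied covariance matrix
    determinacy = np.sqrt(lam @ np.linalg.solve(sigma, lam))  # factor score determinacy
    return ave, cr, determinacy

# Six items all loading .60: each communality is only .36 (below the
# proposed cutoff), yet the composite-level metrics can look acceptable.
ave, cr, rho = measurement_quality([0.60] * 6)
```

Runs of this kind, swept over loading levels and scale lengths, are exactly the evidence that would either support or undercut the universal .70 rule.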
Original abstract
This paper challenges the prevailing practice of accepting standardized factor loadings as low as .50 in confirmatory factor analysis. Drawing on the logic of Average Variance Extracted (AVE) and communality, the author argues for a stricter item level threshold: only indicators with loadings of λ ≥ .70 (implying λ² ≥ .50) should be retained in final measurement models. The rationale is that indicators with λ < .70 contain more error than explained variance, undermining both construct validity and the stability of factor solutions. The paper reviews theoretical foundations, simulation evidence, and implications for structural equation modeling, showing that weak loadings degrade measurement quality, factor score determinacy, and model fit. Adopting a minimum λ ≥ .70 rule aligns item level standards with established construct level criteria and enhances the rigor and interpretability of latent variable models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues for adopting a strict threshold of standardized factor loadings λ ≥ 0.70 in confirmatory factor analysis (CFA) and structural equation modeling (SEM). Drawing on the definitions of communality (λ²) and Average Variance Extracted (AVE), it posits that indicators with λ < 0.70 have more error variance than explained variance, which compromises construct validity, factor score determinacy, and overall model fit. The author reviews theoretical foundations, cites simulation evidence, and recommends retaining only high-loading items to align item-level standards with construct-level criteria like AVE ≥ 0.50.
Significance. If the central recommendation holds, the paper could influence measurement practices by encouraging stricter standards, potentially leading to more reliable latent variable models. It correctly notes the arithmetic link between λ ≥ 0.70 and λ² ≥ 0.50 from communality definitions. However, its significance is limited by reliance on existing concepts without new derivations or extensive original simulations, and the universal applicability remains debatable given content validity concerns.
major comments (2)
- [Abstract and §3] The claim that weak loadings degrade factor score determinacy and model fit rests on reviewed simulation evidence, but the manuscript does not provide sufficient details on the simulation parameters (e.g., sample sizes, number of factors, or specific fit indices affected) to allow independent verification of the effect sizes or generalizability.
- [§4 (Implications)] The recommendation for a universal λ ≥ .70 cutoff does not quantify the trade-offs with scale length and content validity; for instance, in a multi-item scale, retaining one or two items with λ ≈ .60 can maintain overall AVE > .50, composite reliability, and predictive utility while preserving domain coverage, yet no analysis shows when the marginal gain in per-item signal outweighs the cost of item deletion.
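The referee's numerical claim is easy to verify with the standard AVE and composite-reliability formulas; a quick check (the five-item loading pattern is hypothetical, chosen to include two items at .60):

```python
import numpy as np

# Hypothetical five-item scale: three strong items plus two at .60.
loadings = np.array([0.80, 0.80, 0.75, 0.60, 0.60])

ave = np.mean(loadings**2)  # average variance extracted
# Composite reliability (omega): (Σλ)² / ((Σλ)² + Σ(1 − λ²))
cr = loadings.sum()**2 / (loadings.sum()**2 + (1 - loadings**2).sum())
# AVE stays above .50 even though two items fall below the .70 cutoff,
# which is the scenario the referee describes.
```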
minor comments (2)
- [Abstract] In the abstract, 'λsq' should be formatted as λ² for clarity and consistency with mathematical notation used elsewhere.
- [References] Ensure explicit citation of foundational AVE work (e.g., Fornell and Larcker, 1981) is present when discussing construct-level criteria.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We have revised the manuscript to address the concerns about providing more details on the reviewed simulation evidence and expanding the discussion of trade-offs with content validity and scale length. Our responses to the major comments are provided below.
Point-by-point responses
-
Referee: [Abstract and §3] The claim that weak loadings degrade factor score determinacy and model fit rests on reviewed simulation evidence, but the manuscript does not provide sufficient details on the simulation parameters (e.g., sample sizes, number of factors, or specific fit indices affected) to allow independent verification of the effect sizes or generalizability.
Authors: We appreciate the referee for noting this gap in transparency. The manuscript reviews and cites existing simulation studies from the literature to support claims regarding impacts on factor score determinacy and model fit. In the revised version, we have added a dedicated paragraph in §3 (and updated the abstract accordingly) that summarizes the key parameters of the primary cited simulations. This includes typical sample sizes (N ranging from 200 to 1000), number of factors (1 to 4), and specific fit indices affected (e.g., CFI, RMSEA, SRMR, and chi-square). A summary table has also been included to facilitate independent verification and assessment of generalizability. revision: yes
-
Referee: [§4 (Implications)] The recommendation for a universal λ ≥ .70 cutoff does not quantify the trade-offs with scale length and content validity; for instance, in a multi-item scale, retaining one or two items with λ ≈ .60 can maintain overall AVE > .50, composite reliability, and predictive utility while preserving domain coverage, yet no analysis shows when the marginal gain in per-item signal outweighs the cost of item deletion.
Authors: The referee correctly identifies that the original §4 did not fully address these practical trade-offs. Our core rationale remains that item-level communality (λ²) ≥ 0.50 ensures each indicator contributes more true variance than error, aligning with AVE standards and supporting stable factor solutions. In the revised manuscript, we have expanded §4 with a new subsection discussing content validity and scale length considerations. We acknowledge that in some short scales, retaining a few items with λ ≈ .60 may preserve overall AVE > .50 and domain coverage, but we explain that this comes at the cost of elevated per-item error variance, which can still undermine factor score determinacy. We provide qualitative guidance on when deletion may be warranted (e.g., when more than 20% of items fall below .70) and recommend sensitivity checks, while noting that a full quantitative optimization of marginal gains versus deletion costs would require additional empirical modeling beyond the scope of this paper. revision: partial
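The sensitivity check the authors recommend can be sketched concretely (the eight-item loading pattern is hypothetical; the formulas are the standard AVE and composite-reliability definitions): compute the composite metrics with and without the sub-.70 items and compare.

```python
import numpy as np

def ave_cr(loadings):
    """AVE and composite reliability from standardized loadings
    (one factor, uncorrelated errors)."""
    lam = np.asarray(loadings, dtype=float)
    ave = np.mean(lam**2)
    cr = lam.sum()**2 / (lam.sum()**2 + (1 - lam**2).sum())
    return ave, cr

full = [0.85, 0.80, 0.75, 0.72, 0.65, 0.60, 0.58, 0.55]
pruned = [l for l in full if l >= 0.70]  # apply the λ ≥ .70 rule

ave_f, cr_f = ave_cr(full)
ave_p, cr_p = ave_cr(pruned)
# Pruning lifts AVE above .50, but the shorter scale has slightly lower
# composite reliability: the scale-length trade-off at issue here.
```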
Circularity Check
No significant circularity; argument extends pre-existing AVE/communality definitions
Full rationale
The paper's core recommendation for a λ >= .70 item threshold is presented as a direct application of the standard definitions of Average Variance Extracted (AVE = average of squared loadings) and communality, which predate this work and are not redefined here. No equations reduce the proposed cutoff to a fitted parameter or self-referential prediction, no self-citations are invoked as load-bearing uniqueness theorems, and the derivation chain does not rename or smuggle in an ansatz from the authors' prior work. The claim remains an interpretive recommendation about applying construct-level criteria at the item level, with independent content regarding trade-offs in measurement quality.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Communality of an indicator equals the square of its standardized factor loading.
- [domain assumption] An item should explain at least as much variance as it leaves as error.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear — "AVE = (1/k) Σ λᵢ² … λ² ≥ .50 ⇒ |λ| ≥ √.50 ≈ .707 … indicators with λ below .70 contain more error than explained variance"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear — "communalities below about .40 are problematic … factors are most reliable when there are several items with loadings ≥ .60"
Reference graph
Works this paper leans on
- [1] Cheung, G. W., Cooper-Thomas, H. D., Lau, R. S., & Wang, L. C. (2024). Reporting reliability, convergent and discriminant validity with structural equation modeling: A review and best-practice recommendations. Asia Pacific Journal of Management, 41(2), 745–783.
- [2] Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50.
- [3] Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103(2), 265–275.
- [4] Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2018). Multivariate data analysis (8th ed.). Cengage.
- [5] Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
- [6] MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84–99.