pith. machine review for the scientific record. sign in

arxiv: 2604.25664 · v1 · submitted 2026-04-28 · 📊 stat.ML · cs.LG· math.OC

Recognition: unknown

Deflation-Free Optimal Scoring

Brendan Ames, Sharmin Afroz

Authors on Pith no claims yet

Pith reviewed 2026-05-07 14:25 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.OC
keywords deflation-freesparse optimal scoringdiscriminant analysishigh-dimensional dataBregman iterationglobal orthogonalityfeature selectionclassification accuracy
0
0 comments X

The pith

A deflation-free approach estimates all sparse discriminant vectors simultaneously under a global orthogonality constraint.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing sparse optimal scoring methods compute discriminant vectors sequentially via deflation, which can accumulate errors and yield suboptimal feature selections when the number of features greatly exceeds observations. The paper introduces Deflation-Free Sparse Optimal Scoring (DFSOS) that solves for all vectors at the same time while enforcing orthogonality globally. It does this by combining Bregman iteration with orthogonality-constrained optimization and breaking the task into subproblems for scoring vectors, discriminant vectors, and the orthogonality condition. The method is shown to converge to stationary points under mild conditions. Experiments on synthetic data and real time series demonstrate classification accuracy that is comparable to or better than deflation-based alternatives.

Core claim

The paper proposes Deflation-Free Sparse Optimal Scoring (DFSOS) that estimates all discriminant vectors simultaneously under an explicit global orthogonality constraint by combining Bregman iteration with orthogonality-constrained optimization, decomposing the problem into tractable subproblems for scoring vectors, discriminant vectors, and orthogonality enforcement, establishing convergence to stationary points of the augmented Lagrangian under mild conditions, and showing through experiments that it achieves classification accuracy comparable to or better than existing deflation-based methods.

What carries the argument

The simultaneous estimation of all discriminant vectors under a global orthogonality constraint, decomposed via Bregman iteration into subproblems for scoring vectors, discriminant vectors, and orthogonality enforcement.

If this is right

  • Eliminates error propagation that arises when discriminant vectors are computed one after another.
  • Maintains or improves classification accuracy in settings where features outnumber observations.
  • Provides convergence guarantees to stationary points under mild conditions on the augmented Lagrangian.
  • Applies directly to high-dimensional classification tasks including time series data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same simultaneous-plus-global-constraint pattern could replace deflation in other sequential sparse optimization routines such as sparse PCA variants.
  • In domains with very high feature-to-sample ratios, the absence of sequential error buildup may become more pronounced as dimension grows.
  • The decomposition into independent subproblems suggests the method could be parallelized more readily than deflation sequences.

Load-bearing premise

Enforcing the global orthogonality constraint simultaneously does not introduce new instabilities or reduce the sparsity benefits compared to sequential deflation.

What would settle it

A high-dimensional dataset where DFSOS yields lower classification accuracy, less sparse solutions, or fails to converge while deflation-based methods succeed would show the simultaneous approach does not deliver the claimed benefits.

Figures

Figures reproduced from arXiv: 2604.25664 by Brendan Ames, Sharmin Afroz.

Figure 1
Figure 1. Figure 1: Visualisation of discriminant vectors returned by each of DFSOS-1, DFSOS-2, and ASDA with view at source ↗
Figure 2
Figure 2. Figure 2: Visualisation of discriminant vectors returned by each of DFSOS-1, DFSOS-2, and ASDA with view at source ↗
Figure 3
Figure 3. Figure 3: Nearest centroid classification decision boundaries and predictions for testing data in the span view at source ↗
Figure 4
Figure 4. Figure 4: Nearest centroid classification decision boundaries and predictions for testing data in the view at source ↗
Figure 5
Figure 5. Figure 5: Number of (K, r)-pairs for which we observe statistically significant improvement in clas￾sification accuracy, computational efficiency, and cardinality of discriminants when using row method compared to column method. That is, the (i, j)-entry is the number of (K, r)-pair where we reject the null hypothesis H0 : there is no difference between method i and method j using the Wilcoxon test with one-sided al… view at source ↗
Figure 6
Figure 6. Figure 6: Average cosine similarity between out-of-sample predictions for each pair of methods. The view at source ↗
Figure 7
Figure 7. Figure 7: Number of time-series datasets from UCR repository for which we observe statistically signif view at source ↗
Figure 8
Figure 8. Figure 8: Average cosine similarity between out-of-sample predictions for each pair of methods. The view at source ↗
read the original abstract

Sparse Optimal Scoring (SOS) reformulates linear discriminant analysis to enable feature selection through elastic net regularization, making it well-suited for high-dimensional settings where the number of features exceeds observations. Most existing SOS methods use deflation-based strategies that compute discriminant vectors sequentially, which can propagate errors and produce suboptimal solutions. We propose a novel approach that estimates all discriminant vectors simultaneously under an explicit global orthogonality constraint, which we call Deflation-Free Sparse Optimal Scoring (DFSOS). DFSOS combines Bregman iteration with orthogonality-constrained optimization, decomposing the problem into tractable subproblems for scoring vectors, discriminant vectors, and orthogonality enforcement. We establish convergence to stationary points of the augmented Lagrangian under mild conditions. Extensive experiments using synthetic data and real-world time series data demonstrate that DFSOS achieves classification accuracy comparable to or better than existing deflation-based methods. These results indicate that deflation-free approaches offer a robust and effective framework for sparse discriminant analysis in high-dimensional problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper proposes Deflation-Free Sparse Optimal Scoring (DFSOS) for high-dimensional sparse linear discriminant analysis. It reformulates the problem to estimate all discriminant vectors simultaneously under an explicit global orthogonality constraint by combining Bregman iteration with orthogonality-constrained optimization, decomposing the task into subproblems for scoring vectors, discriminant vectors, and orthogonality enforcement. The authors claim convergence to stationary points of the augmented Lagrangian under mild conditions and report that experiments on synthetic and real-world time-series data yield classification accuracy comparable to or better than deflation-based SOS methods.

Significance. If the convergence result and empirical claims hold, the work provides a technically interesting alternative to sequential deflation in sparse LDA, potentially mitigating error propagation while maintaining the feature-selection benefits of elastic-net regularization. The simultaneous global orthogonality enforcement via Bregman decomposition is a clear methodological contribution if it produces stationary points that are at least as good as deflation baselines in objective value and sparsity.

major comments (3)
  1. [Convergence Analysis] The convergence claim to stationary points under mild conditions (abstract and convergence section) does not verify that these conditions remain satisfied in the high-dimensional regimes (p ≫ n) used in the experiments; the augmented Lagrangian could become ill-conditioned when elastic-net regularization interacts with the orthogonality penalty, risking unstable or suboptimal stationary points.
  2. [Experimental Results] The central empirical claim that DFSOS achieves comparable or better accuracy relies on experiments (synthetic and time-series sections), but the manuscript provides no quantitative accuracy values, standard deviations, number of runs, or direct objective-value/sparsity comparisons to deflation baselines; without these, it is impossible to confirm that the simultaneous solutions avoid deflation's error propagation or preserve sparsity benefits.
  3. [Method and Experiments] The assertion that simultaneous estimation under global orthogonality yields superior or equivalent solutions to sequential deflation (introduction and method sections) is not supported by any demonstration that the obtained stationary points achieve lower or equal values of the original SOS objective or maintain comparable sparsity levels; this comparison is load-bearing for the claim of robustness.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, providing honest responses and indicating the revisions we will make to strengthen the presentation and evidence.

read point-by-point responses
  1. Referee: [Convergence Analysis] The convergence claim to stationary points under mild conditions (abstract and convergence section) does not verify that these conditions remain satisfied in the high-dimensional regimes (p ≫ n) used in the experiments; the augmented Lagrangian could become ill-conditioned when elastic-net regularization interacts with the orthogonality penalty, risking unstable or suboptimal stationary points.

    Authors: We agree that the convergence theorem is presented under mild conditions without an explicit verification or discussion tailored to the high-dimensional p ≫ n regimes in the experiments. The conditions (bounded penalty parameters, Lipschitz continuity of the objective) are dimension-independent in the proof, but we did not address potential ill-conditioning from the interaction of elastic-net and orthogonality terms. In the revision, we will expand the convergence section with a remark on applicability to high dimensions and add numerical monitoring of convergence behavior and penalty parameters in the experimental results to confirm stability. revision: yes

  2. Referee: [Experimental Results] The central empirical claim that DFSOS achieves comparable or better accuracy relies on experiments (synthetic and time-series sections), but the manuscript provides no quantitative accuracy values, standard deviations, number of runs, or direct objective-value/sparsity comparisons to deflation baselines; without these, it is impossible to confirm that the simultaneous solutions avoid deflation's error propagation or preserve sparsity benefits.

    Authors: The referee is correct that the experimental sections report only qualitative statements on accuracy without providing mean values, standard deviations, the number of repetitions, or direct comparisons of the SOS objective and sparsity levels. This limits the ability to assess the claims quantitatively. We will revise the synthetic and time-series experiments to include tables with these statistics (e.g., averages and standard deviations over 20 runs) and add direct comparisons of objective values and selected feature counts against deflation baselines. revision: yes

  3. Referee: [Method and Experiments] The assertion that simultaneous estimation under global orthogonality yields superior or equivalent solutions to sequential deflation (introduction and method sections) is not supported by any demonstration that the obtained stationary points achieve lower or equal values of the original SOS objective or maintain comparable sparsity levels; this comparison is load-bearing for the claim of robustness.

    Authors: We acknowledge that while classification accuracy serves as an indirect indicator, the manuscript does not explicitly demonstrate that the stationary points of the augmented Lagrangian achieve objective values lower than or equal to deflation-based solutions or preserve comparable sparsity. To support the robustness claim, the revised manuscript will include additional analysis in the experimental section with side-by-side objective value and sparsity comparisons, showing that DFSOS solutions are at least as good as the baselines in these metrics. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic reformulation with independent convergence analysis

full rationale

The paper's derivation chain consists of reformulating sparse optimal scoring as a joint optimization problem with explicit global orthogonality, then applying Bregman iteration to decompose into subproblems whose stationary points are shown to satisfy the augmented Lagrangian under stated mild conditions. No quoted step reduces a claimed result to a fitted parameter renamed as prediction, a self-referential definition, or a load-bearing self-citation whose content is unverified. The convergence claim is presented as a theorem derived from the algorithm's structure rather than presupposing the target performance metrics, and experiments are treated as external validation rather than inputs to the derivation. This is the normal case of a self-contained algorithmic contribution.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method builds on standard optimization theory for Bregman iteration and augmented Lagrangians without introducing new postulated entities. Elastic net regularization parameters are free parameters that require tuning but are not explicitly fitted or reported in the abstract.

free parameters (1)
  • elastic net regularization parameters
    The approach inherits elastic net regularization from SOS, which involves tunable parameters for sparsity and smoothness that must be selected or cross-validated.
axioms (1)
  • domain assumption Bregman iteration converges to stationary points of the augmented Lagrangian under mild conditions
    The paper invokes this to establish convergence of the proposed decomposition into subproblems for scoring vectors, discriminant vectors, and orthogonality.

pith-pipeline@v0.9.0 · 5454 in / 1382 out tokens · 89296 ms · 2026-05-07T14:25:00.917071+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

38 extracted references · 3 canonical work pages

  1. [1]

    Absil, R

    P.-A. Absil, R. Mahony, and R. Sepulchre.Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008

  2. [2]

    Atkins, G

    S. Atkins, G. Einarsson, L. Clemmensen, and B. Ames. Proximal methods for sparse optimal scoring and discriminant analysis.Advances in Data Analysis and Classification, 17(4):983–1036, 2023

  3. [3]

    Riemannian Adaptive Optimization Methods

    G. B´ ecigneul and O.-E. Ganea. Riemannian adaptive optimization methods.arXiv preprint arXiv:1810.00760, 2018

  4. [4]

    Bonnabel

    S. Bonnabel. Stochastic gradient descent on Riemannian manifolds.IEEE Transactions on Auto- matic Control, 58(9):2217–2229, 2013

  5. [5]

    Boumal.An introduction to optimization on smooth manifolds

    N. Boumal.An introduction to optimization on smooth manifolds. Cambridge University Press, Cambridge, 2023

  6. [6]

    Burer and R

    S. Burer and R. D. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization.Mathematical programming, 95(2):329–357, 2003

  7. [7]

    Cai and W

    T. Cai and W. Liu. A direct estimation approach to sparse linear discriminant analysis.Journal of the American statistical association, 106(496):1566–1577, 2011. 19

  8. [8]

    S. Chen, S. Ma, A. Man-Cho So, and T. Zhang. Nonsmooth optimization over the Stiefel manifold and beyond: Proximal gradient method and recent variants.SIAM Review, 66(2):319–352, 2024

  9. [9]

    Clemmensen, T

    L. Clemmensen, T. Hastie, D. Witten, and B. Ersbøll. Sparse discriminant analysis.Technometrics, 53(4):406–413, 2011

  10. [10]

    H. A. Dau, A. Bagnall, K. Kamgar, C.-C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana, and E. Keogh. The UCR time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6):1293– 1305, 2019

  11. [11]

    Efron, T

    B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression.Annals of Statistics, pages 407–451, 2004

  12. [12]

    J. Fan, Y. Feng, and X. Tong. A road to classification in high dimensional space: the regularized optimal affine discriminant.Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(4):745–771, 2012

  13. [13]

    B. Gao, N. T. Son, P.-A. Absil, and T. Stykel. Riemannian optimization on the symplectic Stiefel manifold.SIAM Journal on Optimization, 31(2):1546–1575, 2021

  14. [14]

    Y. Guo, T. Hastie, and R. Tibshirani. Regularized linear discriminant analysis and its application in microarrays.Biostatistics, 8(1):86–100, 2007

  15. [15]

    Hastie, A

    T. Hastie, A. Buja, and R. Tibshirani. Penalized discriminant analysis.The Annals of Statistics, 23(1):73–102, 1995

  16. [16]

    Hastie, R

    T. Hastie, R. Tibshirani, and A. Buja. Flexible discriminant analysis by optimal scoring.Journal of the American statistical association, 89(428):1255–1270, 1994

  17. [17]

    Hastie, R

    T. Hastie, R. Tibshirani, and J. H. Friedman.The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009

  18. [18]

    C. He, J. Li, B. Jiang, S. Ma, and S. Zhang. On relatively smooth optimization over Riemannian manifolds.arXiv preprint arXiv:2508.03048, 2025

  19. [19]

    Jiang, X

    B. Jiang, X. Meng, Z. Wen, and X. Chen. An exact penalty approach for optimization with non- negative orthogonality constraints.Mathematical Programming, 198(1):855–897, 2023

  20. [20]

    Lai and S

    R. Lai and S. Osher. A splitting method for orthogonality constrained problems.Journal of Scientific Computing, 58:431–449, 2014

  21. [21]

    Lai, L.-H

    Z. Lai, L.-H. Lim, and T. Tang. Stiefel optimization is NP-hard.arXiv preprint arXiv:2507.02839, 2025

  22. [22]

    Lai, L.-H

    Z. Lai, L.-H. Lim, and K. Ye. Grassmannian optimization is NP-hard.SIAM Journal on Optimiza- tion, 35(3):1939–1962, 2025

  23. [23]

    X. Li, S. Chen, Z. Deng, Q. Qu, Z. Zhu, and A. Man-Cho So. Weakly convex optimization over Stiefel manifold using Riemannian subgradient-type methods.SIAM Journal on Optimization, 31(3):1605–1634, 2021

  24. [24]

    Z. Lin, H. Li, and C. Fang.Alternating direction method of multipliers for machine learning. Springer, Singapore, 2022

  25. [25]

    Liu and N

    C. Liu and N. Boumal. Simple algorithms for optimization on Riemannian manifolds with con- straints.Applied Mathematics & Optimization, 82(3):949–981, 2020

  26. [26]

    Mai and H

    Q. Mai and H. Zou. A note on the connection and equivalence of three sparse linear discriminant analysis methods.Technometrics, 55(2):243–246, 2013

  27. [27]

    Q. Mai, H. Zou, and M. Yuan. A direct approach to sparse discriminant analysis in ultra-high dimensions.Biometrika, 99(1):29–42, 2012

  28. [28]

    Rosset and J

    S. Rosset and J. Zhu. Piecewise linear regularized solution paths.The Annals of Statistics, pages 1012–1030, 2007. 20

  29. [29]

    Sato.Riemannian optimization and its applications, volume 670

    H. Sato.Riemannian optimization and its applications, volume 670. Springer, Berlin, 2021

  30. [30]

    J. Shao, Y. Wang, X. Deng, and S. Wang. Sparse linear discriminant analysis by thresholding for high dimensional data.The Annals of Statistics, 39(2):1241–1265, 2011

  31. [31]

    Statistics and Machine Learning Toolbox, 2022

    The MathWorks Inc. Statistics and Machine Learning Toolbox, 2022

  32. [32]

    MATLAB version: 9.14.0 (R2023a), 2023

    The MathWorks Inc. MATLAB version: 9.14.0 (R2023a), 2023

  33. [33]

    Statistics and Machine Learning Toolbox (R2023a), 2023

    The MathWorks Inc. Statistics and Machine Learning Toolbox (R2023a), 2023

  34. [34]

    Y. Wang, W. Yin, and J. Zeng. Global convergence of admm in nonconvex nonsmooth optimization. Journal of Scientific Computing, 78:29–63, 2019

  35. [35]

    Wen and W

    Z. Wen and W. Yin. A feasible method for optimization with orthogonality constraints.Mathematical Programming, 142(1):397–434, 2013

  36. [36]

    D. M. Witten and R. Tibshirani. Penalized classification using Fisher’s linear discriminant.Journal of the Royal Statistical Society Series B: Statistical Methodology, 73(5):753–772, 2011

  37. [37]

    Zhang and S

    H. Zhang and S. Sra. First-order methods for geodesically convex optimization. InConference on learning theory, pages 1617–1638. PMLR, 2016

  38. [38]

    Zou and T

    H. Zou and T. Hastie. Regularization and variable selection via the elastic net.Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301–320, 2005. 21