pith. machine review for the scientific record.

arxiv: 2604.23833 · v1 · submitted 2026-04-26 · 💱 q-fin.PM

Recognition: unknown

Beyond De Prado and Cotton: Hierarchical and Iterative Methods for General Mean-Variance Portfolios

Bernd Johannes Wuebben

Pith reviewed 2026-05-08 04:55 UTC · model grok-4.3

classification 💱 q-fin.PM
keywords portfolio optimization · mean variance · hierarchical risk parity · shrinkage · regularization · expected returns · monte carlo · risk management

The pith

CRISP, an iterative correlation-shrinkage method, outperforms existing hierarchical and regularised approaches for mean-variance portfolios that incorporate return forecasts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces extensions of hierarchical risk parity that accept expected-return signals, together with a new iterative solver, CRISP, for regularised mean-variance problems. These methods allow portfolio construction to use both covariance estimates and alpha signals without the instability of unregularised optimisation. Monte Carlo experiments across different regimes show that CRISP with intermediate regularisation strength produces the highest out-of-sample Sharpe ratios. This matters for quantitative portfolio management because the approach offers a computationally efficient way to blend alpha signals with risk estimates more effectively than prior methods.

Core claim

The author establishes that iteratively solving the mean-variance problem on a covariance matrix whose off-diagonal elements are shrunk toward zero, while diagonal variances stay fixed, yields better risk-adjusted returns than hierarchical risk parity or other shrinkage techniques when return signals are present. The shrinkage intensity is chosen by out-of-sample performance rather than covariance fit.

What carries the argument

CRISP, the correlation-regularised iterative shrinkage portfolio solver, converges to the solution of P_γ w = μ, where P_γ = (1 − γ) diag(Σ) + γΣ is a convex blend of the covariance matrix and its diagonal.
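Read literally, this is a preconditioned linear solve. A minimal numerical sketch, assuming a Jacobi-style update (the paper's actual CRISP recursion may differ in detail):

```python
import numpy as np

def crisp_solve(Sigma, mu, gamma, n_iter=500):
    """Iterate toward the solution of P_gamma w = mu, where
    P_gamma = (1 - gamma) * diag(Sigma) + gamma * Sigma.
    Sketch only: a Jacobi sweep with the diagonal of Sigma as preconditioner."""
    d = np.diag(Sigma)                       # variances (unchanged by the shrinkage)
    P = (1 - gamma) * np.diag(d) + gamma * Sigma
    off = P - np.diag(d)                     # off-diagonal part, = gamma * (Sigma - diag)
    w = mu / d                               # gamma = 0 (diagonal-rule) starting portfolio
    for _ in range(n_iter):
        w = (mu - off @ w) / d               # Jacobi update: d_i w_i = mu_i - sum_{j!=i} P_ij w_j
    return w

# toy example: 3 assets, uniform correlation 0.3
vols = np.array([0.2, 0.25, 0.3])
C = np.full((3, 3), 0.3)
np.fill_diagonal(C, 1.0)
Sigma = np.outer(vols, vols) * C
mu = np.array([0.05, 0.03, 0.04])
gamma = 0.5
w = crisp_solve(Sigma, mu, gamma)
P = (1 - gamma) * np.diag(np.diag(Sigma)) + gamma * Sigma
assert np.allclose(P @ w, mu, atol=1e-10)
```

At the fixed point, d_i w_i + Σ_{j≠i} γΣ_ij w_j = μ_i, i.e. P_γ w = μ, so the iteration targets exactly the system named above; smaller γ makes the system more diagonally dominant and the sweeps converge faster.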

If this is right

  • The signal-aware hierarchical methods HRP-μ and HRP-Σμ improve upon standard HRP while preserving its tree structure.
  • CRISP is equivalent to applying Markowitz optimisation to a shrunk covariance that preserves variances but reduces correlations.
  • Intermediate values of the regularisation parameter γ in CRISP deliver the best performance in the tested Monte Carlo settings.
  • The dominance of CRISP holds for both low and high ratios of observations to assets.
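The second bullet can be checked numerically: P_γ is itself a valid covariance matrix with the same variances as Σ and every correlation scaled by γ (a toy check of the stated property, not the paper's code):

```python
import numpy as np

vols = np.array([0.2, 0.25, 0.3])
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
Sigma = np.outer(vols, vols) * C          # covariance from vols and correlations
gamma = 0.6
P = (1 - gamma) * np.diag(np.diag(Sigma)) + gamma * Sigma

# variances are preserved exactly ...
assert np.allclose(np.diag(P), np.diag(Sigma))
# ... while every off-diagonal correlation is scaled by gamma
corr_P = P / np.outer(vols, vols)
off = ~np.eye(3, dtype=bool)
assert np.allclose(corr_P[off], gamma * C[off])
```

So solving P_γ w = μ is Markowitz on a covariance whose correlations have been uniformly shrunk toward zero while the marginal risks are untouched.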

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • One could explore whether CRISP's gamma choice transfers across different asset classes or market regimes.
  • The iterative nature of CRISP might be combined with other regularisation techniques for further gains.
  • Real-market backtests would be needed to confirm if the Monte Carlo results hold without the controlled simulation assumptions.

Load-bearing premise

That the out-of-sample Sharpe ratio maximised over gamma in Monte Carlo simulations with specific covariance structures will correspond to good choices in actual financial markets.

What would settle it

A study applying the methods to historical asset returns and observing that CRISP no longer outperforms when gamma is tuned on the same out-of-sample criterion using rolling windows instead of Monte Carlo draws.

Figures

Figures reproduced from arXiv: 2604.23833 by Bernd Johannes Wuebben.

Figure 1. The shrinkage operator P_γ = (1 − γ)D + γΣ and the CRISP convergence rate on a block-structured 60-asset test covariance with four clusters, within-block correlation 0.65, cross-block correlation 0.20, and idiosyncratic volatilities uniform on [0.15, 0.40]. Left: the preconditioned condition number κ(D⁻¹P_γ) = [(1 − γ) + γλ_1]/[(1 − γ) + γλ_N] from Theorem 5.3(ii), monotone from 1 at γ = 0 to κ(C) ≈ 54.6 at γ…

Figure 2. Bias–variance decomposition of total direction error along the shrinkage axis…

Figure 3. Shrinkage trajectory γ ↦ dir(w^(p)(γ), w⋆) across three difficulty regimes, N = 200. Left (easy, κ(C) ≈ 9): even 50 CRISP sweeps converge; no interior optimum. Centre (moderate, κ(C) ≈ 121, the paper's base case): a mild interior optimum appears at p = 100 with γ⋆ ≈ 0.65; 500+ sweeps essentially converge. Right (hard, κ(C) ≈ 3,800, adversarial μ): the interior γ⋆(p) is pronounced and moves rightward…

Figure 4. Horizontal plateau-width bars per cell (plateau = set of…

Figure 5. Experiment 8: OOS mean Sharpe vs T under structural sector-tilt μ, N = 100. Lines: CRISP γ = 0.7, HRP-Σμ γ = 1, HRP-μ γ = 0.5, direct Markowitz, 1/N, HRP. Dashed horizontal line: oracle Sharpe 0.645. CRISP's lead widens with T; HRP-Σμ γ = 1 crosses over CRISP at T = 60.

Figure 6. Experiment 11: mean OOS Sharpe over (γ, p) per regime. Lighter shades are higher Sharpe. The (γ = 0.5, p = 100) cell sits in the high-Sharpe ridge of every panel; the γ = 1 column shows a pronounced interior peak at intermediate p in the spiked regime. Diagonal dominance is preserved at any γ < 1. To our knowledge this is the first documented empirical failure mode of Cotton's allocation on its native problem…

Figure 7. Experiment 11: γ-slices of the (γ, p) surface. γ = 0.3 is flat in p for p ≥ 5; γ = 0.5 plateaus at p ≈ 50; γ = 1.0 has an interior p⋆ that decreases in T/N.

Figure 8. Histogram of the signed cosine cos(w_A1, w⋆) at T ∈ {60, 240, 1000}, structured sector-tilt signal, γ = 0.5. The distribution is approximately symmetric about zero with standard deviation near 0.78 at every sample size, documenting that Method A1 is not a consistent estimator of the Markowitz direction.
Original abstract

Hierarchical Risk Parity (De Prado) and the Schur-complement generalization of Cotton are among the most widely adopted regularised portfolio construction methods, yet both are signal-blind: they solve only the minimum-variance problem and cannot accommodate an arbitrary expected-return forecast. This paper introduces three methods that incorporate alpha signals into hierarchical and regularised portfolio construction. HRP-$\mu$ is a hierarchical allocator that accepts an arbitrary signal $\mu$ and nests standard HRP when $\gamma = 0$ and $\mu=\mathbf{1}$. It preserves the tree-based structure of HRP while extending it beyond the minimum-variance setting. HRP-$\Sigma\mu$ strengthens this construction by replacing inverse-variance representatives with recursive local mean-variance optima, thereby using richer within-cluster covariance information at the same $O(N^2)$ asymptotic cost. CRISP (Correlation-Regularised Iterative Shrinkage Portfolios) is an iterative solver for $P_\gamma w = \mu$ with $P_\gamma = (1-\gamma)\operatorname{diag}(\Sigma) + \gamma \Sigma$, so that $\gamma$ interpolates between a diagonal portfolio rule and full Markowitz. At convergence, CRISP is Markowitz applied to a variance-preserving shrunk covariance (diagonal variances unchanged, off-diagonal correlations shrunk), with $\gamma$ tuned for out-of-sample Sharpe rather than covariance-estimation loss. In Monte Carlo experiments across multiple covariance regimes and estimation ratios, HRP-$\mu$ and HRP-$\Sigma\mu$ both outperform plain HRP, with HRP-$\Sigma\mu$ consistently improving on HRP-$\mu$. CRISP at intermediate $\gamma$ is the dominant method in both regimes, outperforming HRP, Cotton, Ledoit-Wolf shrinkage, direct Markowitz, and the signal-aware hierarchical methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes two signal-aware extensions to Hierarchical Risk Parity (HRP-μ and HRP-Σμ) that incorporate arbitrary expected-return forecasts while preserving the hierarchical tree structure and O(N²) cost, and introduces CRISP, an iterative solver for the linear system P_γ w = μ where P_γ = (1-γ)diag(Σ) + γ Σ interpolates between a diagonal rule and full Markowitz. Monte Carlo experiments across covariance regimes and estimation ratios are used to claim that HRP-Σμ improves on HRP-μ (which improves on plain HRP) and that CRISP at intermediate γ dominates HRP, the Cotton Schur-complement method, Ledoit-Wolf shrinkage, direct Markowitz, and the new hierarchical variants.

Significance. If the empirical ranking can be reproduced under a protocol that selects γ without reference to the reported test Sharpe, the hierarchical extensions would usefully address the signal-blind limitation of standard HRP while retaining its interpretability and computational scaling; CRISP would supply a simple, variance-preserving shrinkage rule whose tuning can be studied separately. The preservation of exact diagonal variances under the shrinkage is a clean technical feature.

major comments (2)
  1. [Monte Carlo experiments] The reported dominance of CRISP at intermediate γ is obtained by choosing γ to maximize realized out-of-sample Sharpe on the very simulation draws whose performance is being reported. Because this selection uses the test metric itself, the superiority over untuned baselines (HRP, Cotton, Ledoit-Wolf, direct Markowitz) is not an out-of-sample prediction and may not generalize to other data-generating processes or live markets.
  2. [CRISP description] CRISP definition and evaluation protocol: the abstract states that γ is tuned for out-of-sample Sharpe rather than for covariance-estimation loss. This choice makes the central performance claim a fitted quantity; any revision must either (a) fix γ on a separate validation set before reporting test Sharpe or (b) demonstrate that a single γ chosen on covariance loss alone still yields the reported ranking.
minor comments (2)
  1. [Abstract] The Monte Carlo design is summarized only as “multiple covariance regimes and estimation ratios”; the manuscript should report the exact factor-model parameters, correlation lengths, N/T ratios, number of replications, and whether error bars or statistical tests accompany the Sharpe comparisons.
  2. [CRISP description] Notation: the claim that CRISP leaves diagonal variances unchanged while shrinking off-diagonal correlations should be accompanied by an explicit statement of the fixed-point equation and a short proof that the converged covariance has the same diagonal as the input Σ.
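The proof the referee asks for is one line, using only the definition of $P_\gamma$ stated in the abstract (a sketch, not the paper's own derivation):

```latex
% Off-diagonal entries are scaled by \gamma; diagonal entries are unchanged.
(P_\gamma)_{ij} = (1-\gamma)\,\Sigma_{ii}\,\delta_{ij} + \gamma\,\Sigma_{ij}
\;\Longrightarrow\;
(P_\gamma)_{ii} = (1-\gamma)\Sigma_{ii} + \gamma\Sigma_{ii} = \Sigma_{ii},
\qquad
(P_\gamma)_{ij} = \gamma\,\Sigma_{ij} \quad (i \neq j).
```

Hence the implied variances match $\Sigma$ exactly while every implied correlation is $\gamma$ times the original, and a fixed point $w$ of the iteration satisfies $P_\gamma w = \mu$, i.e. it is the Markowitz solution for the shrunk covariance $P_\gamma$.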

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the experimental protocol. We agree that the current tuning of γ for CRISP requires revision to ensure strictly out-of-sample evaluation, and we will update the Monte Carlo design accordingly while preserving the contributions of the hierarchical methods.

read point-by-point responses
  1. Referee: [Monte Carlo experiments] The reported dominance of CRISP at intermediate γ is obtained by choosing γ to maximize realized out-of-sample Sharpe on the very simulation draws whose performance is being reported. Because this selection uses the test metric itself, the superiority over untuned baselines (HRP, Cotton, Ledoit-Wolf, direct Markowitz) is not an out-of-sample prediction and may not generalize to other data-generating processes or live markets.

    Authors: We agree that selecting γ to maximize Sharpe on the same simulation draws used for reporting constitutes in-sample tuning with respect to the evaluation metric. In the revision we will introduce a nested validation procedure: for each Monte Carlo trial we will generate separate validation draws, select γ by maximizing validation Sharpe, and evaluate final performance on an independent test set. All tables and figures will be regenerated under this protocol, and the methods section will document the change. We expect the qualitative ranking to be robust, but the quantitative results will be updated to reflect the corrected procedure. revision: yes

  2. Referee: [CRISP description] CRISP definition and evaluation protocol: the abstract states that γ is tuned for out-of-sample Sharpe rather than for covariance-estimation loss. This choice makes the central performance claim a fitted quantity; any revision must either (a) fix γ on a separate validation set before reporting test Sharpe or (b) demonstrate that a single γ chosen on covariance loss alone still yields the reported ranking.

    Authors: We acknowledge that the present abstract and experiments tune γ directly on the out-of-sample Sharpe, rendering the performance claims fitted. We will adopt option (a) by implementing a validation-set selection of γ before test evaluation, as described above. The abstract will be revised to state that γ is chosen on a held-out validation set. For completeness we will also add a short supplementary check using a single γ selected by covariance estimation loss (Frobenius norm to the true Σ), confirming whether the reported dominance persists under that alternative rule. revision: yes
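The nested-validation protocol the rebuttal promises could be sketched as follows (all names and the data-generating step are hypothetical illustrations, not the authors' code; γ is selected on a validation sample and the test sample is touched exactly once):

```python
import numpy as np

rng = np.random.default_rng(0)

def crisp_weights(Sigma, mu, gamma):
    """Converged CRISP portfolio: solve P_gamma w = mu directly."""
    P = (1 - gamma) * np.diag(np.diag(Sigma)) + gamma * Sigma
    return np.linalg.solve(P, mu)

def sharpe(w, returns):
    pnl = returns @ w
    return pnl.mean() / pnl.std()

# one hypothetical Monte Carlo trial: true moments, estimates, disjoint samples
N, T = 10, 120
A = rng.standard_normal((N, N))
Sigma_true = A @ A.T / N + np.eye(N)
mu_true = rng.standard_normal(N) * 0.05
L = np.linalg.cholesky(Sigma_true)
train = rng.standard_normal((T, N)) @ L.T + mu_true
val = rng.standard_normal((T, N)) @ L.T + mu_true
test = rng.standard_normal((T, N)) @ L.T + mu_true

Sigma_hat, mu_hat = np.cov(train.T), train.mean(axis=0)

# select gamma on the validation sample only ...
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
gamma_star = max(grid, key=lambda g: sharpe(crisp_weights(Sigma_hat, mu_hat, g), val))
# ... then report Sharpe once, on the untouched test sample
test_sharpe = sharpe(crisp_weights(Sigma_hat, mu_hat, gamma_star), test)
```

Under this design the reported test Sharpe is no longer a fitted quantity in γ, which is exactly the distinction the referee's major comments turn on.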

Circularity Check

1 steps flagged

CRISP dominance claim obtained by tuning γ directly on the out-of-sample Sharpe metric within the reported Monte Carlo regimes

specific steps
  1. fitted input called prediction [Abstract]
    "CRISP at intermediate γ is the dominant method in both regimes, outperforming HRP, Cotton, Ledoit-Wolf shrinkage, direct Markowitz, and the signal-aware hierarchical methods. ... with γ tuned for out-of-sample Sharpe rather than covariance-estimation loss."

    γ is chosen to maximize the Sharpe ratio on the identical out-of-sample Monte Carlo periods that are later used to declare CRISP dominant. The reported superiority is therefore the result of fitting the hyperparameter to the evaluation metric rather than an independent prediction or first-principles derivation.

full rationale

The paper's central empirical result—that CRISP at intermediate γ outperforms HRP, Cotton, Ledoit-Wolf, direct Markowitz and the signal-aware hierarchical methods—is produced by selecting γ to maximize realized Sharpe on the out-of-sample periods of the same simulations used for evaluation. This matches the fitted-input-called-prediction pattern: the parameter is optimized against the exact performance quantity whose superiority is then asserted. The methods themselves (HRP-μ, HRP-Σμ, CRISP definition) are derived independently without circularity, but the load-bearing claim of dominance reduces to a tuned quantity inside the experimental design. No self-citation load-bearing, self-definitional equations, or ansatz smuggling appear in the provided text.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

Claims rest on the domain assumption that hierarchical clustering on correlations remains useful once return signals are introduced, and that variance-preserving shrinkage yields stable out-of-sample portfolios. Gamma is a free parameter whose value is chosen by direct optimization on the target performance metric.

free parameters (1)
  • gamma
    Shrinkage intensity in CRISP; chosen to maximize out-of-sample Sharpe rather than covariance estimation accuracy.
axioms (2)
  • domain assumption Hierarchical clustering on pairwise correlations captures economically meaningful asset groupings for allocation purposes.
    Invoked to justify preservation of the HRP tree structure in the new signal-aware variants.
  • domain assumption Shrinking only off-diagonal correlations while leaving variances unchanged produces a usable regularized covariance for mean-variance optimization.
    Explicit in the definition of P_gamma for CRISP.

pith-pipeline@v0.9.0 · 5628 in / 1509 out tokens · 74742 ms · 2026-05-08T04:55:59.129424+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 2 canonical work pages

  1. Barroso, P., and P. Santa-Clara, 2015, Beyond the Carry Trade: Optimal Currency Portfolios, Journal of Financial and Quantitative Analysis 50(5), 1037--1056
  2. Black, F., and R. Litterman, 1992, Global Portfolio Optimization, Financial Analysts Journal 48(5), 28--43
  3. Brandt, M. W., P. Santa-Clara, and R. Valkanov, 2009, Parametric Portfolio Policies: Exploiting Characteristics in the Cross-Section of Equity Returns, Review of Financial Studies 22(9), 3411--3447
  4. Chen, Y., A. Wiesel, Y. C. Eldar, and A. O. Hero, 2010, Shrinkage Algorithms for MMSE Covariance Estimation, IEEE Transactions on Signal Processing 58(10), 5016--5029
  5. Goulart, P. J., and Y. Chen, 2024, Clarabel: An Interior-Point Solver for Conic Programs with Quadratic Objectives, arXiv preprint arXiv:2405.12762
  6. Cotton, P., 2024, Schur Complementary Allocation: A Unification of Hierarchical Risk Parity and Minimum Variance Portfolios, arXiv preprint arXiv:2411.05807
  7. DeMiguel, V., L. Garlappi, and R. Uppal, 2009, Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?, Review of Financial Studies 22(5), 1915--1953
  8. Fama, E. F., and K. R. French, 1993, Common Risk Factors in the Returns on Stocks and Bonds, Journal of Financial Economics 33(1), 3--56
  9. Fan, J., Y. Liao, and M. Mincheva, 2013, Large Covariance Estimation by Thresholding Principal Orthogonal Complements, Journal of the Royal Statistical Society, Series B 75(4), 603--680
  10. Gârleanu, N., and L. H. Pedersen, 2013, Dynamic Trading with Predictable Returns and Transaction Costs, Journal of Finance 68(6), 2309--2340
  11. Grinold, R. C., and R. N. Kahn, 1999, Active Portfolio Management, 2nd edition, McGraw-Hill
  12. Hackbusch, W., 2016, Iterative Solution of Large Sparse Systems of Equations, 2nd edition, Springer Applied Mathematical Sciences vol. 95
  13. Jagannathan, R., and T. Ma, 2003, Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps, Journal of Finance 58(4), 1651--1683
  14. Kozak, S., S. Nagel, and S. Santosh, 2020, Shrinking the Cross-Section, Journal of Financial Economics 135(2), 271--292
  15. Ledoit, O., and M. Wolf, 2003, Improved Estimation of the Covariance Matrix of Stock Returns with an Application to Portfolio Selection, Journal of Empirical Finance 10(5), 603--621
  16. Ledoit, O., and M. Wolf, 2004, Honey, I Shrunk the Sample Covariance Matrix, Journal of Portfolio Management 30(4), 110--119
  17. Ledoit, O., and M. Wolf, 2017, Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks, Review of Financial Studies 30(12), 4349--4388
  18. Ledoit, O., and M. Wolf, 2020, Analytical Nonlinear Shrinkage of Large-Dimensional Covariance Matrices, Annals of Statistics 48(5), 3043--3065
  19. López de Prado, M., 2016, Building Diversified Portfolios that Outperform Out of Sample, Journal of Portfolio Management 42(4), 59--69
  20. López de Prado, M., and M. J. Lewis, 2019, Detection of False Investment Strategies Using Unsupervised Learning Methods, Quantitative Finance 19(9), 1555--1565
  21. Marchenko, V. A., and L. A. Pastur, 1967, Distribution of Eigenvalues for Some Sets of Random Matrices, Matematicheskii Sbornik 72(4), 507--536
  22. Markowitz, H., 1952, Portfolio Selection, Journal of Finance 7(1), 77--91
  23. Michaud, R. O., 1989, The Markowitz Optimization Enigma: Is 'Optimized' Optimal?, Financial Analysts Journal 45(1), 31--42
  24. Stellato, B., G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, 2020, OSQP: An Operator Splitting Solver for Quadratic Programs, Mathematical Programming Computation 12(4), 637--672
  25. Ostrowski, A. M., 1954, On the Linear Iteration Procedures for Symmetric Matrices, Rendiconti di Matematica e delle sue Applicazioni 14, 140--163
  26. Raffinot, T., 2018, Hierarchical Clustering-Based Asset Allocation, Journal of Portfolio Management 44(2), 89--99
  27. Saad, Y., 2003, Iterative Methods for Sparse Linear Systems, 2nd edition, SIAM
  28. Shumway, T., 1997, The Delisting Bias in CRSP Data, Journal of Finance 52(1), 327--340
  29. Stein, C., 1956, Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution, Proc. Third Berkeley Symposium on Mathematical Statistics and Probability, 197--206
  30. Varga, R. S., 2000, Matrix Iterative Analysis, 2nd edition, Springer Series in Computational Mathematics vol. 27
  31. Vershynin, R., 2012, Introduction to the Non-Asymptotic Analysis of Random Matrices, in Compressed Sensing: Theory and Applications, edited by Y. C. Eldar and G. Kutyniok, Cambridge University Press, pp. 210--268
  32. Young, D. M., 1971, Iterative Solution of Large Linear Systems, Academic Press