Recognition: unknown
Post-Screening Portfolio Selection
Pith reviewed 2026-05-10 05:30 UTC · model grok-4.3
The pith
A two-step post-screening approach using Lasso allows effective mean-variance portfolio selection from many assets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By screening assets with a Lasso regression of a constant on excess returns (without an intercept) and then estimating portfolio weights on the reduced set, we obtain a practical and theoretically grounded solution to high-dimensional mean-variance portfolio selection; an extension handles strong factors by defactoring returns before screening.
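For reference, here is one explicit way to write the screening regression, together with its population target. It assumes the common 1/(2T) scaling of the squared loss; the normalization used in the manuscript is not stated here. The population target is taken under i.i.d. excess-return vectors r_t with mean μ and positive-definite covariance Σ:

```latex
\begin{aligned}
\hat\beta(\lambda) &= \arg\min_{\beta \in \mathbb{R}^N}\; \frac{1}{2T}\,\lVert \mathbf{1}_T - R\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
  \qquad R = (r_1, \dots, r_T)^\top \in \mathbb{R}^{T \times N}, \\
\beta^\ast &= \arg\min_{\beta \in \mathbb{R}^N}\; \mathbb{E}\big[(1 - r_t^\top \beta)^2\big]
  = \big(\Sigma + \mu\mu^\top\big)^{-1}\mu
  = \frac{\Sigma^{-1}\mu}{1 + \mu^\top \Sigma^{-1}\mu}.
\end{aligned}
```

The last equality is the Sherman-Morrison formula. The unpenalized target is therefore a positive multiple of the tangency direction Σ⁻¹μ, so the sparsity the Lasso exploits is that of Σ⁻¹μ. This need not coincide with the support of μ when Σ is not diagonal.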
What carries the argument
Post-screening portfolio selection (PS²): a two-step process in which a Lasso regression of a constant on excess returns, without an intercept, screens the assets, so that the mean-variance optimization only has to be carried out on the low-dimensional selected subset.
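The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The Lasso is solved by plain coordinate descent on (1/(2T))‖1 − Rβ‖² + λ‖β‖₁.
- The penalty is set by hand to half of λ_max, the smallest value at which nothing is selected.
- The second step uses the plug-in weights Σ̂_S⁻¹μ̂_S/γ on the selected set S, with a hypothetical risk-aversion parameter γ.

The function names `lasso_no_intercept` and `ps2_weights` are hypothetical.

```python
import numpy as np

def lasso_no_intercept(X, y, lam, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/(2T)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    T, N = X.shape
    b = np.zeros(N)
    col_sq = (X ** 2).sum(axis=0) / T          # (1/T) * ||x_j||^2 for each asset
    resid = y.astype(float).copy()             # residual y - X b at b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(N):
            # correlation of asset j with the partial residual that excludes asset j
            rho = X[:, j] @ resid / T + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            delta = new_bj - b[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def ps2_weights(R, lam, gamma=1.0):
    """Hypothetical PS^2 sketch.

    Step 1: Lasso of a vector of ones on the T x N excess returns R, no intercept.
    Step 2: mean-variance weights Sigma_S^{-1} mu_S / gamma on the selected set S.
    """
    T, N = R.shape
    beta = lasso_no_intercept(R, np.ones(T), lam)
    selected = np.flatnonzero(beta)
    w = np.zeros(N)
    if selected.size > 0:
        Rs = R[:, selected]
        sigma_s = np.atleast_2d(np.cov(Rs, rowvar=False))  # sample covariance on S
        w[selected] = np.linalg.solve(sigma_s, Rs.mean(axis=0)) / gamma
    return w, selected, beta

# Usage on simulated data with a hypothetical sparse mean vector.
rng = np.random.default_rng(0)
T, N = 500, 20
mu = np.zeros(N)
mu[:3] = 0.2
R = mu + rng.standard_normal((T, N))
lam = 0.5 * np.max(np.abs(R.mean(axis=0)))  # half the smallest penalty that selects nothing
w, selected, beta = ps2_weights(R, lam)
print("selected assets:", selected)
```

Any low-dimensional estimator, for instance one based on a shrinkage covariance, could replace the plug-in second step. The sketch uses sample moments only to keep it short.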
Load-bearing premise
The Lasso screening must identify a subset of assets for which the subsequent mean-variance optimization recovers the relevant risk-return tradeoffs without significant distortion from the selection process.
What would settle it
A simulation study in which the true sparse support and optimal portfolio are known, showing that the post-screening method produces portfolios with substantially lower realized Sharpe ratios than the oracle benchmark, would falsify the effectiveness of the approach.
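A minimal sketch of such a simulation harness follows, under hypothetical settings:

- The covariance is block-diagonal, so the oracle weights Σ⁻¹μ are supported on the first three assets.
- The oracle benchmark is sqrt(μᵀΣ⁻¹μ).
- Each estimated portfolio is scored by its population Sharpe ratio.

The support is passed in as an argument, so any screening rule can be plugged in. Here the true support is used as a stand-in, which isolates the estimation error of the second step. The helper names are hypothetical.

```python
import numpy as np

def true_sharpe(w, mu, sigma):
    """Population Sharpe ratio w'mu / sqrt(w' Sigma w) of a fixed weight vector."""
    return float(w @ mu / np.sqrt(w @ sigma @ w))

def oracle_sharpe(mu, sigma):
    """Largest attainable Sharpe ratio sqrt(mu' Sigma^{-1} mu)."""
    return float(np.sqrt(mu @ np.linalg.solve(sigma, mu)))

def plug_in_weights(R, support):
    """Sample tangency direction Sigma_S^{-1} mu_S on a given support, zero elsewhere."""
    w = np.zeros(R.shape[1])
    Rs = R[:, support]
    sigma_s = np.atleast_2d(np.cov(Rs, rowvar=False))
    w[support] = np.linalg.solve(sigma_s, Rs.mean(axis=0))
    return w

# Hypothetical design: the first three assets are correlated with each other and
# independent of the rest, so the oracle weights Sigma^{-1} mu are sparse.
rng = np.random.default_rng(1)
N, T, rho = 10, 240, 0.3
sigma = np.eye(N)
sigma[:3, :3] = (1 - rho) * np.eye(3) + rho * np.ones((3, 3))
mu = np.zeros(N)
mu[:3] = 0.1
chol = np.linalg.cholesky(sigma)
support = np.arange(3)  # stand-in for whatever support a screening step returns

oracle = oracle_sharpe(mu, sigma)
ratios = []
for _ in range(200):
    R = mu + rng.standard_normal((T, N)) @ chol.T  # returns with covariance sigma
    w = plug_in_weights(R, support)
    ratios.append(true_sharpe(w, mu, sigma) / oracle)
print(f"mean Sharpe ratio relative to oracle: {np.mean(ratios):.3f}")
```

By the Cauchy-Schwarz inequality, no weight vector can beat the oracle, so every ratio is at most one. A screening rule that drops relevant assets would show up as ratios persistently below those obtained with the true support.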
Original abstract
We propose post-screening portfolio selection (PS²), a two-step framework for high-dimensional mean-variance investing. First, assets are screened by Lasso-type regression of a constant on excess returns without an intercept. Second, portfolio weights are estimated on the selected set using standard low-dimensional methods. Because strong factors can destroy sparsity in real data, we further introduce PS² with factors (FPS²), which defactors returns before screening and allows factor investing in the final step. We establish theoretical guarantees, and simulations and an empirical application show competitive performance, especially when sparse screening is appropriate or strong factors are explicitly accommodated.
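The abstract does not say how FPS² defactors returns. Purely as an illustration, the sketch below assumes latent factors estimated by principal components, with a known number of factors K:

- Returns are projected off the K leading eigenvectors V of the sample covariance.
- This gives residual returns U = R(I − VVᵀ) for the screening step.
- It also gives factor returns F = RV, which remain available for factor investing in the final step.

The helper names `defactor_pca` and `mean_abs_offdiag_corr` are hypothetical.

```python
import numpy as np

def defactor_pca(R, K):
    """Project a T x N panel of excess returns off its K leading principal components.

    Returns residual returns U = R (I - V V'), factor returns F = R V and the
    N x K matrix V of leading eigenvectors of the sample covariance.
    """
    sigma_hat = np.cov(R, rowvar=False)
    eigval, eigvec = np.linalg.eigh(sigma_hat)   # eigenvalues in ascending order
    V = eigvec[:, ::-1][:, :K]                   # K leading eigenvectors
    F = R @ V                                    # factor returns, means kept
    U = R - F @ V.T                              # defactored returns, means kept
    return U, F, V

def mean_abs_offdiag_corr(X):
    """Average absolute pairwise correlation between the columns of X."""
    C = np.corrcoef(X, rowvar=False)
    off = C[~np.eye(C.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off)))

# Hypothetical one-factor market: every asset loads on a strong common factor.
rng = np.random.default_rng(2)
T, N = 500, 30
loadings = rng.uniform(0.5, 1.5, size=N)
factor = 0.05 + rng.standard_normal(T)
R = np.outer(factor, loadings) + rng.standard_normal((T, N))
U, F, V = defactor_pca(R, K=1)
print(mean_abs_offdiag_corr(R), mean_abs_offdiag_corr(U))
```

Projecting keeps a nonzero mean in the residual returns. By contrast, regressing each asset on the factors with an intercept would give residuals with exactly zero sample mean, leaving nothing for the constant-on-returns regression to screen.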
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes post-screening portfolio selection (PS²), a two-step framework for high-dimensional mean-variance portfolio optimization. Assets are screened via Lasso-type regression of a constant on excess returns (no intercept), after which standard low-dimensional mean-variance weights are estimated on the selected subset. An extension FPS² defactors returns before screening to accommodate strong factors and permits explicit factor investing in the final step. Theoretical guarantees are claimed for both variants, supported by simulation studies and an empirical application demonstrating competitive performance when sparsity is appropriate.
Significance. If the central claims hold, the framework provides a computationally tractable route to high-dimensional mean-variance investing by exploiting sparsity in asset means while retaining standard MV estimation on the reduced set. The explicit treatment of factors via FPS² addresses a known practical limitation. The combination of Lasso-based screening, theoretical bounds, and real-data validation could be useful for practitioners facing large asset universes, provided the screening step reliably preserves near-optimal MV properties.
major comments (2)
- §2 (screening step): The Lasso objective min_β ||1 − Rβ||₂² + λ||β||₁ selects assets solely according to their contribution to matching a target mean vector. Because the subsequent mean-variance problem depends on both the mean vector and the full covariance matrix, the selected support need not contain the assets that are most valuable for the efficient frontier. When covariances are dense or the MV-optimal weights are not sparse in the mean-weighted sense, the post-screening MV estimator can inherit both an incomplete feasible set and selection bias. The paper should supply a concrete bound or counter-example clarifying when the mean-only screening still yields near-optimal MV performance.
- §3 (theoretical guarantees): The stated error bounds appear to combine standard Lasso consistency for the screening step with classical low-dimensional MV rates on the selected set. It is not clear how post-selection bias or the fact that screening ignores covariance structure is controlled in the final portfolio-error bound. If the selected set systematically omits high-Sharpe assets, the claimed rates may not hold uniformly; a revised proof sketch or additional assumption on the joint distribution of means and covariances is needed to make the guarantees load-bearing for the central claim.
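A two-asset calculation illustrates what the first major comment puts at stake. The configuration is hypothetical and differs from the counter-example proposed in the authors' response:

- Asset 1 has mean excess return m > 0 and asset 2 has mean zero.
- Both have unit volatility, and their correlation is ρ.

The best attainable Sharpe ratio is sqrt(μᵀΣ⁻¹μ) = m/sqrt(1 − ρ²). It is reached with weights proportional to Σ⁻¹μ = m(1, −ρ)/(1 − ρ²). A portfolio confined to the nonzero-mean asset earns only m. The helper name `max_sharpe` is hypothetical.

```python
import numpy as np

def max_sharpe(mu, sigma):
    """sqrt(mu' Sigma^{-1} mu): the largest Sharpe ratio attainable with these assets."""
    return float(np.sqrt(mu @ np.linalg.solve(sigma, mu)))

m, rho = 0.5, 0.6                                  # hypothetical inputs
mu = np.array([m, 0.0])                            # asset 2 has zero mean
sigma = np.array([[1.0, rho], [rho, 1.0]])         # unit volatilities, correlation rho

full = max_sharpe(mu, sigma)                       # both assets: m / sqrt(1 - rho^2)
mean_support_only = max_sharpe(mu[:1], sigma[:1, :1])  # nonzero-mean asset alone: m
tangency = np.linalg.solve(sigma, mu)              # m * (1, -rho) / (1 - rho^2)
print(full, mean_support_only, tangency)
```

With m = 0.5 and ρ = 0.6, the two Sharpe ratios are 0.625 and 0.5. The zero-mean asset therefore carries a nonzero (short) hedging weight in the tangency portfolio.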
minor comments (2)
- [§2] Notation for the screening regression (constant on excess returns, no intercept) should be stated explicitly with the precise loss function and penalty term in the first paragraph of §2 to avoid ambiguity with standard Lasso formulations.
- Simulation section: The choice of λ (e.g., cross-validation versus the theoretical rate) and the sensitivity of the results to it should be reported more transparently, since λ directly determines the sparsity level and hence downstream MV performance.
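Below is a minimal sketch of one way to report that sensitivity: count the screened assets along a grid of penalties expressed as fractions of λ_max. Here λ_max is the largest absolute sample mean excess return. With the 1/(2T) loss scaling assumed here, it is the smallest penalty at which the no-intercept Lasso of a constant on returns selects no asset. The grid, the simulated data and the names `lasso_cd` and `sparsity_path` are hypothetical. Cross-validation would add an out-of-sample criterion on top of such a path.

```python
import numpy as np

def lasso_cd(X, y, lam, b0=None, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/(2T)) * ||y - X b||^2 + lam * ||b||_1, no intercept."""
    T, N = X.shape
    b = np.zeros(N) if b0 is None else b0.copy()
    col_sq = (X ** 2).sum(axis=0) / T
    resid = y - X @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(N):
            rho = X[:, j] @ resid / T + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            delta = new_bj - b[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def sparsity_path(R, fractions):
    """Count screened assets at lam = fraction * lam_max, from heavy to light penalty."""
    T, N = R.shape
    y = np.ones(T)
    lam_max = np.max(np.abs(R.mean(axis=0)))  # b = 0 is optimal for every lam >= lam_max
    b = np.zeros(N)
    path = []
    for frac in sorted(fractions, reverse=True):
        b = lasso_cd(R, y, frac * lam_max, b0=b)  # warm start from the previous solution
        path.append((frac, int(np.count_nonzero(b))))
    return lam_max, path

rng = np.random.default_rng(3)
T, N = 300, 25
mu = np.zeros(N)
mu[:4] = 0.15                                     # hypothetical sparse means
R = mu + rng.standard_normal((T, N))
lam_max, path = sparsity_path(R, [0.9, 0.7, 0.5, 0.3, 0.1])
for frac, k in path:
    print(f"lambda = {frac:.1f} * lambda_max: {k} assets selected")
```

Warm-starting each fit from the previous solution is the usual way to compute a Lasso path cheaply. Note that the number of selected assets need not be monotone in λ.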
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed report. The comments correctly identify that our screening step is mean-focused and that the theoretical bounds require care with post-selection effects. We address each point below and will revise the manuscript accordingly.
- Referee: §2 (screening step): The Lasso objective min_β ||1 − Rβ||₂² + λ||β||₁ selects assets solely according to their contribution to matching a target mean vector. Because the subsequent mean-variance problem depends on both the mean vector and the full covariance matrix, the selected support need not contain the assets that are most valuable for the efficient frontier. When covariances are dense or the MV-optimal weights are not sparse in the mean-weighted sense, the post-screening MV estimator can inherit both an incomplete feasible set and selection bias. The paper should supply a concrete bound or counter-example clarifying when the mean-only screening still yields near-optimal MV performance.
Authors: We agree that the screening criterion is driven by the mean vector and does not explicitly optimize for the covariance structure. This choice is motivated by the high-dimensional regime in which means are sparse while covariances may be dense; under such sparsity the assets with non-zero means dominate the location of the efficient frontier. Nevertheless, the referee's concern is valid. In the revision we will add a short subsection to §2 that (i) presents a simple counter-example with two zero-mean assets that are perfectly negatively correlated and therefore valuable for risk reduction, and (ii) derives an explicit excess-risk bound showing that the sub-optimality is controlled by the Lasso estimation error times the restricted eigenvalue of the selected covariance submatrix. These additions will clarify the regime in which PS² remains near-optimal. revision: yes
- Referee: §3 (theoretical guarantees): The stated error bounds appear to combine standard Lasso consistency for the screening step with classical low-dimensional MV rates on the selected set. It is not clear how post-selection bias or the fact that screening ignores covariance structure is controlled in the final portfolio-error bound. If the selected set systematically omits high-Sharpe assets, the claimed rates may not hold uniformly; a revised proof sketch or additional assumption on the joint distribution of means and covariances is needed to make the guarantees load-bearing for the central claim.
Authors: The current proof proceeds by first establishing that the Lasso step recovers the support of the mean vector with high probability, after which the low-dimensional MV estimator on the fixed selected set inherits standard rates. Post-selection bias is avoided because the final mean and covariance estimates are computed on the entire sample once the support is fixed. We acknowledge, however, that uniformity over all possible covariance configurations is not guaranteed without further conditions. In the revised version we will introduce an additional assumption (new Assumption 4) that the covariance matrix satisfies a uniform restricted-eigenvalue condition on the union of the mean support and any high-Sharpe assets, and we will supply a revised proof sketch in the appendix that bounds the probability of omitting such assets by the Lasso rate. This makes the guarantees conditional on the stated joint-distribution assumption, which we view as a natural strengthening rather than a weakening of the result. revision: partial
Circularity Check
No circularity: derivation rests on external Lasso theory and standard MV results
full rationale
The paper defines PS² as a two-step procedure (Lasso screening of a constant on excess returns without intercept, followed by low-dimensional MV optimization on the selected subset) and extends it to FPS² with explicit defactoring. Theoretical guarantees are invoked from standard high-dimensional Lasso consistency results and classical mean-variance theory rather than being derived from quantities fitted inside the same dataset. Simulations and the empirical application serve as external checks, not as self-referential predictions that recover the screening criterion by construction. No self-citation chain, fitted-input renaming, or ansatz smuggling appears in the load-bearing steps.