arxiv: 2604.17593 · v1 · submitted 2026-04-19 · 💱 q-fin.PM · stat.ME


Post-Screening Portfolio Selection

Shinya Tanaka, Yoshimasa Uematsu

Pith reviewed 2026-05-10 05:30 UTC · model grok-4.3


keywords high-dimensional portfolio selection · Lasso screening · mean-variance optimization · post-screening · factor models · sparse regression · portfolio performance


The pith

A two-step post-screening approach using Lasso allows effective mean-variance portfolio selection from many assets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a two-step framework for high-dimensional mean-variance portfolio selection. In the first step, assets are screened by performing a Lasso regression of a constant on their excess returns, omitting the intercept. In the second step, standard mean-variance optimization is applied to the screened assets. This is extended to a factor version by defactoring returns first. The method provides theoretical guarantees and demonstrates strong performance in simulations and empirical tests when sparsity or factor structure is present.
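The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lasso_cd` and `ps2`, the `1/(2T)` loss scaling, the plain coordinate-descent solver, the risk-aversion parameter `gamma`, and the plug-in weights in the second step are all choices made here. The source says only that the second step uses standard low-dimensional methods.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/(2T))*||y - X b||^2 + lam*||b||_1, no intercept.

    The loss scaling and the solver are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    b = np.zeros(N)
    col_sq = (X ** 2).sum(axis=0) / T
    resid = np.asarray(y, dtype=float).copy()  # equals y - X b while b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(N):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j added back).
            rho = X[:, j] @ resid / T + col_sq[j] * b[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != b[j]:
                resid -= X[:, j] * (new - b[j])
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return b

def ps2(R, lam, gamma=1.0):
    """Two-step sketch: Lasso screening, then plug-in MV weights on the selected set.

    R is a T x N array of excess returns. Returns (selected indices, length-N weights).
    """
    R = np.asarray(R, dtype=float)
    T, N = R.shape
    # Step 1: regress a constant (vector of ones) on excess returns, no intercept.
    beta = lasso_cd(R, np.ones(T), lam)
    selected = np.flatnonzero(beta)
    w = np.zeros(N)
    if selected.size > 0:
        # Step 2: low-dimensional mean-variance weights on the screened assets.
        # Assumes fewer selected assets than observations, so the sample
        # covariance is invertible.
        Rs = R[:, selected]
        mu_hat = Rs.mean(axis=0)
        Sigma_hat = np.atleast_2d(np.cov(Rs, rowvar=False))
        w[selected] = np.linalg.solve(Sigma_hat, mu_hat) / gamma
    return selected, w
```

A call such as `selected, w = ps2(R, lam=0.1)` returns the screened index set and a weight vector that is zero off that set. The penalty level `lam` controls how many assets survive screening and has to be tuned.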

Core claim

By screening assets through Lasso regression of a constant on excess returns without an intercept and then estimating portfolio weights on the reduced set, we achieve a practical and theoretically grounded solution to high-dimensional mean-variance portfolio selection, with an extension that handles strong factors by defactoring beforehand.

What carries the argument

Post-screening portfolio selection (PS²), the two-step process where Lasso screens assets via constant-on-excess-returns regression without intercept, enabling low-dimensional optimization on the subset.
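Written out, the screening regression takes the following form. The notation is an assumption made here for illustration ($R$ for the $T \times N$ matrix of excess returns, $\mathbf{1}_T$ for a vector of ones, $\lambda \ge 0$ for the penalty level) and follows the objective quoted in the referee report; the paper's own scaling and notation may differ. The second display is a standard identity for the unpenalized case, valid when the sample covariance is invertible (which requires $N < T$), and follows from the Sherman-Morrison formula.

```latex
\hat\beta(\lambda)
  = \arg\min_{\beta \in \mathbb{R}^N}
    \; \lVert \mathbf{1}_T - R\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 ,
\qquad
\hat S = \{\, j : \hat\beta_j(\lambda) \neq 0 \,\}.

% Unpenalized case (\lambda = 0), with
% \hat\mu = R^\top \mathbf{1}_T / T  and  \hat\Sigma = R^\top R / T - \hat\mu\hat\mu^\top :
\hat\beta(0)
  = (R^\top R)^{-1} R^\top \mathbf{1}_T
  = (\hat\Sigma + \hat\mu\hat\mu^\top)^{-1} \hat\mu
  = \frac{\hat\Sigma^{-1}\hat\mu}{1 + \hat\mu^\top \hat\Sigma^{-1}\hat\mu}.
```

Without the penalty, then, the regression coefficients are proportional to the sample mean-variance weights $\hat\Sigma^{-1}\hat\mu$. How the $\ell_1$ penalty changes this picture when $N$ is large relative to $T$ is what the paper's theory has to address.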

Load-bearing premise

The Lasso screening must identify a subset of assets for which the subsequent mean-variance optimization recovers the relevant risk-return tradeoffs without significant distortion from the selection process.

What would settle it

A simulation study in which the true sparse support and optimal portfolio are known, showing that the post-screening method produces portfolios with substantially lower realized Sharpe ratios than the oracle benchmark, would falsify the effectiveness of the approach.
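The comparison described above needs a Sharpe ratio for a candidate portfolio and an oracle benchmark. The helpers below are a hedged sketch of those two quantities under known population moments `mu` and `Sigma`; the function names are hypothetical, and the paper's own simulation design is not visible on this page.

```python
import numpy as np

def sharpe(w, mu, Sigma):
    """Population Sharpe ratio w'mu / sqrt(w'Sigma w) of a weight vector w."""
    w = np.asarray(w, dtype=float)
    var = w @ Sigma @ w
    return 0.0 if var <= 0 else float(w @ mu / np.sqrt(var))

def oracle_sharpe(mu, Sigma):
    """Maximum attainable Sharpe ratio, sqrt(mu' Sigma^{-1} mu)."""
    return float(np.sqrt(mu @ np.linalg.solve(Sigma, mu)))

def restricted_sharpe(mu, Sigma, support):
    """Best attainable Sharpe ratio using only the assets indexed by `support`."""
    idx = np.asarray(support, dtype=int)
    if idx.size == 0:
        return 0.0
    return oracle_sharpe(mu[idx], Sigma[np.ix_(idx, idx)])
```

In a simulation, `sharpe(w_hat, mu, Sigma)` for the estimated weights would be compared with `oracle_sharpe(mu, Sigma)`, and `restricted_sharpe` isolates how much of any gap is due to the screened set alone rather than to estimation error in the second step.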

Figures

Figures reproduced from arXiv: 2604.17593 by Shinya Tanaka, Yoshimasa Uematsu.

**Figure 1.** Conceptual diagram of PS²: we first screen a large set of potential assets to select eligible candidates, then form the portfolio using only the selected assets. [Caption excerpt runs on into §1.2, "Modifying PS²: Strong factors and the failure of sparse screening":] The success of PS² hinges on an economically meaningful form of sparsity: the population efficient portfolio weight must load on only a relatively small subset of assets, at l…

**Figure 2.** Correlation matrix before and after defactoring: original (left), defactored (right).

**Figure 3.** Power of the Lasso when all the N = 500 elements of β(1) are nonzero, for each pair of T and signal strength. Under this setting, the Lasso should find all entries to be nonzero; however, when the signal is strong, it identifies only a small fraction as nonzero even with reasonably large sample sizes.

Original abstract

We propose post-screening portfolio selection (PS$^2$), a two-step framework for high-dimensional mean--variance investing. First, assets are screened by Lasso-type regression of a constant on excess returns without an intercept. Second, portfolio weights are estimated on the selected set using standard low-dimensional methods. Because strong factors can destroy sparsity in real data, we further introduce PS$^2$ with factors (FPS$^2$), which defactors returns before screening and allows factor investing in the final step. We establish theoretical guarantees, and simulations and an empirical application show competitive performance, especially when sparse screening is appropriate or strong factors are explicitly accommodated.
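The defactoring idea behind FPS² can be illustrated with a principal-components sketch. This is an assumption made for illustration: the page does not say how the paper estimates factors, and the function names `defactor` and `mean_abs_offdiag_corr` are hypothetical. How asset means are handled after defactoring is also not specified here, so the sketch returns centered residuals and leaves that choice to the caller.

```python
import numpy as np

def defactor(R, n_factors):
    """Remove the top `n_factors` principal components from centered returns.

    R is a T x N array. Returns (residuals, factors, loadings).
    PCA is an assumed stand-in for whatever factor estimator the paper uses.
    """
    R = np.asarray(R, dtype=float)
    X = R - R.mean(axis=0)                       # center each asset's series
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    factors = U[:, :n_factors] * s[:n_factors]   # T x k estimated factor realizations
    loadings = Vt[:n_factors].T                  # N x k estimated loadings
    resid = X - factors @ loadings.T             # common component removed
    return resid, factors, loadings

def mean_abs_offdiag_corr(X):
    """Average absolute off-diagonal entry of the sample correlation matrix."""
    C = np.corrcoef(X, rowvar=False)
    off = C[~np.eye(C.shape[0], dtype=bool)]
    return float(np.abs(off).mean())
```

Comparing `mean_abs_offdiag_corr(R)` with `mean_abs_offdiag_corr(resid)` gives a one-number summary of the before/after contrast that Figure 2 shows as correlation matrices.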

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The Lasso screening here targets a sparse fit to constant returns and may miss assets that matter most for the covariance part of mean-variance optimality.


The main thing to know is that the paper gives a two-step method: first screen assets by running Lasso on a constant target against excess returns with no intercept, then run ordinary mean-variance optimization on the reduced set. They add a defactored version (FPS²) that removes strong factors before screening and puts them back at the end. This specific pipeline is new and gives a clean way to handle more assets than observations without full high-dimensional solvers. Their simulations and one empirical example show it can compete with existing methods when the data really is sparse or factor-driven. That is useful framing for a common practical problem in quant finance.

The soft spot is the screening step itself. The Lasso criterion fits portfolio returns to a constant, so it leans heavily on the vector of means and does not directly incorporate the covariance structure that determines which assets actually improve the efficient frontier. In markets where correlations are dense or high-Sharpe assets are not the ones with the largest means, the selected subset can drop useful assets or keep noise, and the low-dimensional step cannot fix a bad support. The defactoring step changes both means and covariances in the residuals, yet the paper does not fully show that the final mean-variance solution stays close to the original high-dimensional optimum. The theoretical guarantees are stated but rest on conditions that may not hold when covariances are realistic rather than sparse.

This work is for people who build or study high-dimensional portfolios and want a simple screening tool. A reader already familiar with Lasso in finance will see the practical angle and the factor handling. It deserves peer review because the core idea is clear and the empirical checks are there, even if the selection bias question needs more attention in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes post-screening portfolio selection (PS²), a two-step framework for high-dimensional mean-variance portfolio optimization. Assets are screened via Lasso-type regression of a constant on excess returns (no intercept), after which standard low-dimensional mean-variance weights are estimated on the selected subset. An extension FPS² defactors returns before screening to accommodate strong factors and permits explicit factor investing in the final step. Theoretical guarantees are claimed for both variants, supported by simulation studies and an empirical application demonstrating competitive performance when sparsity is appropriate.

Significance. If the central claims hold, the framework provides a computationally tractable route to high-dimensional mean-variance investing by exploiting sparsity in asset means while retaining standard MV estimation on the reduced set. The explicit treatment of factors via FPS² addresses a known practical limitation. The combination of Lasso-based screening, theoretical bounds, and real-data validation could be useful for practitioners facing large asset universes, provided the screening step reliably preserves near-optimal MV properties.

major comments (2)

[§2] §2 (screening step): The Lasso objective min_β ||1 − Rβ||₂² + λ||β||₁ selects assets solely according to their contribution to matching a target mean vector. Because the subsequent mean-variance problem depends on both the mean vector and the full covariance matrix, the selected support need not contain the assets that are most valuable for the efficient frontier. When covariances are dense or the MV-optimal weights are not sparse in the mean-weighted sense, the post-screening MV estimator can inherit both an incomplete feasible set and selection bias. The paper should supply a concrete bound or counter-example clarifying when the mean-only screening still yields near-optimal MV performance.
[§3] §3 (theoretical guarantees): The stated error bounds appear to combine standard Lasso consistency for the screening step with classical low-dimensional MV rates on the selected set. It is not clear how post-selection bias or the fact that screening ignores covariance structure is controlled in the final portfolio-error bound. If the selected set systematically omits high-Sharpe assets, the claimed rates may not hold uniformly; a revised proof sketch or additional assumption on the joint distribution of means and covariances is needed to make the guarantees load-bearing for the central claim.

minor comments (2)

[§2] Notation for the screening regression (constant on excess returns, no intercept) should be stated explicitly with the precise loss function and penalty term in the first paragraph of §2 to avoid ambiguity with standard Lasso formulations.
[Simulation section] In the simulation section, the choice of λ (e.g., cross-validation versus theoretical rate) and its sensitivity should be reported more transparently, as this directly affects the sparsity level and downstream MV performance.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments correctly identify that our screening step is mean-focused and that the theoretical bounds require care with post-selection effects. We address each point below and will revise the manuscript accordingly.


Referee: [§2] §2 (screening step): The Lasso objective min_β ||1 − Rβ||₂² + λ||β||₁ selects assets solely according to their contribution to matching a target mean vector. Because the subsequent mean-variance problem depends on both the mean vector and the full covariance matrix, the selected support need not contain the assets that are most valuable for the efficient frontier. When covariances are dense or the MV-optimal weights are not sparse in the mean-weighted sense, the post-screening MV estimator can inherit both an incomplete feasible set and selection bias. The paper should supply a concrete bound or counter-example clarifying when the mean-only screening still yields near-optimal MV performance.

Authors: We agree that the screening criterion is driven by the mean vector and does not explicitly optimize for the covariance structure. This choice is motivated by the high-dimensional regime in which means are sparse while covariances may be dense; under such sparsity the assets with non-zero means dominate the location of the efficient frontier. Nevertheless, the referee's concern is valid. In the revision we will add a short subsection to §2 that (i) presents a simple counter-example with two zero-mean assets that are perfectly negatively correlated and therefore valuable for risk reduction, and (ii) derives an explicit excess-risk bound showing that the sub-optimality is controlled by the Lasso estimation error times the restricted eigenvalue of the selected covariance submatrix. These additions will clarify the regime in which PS² remains near-optimal.
revision: yes
Referee: [§3] §3 (theoretical guarantees): The stated error bounds appear to combine standard Lasso consistency for the screening step with classical low-dimensional MV rates on the selected set. It is not clear how post-selection bias or the fact that screening ignores covariance structure is controlled in the final portfolio-error bound. If the selected set systematically omits high-Sharpe assets, the claimed rates may not hold uniformly; a revised proof sketch or additional assumption on the joint distribution of means and covariances is needed to make the guarantees load-bearing for the central claim.

Authors: The current proof proceeds by first establishing that the Lasso step recovers the support of the mean vector with high probability, after which the low-dimensional MV estimator on the fixed selected set inherits standard rates. Post-selection bias is avoided because the final mean and covariance estimates are computed on the entire sample once the support is fixed. We acknowledge, however, that uniformity over all possible covariance configurations is not guaranteed without further conditions. In the revised version we will introduce an additional assumption (new Assumption 4) that the covariance matrix satisfies a uniform restricted-eigenvalue condition on the union of the mean support and any high-Sharpe assets, and we will supply a revised proof sketch in the appendix that bounds the probability of omitting such assets by the Lasso rate. This makes the guarantees conditional on the stated joint-distribution assumption, which we view as a natural strengthening rather than a weakening of the result.
revision: partial
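Both exchanges above turn on the difference between the support of the mean vector and the support of the mean-variance weights. A two-asset calculation makes that difference concrete. It is an illustration constructed here, not the counter-example the simulated authors promise, and the function name and parameter values are hypothetical.

```python
import numpy as np

def two_asset_mv(mu1, sd1, sd2, rho):
    """MV weights (up to scale) when asset 2 has zero mean.

    Returns (weights, Sharpe ratio using both assets, Sharpe ratio of asset 1 alone).
    """
    mu = np.array([mu1, 0.0])
    cov12 = rho * sd1 * sd2
    Sigma = np.array([[sd1 ** 2, cov12],
                      [cov12, sd2 ** 2]])
    w = np.linalg.solve(Sigma, mu)          # proportional to the MV-optimal weights
    sharpe_both = float(np.sqrt(mu @ w))    # sqrt(mu' Sigma^{-1} mu)
    sharpe_asset1_only = abs(mu1) / sd1
    return w, sharpe_both, sharpe_asset1_only

# Illustrative values: asset 2 earns nothing on average but is highly
# correlated with asset 1, so it is useful as a hedge.
w, sr_both, sr_one = two_asset_mv(mu1=0.05, sd1=0.2, sd2=0.2, rho=0.8)
```

With nonzero correlation the zero-mean asset receives a nonzero weight and the attainable Sharpe ratio rises from `|mu1| / sd1` to `|mu1| / (sd1 * sqrt(1 - rho**2))`. With `rho = 0` the second weight is zero and the two Sharpe ratios coincide.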

Circularity Check

0 steps flagged

No circularity: derivation rests on external Lasso theory and standard MV results


The paper defines PS² as a two-step procedure (Lasso screening of a constant on excess returns without intercept, followed by low-dimensional MV optimization on the selected subset) and extends it to FPS² with explicit defactoring. Theoretical guarantees are invoked from standard high-dimensional Lasso consistency results and classical mean-variance theory rather than being derived from quantities fitted inside the same dataset. Simulations and the empirical application serve as external checks, not as self-referential predictions that recover the screening criterion by construction. No self-citation chain, fitted-input renaming, or ansatz smuggling appears in the load-bearing steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; full text unavailable, so free parameters, axioms, and invented entities cannot be exhaustively audited. The Lasso regularization parameter is implicitly present as a tuning choice, and standard assumptions on sparsity and factor structure are invoked but not enumerated.

pith-pipeline@v0.9.0 · 5393 in / 1266 out tokens · 71630 ms · 2026-05-10T05:30:30.159771+00:00 · methodology

