arxiv: 2605.02326 · v2 · submitted 2026-05-04 · 📊 stat.AP · q-fin.PM


Large-Scale Asset Selection via Metric Dependence with Enriched High Frequency Information

Yangzhou Chen, Shuaida He, Xin Chen


Pith reviewed 2026-05-12 01:58 UTC · model grok-4.3


keywords: asset selection, high frequency data, metric dependence, Fréchet variation, portfolio optimization, sure screening, point-curve objects, ultrahigh dimensionality


The pith

Metric dependence screening that incorporates intraday risk curves as point-curve objects improves asset selection for large portfolios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops Metric Dependence Screening (MDS) for selecting assets from thousands of candidates when building portfolios, a setting where estimation error can ruin performance. It models each asset's daily observation as a point-curve object that pairs the daily return with the full intraday risk curve, equipped with a weighted product metric that respects both pieces of information. Assets are then ranked by how much of the dispersion of these objects a chosen risk-adjusted target explains, measured through Fréchet variation. The resulting smaller set of assets feeds into ordinary mean-variance or minimum variance optimization. Readers should care because most existing selection methods discard the detailed intraday patterns that may matter for controlling risk.

Core claim

The central discovery is that representing asset days as point-curve objects under a weighted product metric and screening them with a Fréchet variation dependence score allows reliable reduction of ultrahigh-dimensional asset universes while retaining relevant reward and risk dynamics, with theoretical guarantees of concentration and sure selection under alpha-mixing conditions.

What carries the argument

The point-curve object equipped with a weighted product metric, which combines scalar daily returns and functional intraday risk curves to enable Fréchet variation based dependence ranking in Metric Dependence Screening.
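To make the object concrete, here is a minimal sketch of a distance between two point-curve objects. The source does not give the paper's exact formula, so everything below is an assumption made for illustration: curves are compared with an L2 distance on a shared intraday grid, the two components are combined as the square root of a weighted sum of squared distances, and the names `curve_l2_distance`, `product_metric`, and the weight `w` are hypothetical.

```python
import numpy as np


def curve_l2_distance(f, g, grid):
    """L2 distance between two curves sampled on a common intraday grid.

    Assumption: the curve component lives in L2; the paper may use another metric.
    """
    diff2 = (np.asarray(f, dtype=float) - np.asarray(g, dtype=float)) ** 2
    dt = np.diff(np.asarray(grid, dtype=float))
    # Trapezoidal rule for the integral of the squared difference.
    integral = np.sum(0.5 * (diff2[1:] + diff2[:-1]) * dt)
    return float(np.sqrt(integral))


def product_metric(x, y, grid, w=0.5):
    """Hypothetical weighted product metric between objects x = (return, curve).

    w in [0, 1] balances the scalar return against the intraday curve.
    """
    r_x, c_x = x
    r_y, c_y = y
    d_point = abs(r_x - r_y)
    d_curve = curve_l2_distance(c_x, c_y, grid)
    return float(np.sqrt(w * d_point ** 2 + (1.0 - w) * d_curve ** 2))


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 5)
    day_a = (0.01, np.zeros(5))
    day_b = (0.03, np.ones(5))
    print(product_metric(day_a, day_b, grid, w=0.5))
```

Setting `w = 1` reduces the distance to the absolute return difference, and `w = 0` ignores returns entirely, which is why the choice of weight matters for everything downstream.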

If this is right

MDS reduces the investable universe before applying mean-variance or minimum variance allocation.
Concentration, sure selection, and rank consistency are established for the target slicing estimator under alpha-mixing time series and ultrahigh dimensionality.
Simulations demonstrate effective performance across Euclidean and non-Euclidean settings.
Application to high frequency data of 2938 Chinese A-share stocks from July 2023 to December 2025 shows improved out-of-sample portfolio performance compared to return-based and scalar dependence benchmarks.
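The two-stage procedure in the first point can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the screening scores are taken as given, `select_top` and `min_variance_weights` are hypothetical helper names, the allocation uses the textbook unconstrained minimum variance solution (weights proportional to the inverse covariance times a vector of ones), and the small `ridge` term is added here only for numerical stability.

```python
import numpy as np


def select_top(scores, d):
    """Stage 1: keep the indices of the d assets with the largest scores."""
    scores = np.asarray(scores, dtype=float)
    keep = np.argsort(-scores, kind="stable")[:d]
    return np.sort(keep)


def min_variance_weights(returns, ridge=1e-8):
    """Stage 2: unconstrained minimum variance weights on the selected assets.

    returns: array of shape (n_days, n_assets).
    ridge: hypothetical stabiliser added to the covariance diagonal.
    """
    R = np.asarray(returns, dtype=float)
    cov = np.atleast_2d(np.cov(R, rowvar=False))
    cov = cov + ridge * np.eye(cov.shape[0])
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # proportional to inverse covariance times ones
    return raw / raw.sum()            # normalise so the weights sum to one


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    all_returns = rng.normal(size=(250, 6)) * 0.01
    scores = [0.10, 0.40, 0.05, 0.30, 0.20, 0.35]  # placeholder scores
    chosen = select_top(scores, d=3)
    print(chosen, min_variance_weights(all_returns[:, chosen]))
```

The point of screening first is visible in the shapes: the covariance matrix that must be estimated and inverted is d by d rather than p by p.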

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This suggests that object-valued data representations could replace scalar summaries in other areas of high-frequency finance such as volatility modeling or liquidity analysis.
If the weighted metric generalizes across markets, the approach might help in global asset allocation where intraday patterns differ by region.
The two-stage procedure highlights a practical way to balance computational feasibility with information retention in large-scale optimization problems.

Load-bearing premise

The chosen weighted product metric on point-curve objects adequately captures the intraday risk dynamics that matter for risk-adjusted portfolio allocation, and the time series meet the alpha-mixing assumption in ultrahigh dimensions.
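For reference, the standard definition of the α-mixing (strong mixing) coefficient of a process {Z_t} is given below. The source does not state the decay rate the paper requires, so only the generic definition is shown; the notation F_a^b for the σ-field generated by {Z_s : a ≤ s ≤ b} is introduced here for illustration.

```latex
\alpha(k) \;=\; \sup_{t}\;
\sup_{A \in \mathcal{F}_{-\infty}^{t},\; B \in \mathcal{F}_{t+k}^{\infty}}
\bigl|\, \mathbb{P}(A \cap B) - \mathbb{P}(A)\,\mathbb{P}(B) \,\bigr|,
\qquad
\alpha(k) \to 0 \ \text{ as } k \to \infty .
```

The process is α-mixing when events separated by a gap of k periods become asymptotically independent. Concentration results of the kind described here typically also need a rate at which α(k) decays, which is where persistence in the data becomes relevant.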

What would settle it

If out-of-sample tests on the 2938 Chinese stocks or similar large datasets show that MDS portfolios do not improve upon those from scalar return or low-dimensional high-frequency summary selections, the advantage of preserving full intraday curves would be called into question.

Figures

Figures reproduced from arXiv: 2605.02326 by Shuaida He, Xin Chen, Yangzhou Chen.

**Figure 1.** Daily returns of the target series {Yt} and the Shanghai Composite Index over the sample period.

**Figure 2.** Accumulated net value paths for d = 60 under four weighting schemes (EW, SEVW, GOC, and GOCS) in 2024–2025.

**Figure 2.** Accumulated net value paths for d = 30 under four weighting schemes (EW, SEVW, GOC, and GOCS) in 2024–2025.

Original abstract

Large-scale portfolio choice is highly sensitive to estimation error, making the preliminary asset selection essential in empirical implementation. Existing selection rules typically rely on scalar returns or low dimensional high frequency summaries, and thus discard intraday risk dynamics that may be relevant for risk adjusted allocation. We propose Metric Dependence Screening (MDS), an asset selection procedure that incorporates high frequency information as object valued data. Each asset day observation is represented as a point-curve object combining daily return with an intraday risk state curve, equipped with a weighted product metric that preserves both reward information and within day risk dynamics. MDS ranks assets by a Fréchet variation based dependence score, measuring how much a risk adjusted target explains the metric dispersion of the asset representations. This yields a simple two stage portfolio procedure: MDS first reduces the investable universe, and standard mean-variance or minimum variance allocation is then applied. We develop a target slicing estimator and establish concentration, sure selection, and rank consistency guarantees under α-mixing time series dependence and ultrahigh dimensionality. Simulations show that MDS performs well across both Euclidean and non-Euclidean settings. Using high frequency data for 2938 Chinese A-share stocks from July 2023 to December 2025, we demonstrate that MDS improves out of sample portfolio performance over return based and scalar dependence based benchmarks, highlighting the value of preserving intraday risk dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MDS gives a concrete way to screen assets by treating daily returns plus intraday curves as metric objects, with decent empirical gains on Chinese stocks, but the α-mixing guarantees look shaky for real high-frequency series.


The paper's main contribution is Metric Dependence Screening, which turns each asset-day into a point-curve object, equips it with a weighted product metric, and ranks assets by a Fréchet variation dependence score that measures how much a target explains the dispersion. That construction is new relative to the scalar-return or low-dimensional summaries it contrasts against, and the two-stage procedure (screen then mean-variance) is straightforward to implement.
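The scoring idea in this paragraph can be illustrated with a slicing-style estimate computed from a pairwise distance matrix. The paper's actual target slicing estimator is not reproduced in the source, so this is only a sketch under assumptions: the target is cut into equal-count slices, dispersion is measured by the pairwise form of Fréchet variance (which equals the usual variance in Hilbert spaces and serves as a surrogate elsewhere), and `frechet_variance` and `slicing_score` are hypothetical names.

```python
import numpy as np


def frechet_variance(D):
    """Pairwise-distance form of dispersion: sum of d(i, j)^2 over 2 n^2.

    In a Hilbert space this equals the mean squared distance to the mean.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    return float(np.sum(D ** 2) / (2.0 * n * n))


def slicing_score(D, y, n_slices=5):
    """Share of total dispersion explained by slicing the target y.

    D: (n, n) matrix of distances between one asset's daily objects.
    y: length-n target series aligned with the rows of D.
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    total = frechet_variance(D)
    if total == 0.0:
        return 0.0
    within = 0.0
    # Equal-count slices of the target, formed from its sorted order.
    for idx in np.array_split(np.argsort(y), n_slices):
        if len(idx) > 0:
            within += len(idx) / n * frechet_variance(D[np.ix_(idx, idx)])
    return 1.0 - within / total


if __name__ == "__main__":
    y = np.arange(20.0)
    dependent = np.abs(y[:, None] - y[None, :])   # objects move with y
    x = y % 4
    unrelated = np.abs(x[:, None] - x[None, :])   # same spread inside every slice
    print(slicing_score(dependent, y), slicing_score(unrelated, y))
```

A score near one means the objects barely vary once the target's slice is known; a score near zero means slicing by the target explains none of the dispersion. Ranking assets by such a score and keeping the top d is the screening step.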

Referee Report

2 major / 2 minor

Summary. The paper proposes Metric Dependence Screening (MDS), an asset selection procedure that represents each asset-day observation as a point-curve object (daily return combined with an intraday risk-state curve) equipped with a weighted product metric. Assets are ranked by a Fréchet variation-based dependence score relative to a risk-adjusted target; a target slicing estimator is introduced, and concentration, sure selection, and rank consistency guarantees are derived under α-mixing dependence and ultrahigh dimensionality. Simulations and an empirical study on 2938 Chinese A-share stocks (July 2023–December 2025) are used to show improved out-of-sample portfolio performance relative to return-based and scalar-dependence benchmarks.

Significance. If the theoretical guarantees apply and the mixing conditions hold for the object-valued process, the method would provide a principled way to incorporate rich intraday risk dynamics into large-scale asset selection, potentially improving risk-adjusted portfolio construction without discarding high-frequency information. The combination of object-valued data, Fréchet variation scoring, and explicit consistency results under mixing is a notable contribution if the assumptions are realistic.

major comments (2)

[theoretical guarantees section (concentration and consistency theorems)] The concentration, sure selection, and rank consistency results for the target slicing estimator are derived under α-mixing time series dependence (stated in the abstract and the theoretical development). High-frequency intraday risk curves typically exhibit volatility clustering and persistence akin to GARCH processes, which produce mixing coefficients that decay too slowly for the exponential inequalities to remain informative when p ≫ n. This is load-bearing for the central claim of reliable screening in the ultrahigh-dimensional regime; a concrete test (e.g., empirical estimation of mixing rates on the risk curves or additional simulations under persistent dependence) is needed to assess applicability.
[methodology section defining the weighted product metric] The weighted product metric on the point-curve objects includes a free weight parameter that balances the scalar return component against the curve component. The dependence score and subsequent screening results depend on this choice, yet no selection rule, cross-validation procedure, or sensitivity analysis is provided. This affects the reproducibility of the empirical ranking and the interpretation of the reported performance gains.
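The persistent-dependence check requested in the first major comment could start from a simulation along the following lines. This is a sketch, not part of the paper: it simulates a GARCH(1,1) series with hypothetical parameter values chosen so that alpha plus beta is close to one, and reports the sample autocorrelation of squared returns. That autocorrelation is only a rough proxy for persistence and is not an estimate of mixing coefficients.

```python
import numpy as np


def simulate_garch11(n, omega=0.05, alpha=0.08, beta=0.90, seed=0, burn=500):
    """Simulate n returns from a Gaussian GARCH(1,1) model.

    sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]
    Parameter values are illustrative assumptions with alpha + beta < 1.
    """
    rng = np.random.default_rng(seed)
    total = n + burn
    z = rng.standard_normal(total)
    sigma2 = np.empty(total)
    r = np.empty(total)
    sigma2[0] = omega / (1.0 - alpha - beta)  # unconditional variance
    r[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, total):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * z[t]
    return r[burn:]  # drop the burn-in period


def acf(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


if __name__ == "__main__":
    returns = simulate_garch11(5000)
    print(np.round(acf(returns ** 2, 10), 3))
```

Feeding series like this into the screening step, in place of weakly dependent data, is one way to see how the selection behaves when dependence dies out slowly.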

minor comments (2)

[empirical study section] The data period extends to December 2025; clarify the exact sample end date, any forward-looking elements, and the precise rules for stock inclusion/exclusion to allow replication.
[preliminaries and estimator definition] Notation for the Fréchet variation and the target slicing estimator should be introduced with explicit definitions before the theoretical results to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below and describe the revisions we will make to strengthen the manuscript.


Referee: The concentration, sure selection, and rank consistency results for the target slicing estimator are derived under α-mixing time series dependence (stated in the abstract and the theoretical development). High-frequency intraday risk curves typically exhibit volatility clustering and persistence akin to GARCH processes, which produce mixing coefficients that decay too slowly for the exponential inequalities to remain informative when p ≫ n. This is load-bearing for the central claim of reliable screening in the ultrahigh-dimensional regime; a concrete test (e.g., empirical estimation of mixing rates on the risk curves or additional simulations under persistent dependence) is needed to assess applicability.

Authors: We appreciate the referee's concern regarding the applicability of the α-mixing assumption to high-frequency financial data. Our concentration and consistency theorems are derived under this standard assumption for dependent time series, which enables the use of exponential inequalities in the ultrahigh-dimensional regime. We acknowledge that volatility clustering may lead to slower mixing rates in practice. While direct empirical estimation of mixing coefficients is technically challenging and often unreliable for object-valued processes, we will add new simulation experiments in the revised manuscript that generate data under persistent dependence structures (e.g., GARCH-type models with slow-decaying autocorrelations) to evaluate the finite-sample performance of MDS when the theoretical mixing conditions are only approximately satisfied. revision: yes
Referee: The weighted product metric on the point-curve objects includes a free weight parameter that balances the scalar return component against the curve component. The dependence score and subsequent screening results depend on this choice, yet no selection rule, cross-validation procedure, or sensitivity analysis is provided. This affects the reproducibility of the empirical ranking and the interpretation of the reported performance gains.

Authors: We agree that the balancing weight in the weighted product metric is a key tuning parameter whose choice affects both the dependence scores and the downstream portfolio results. In the current manuscript the weight was chosen via a preliminary grid search, but the procedure was not formalized. In the revision we will add a data-driven selection rule based on cross-validation: the weight will be chosen to maximize the out-of-sample Sharpe ratio of the resulting MDS-selected portfolio on a rolling validation window. We will also include a sensitivity analysis that reports portfolio performance across a grid of weights, thereby improving reproducibility and clarifying the robustness of the reported gains. revision: yes
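A sensitivity analysis over the weight can be sketched without running a full backtest. The version below is a simpler proxy than the Sharpe-based cross-validation the authors describe: it recomputes a slicing-style score for each weight on a grid and compares the selected asset sets. All names (`explained_share`, `top_sets_by_weight`, `jaccard`) are hypothetical, the inputs are assumed to be precomputed squared distance matrices for the return and curve components, and the combined squared distance is assumed to be their weighted sum.

```python
import numpy as np


def explained_share(D2, y, n_slices=5):
    """Share of pairwise dispersion explained by equal-count slices of y.

    D2: (n, n) matrix of squared distances.
    """
    n = len(y)
    total = D2.sum() / (2.0 * n * n)
    if total == 0.0:
        return 0.0
    within = 0.0
    for idx in np.array_split(np.argsort(y), n_slices):
        if len(idx) > 0:
            block = D2[np.ix_(idx, idx)]
            within += len(idx) / n * block.sum() / (2.0 * len(idx) ** 2)
    return 1.0 - within / total


def top_sets_by_weight(D2_point, D2_curve, y, d, weights):
    """Selected asset sets for each weight.

    D2_point, D2_curve: arrays of shape (p, n, n), one matrix per asset.
    """
    y = np.asarray(y, dtype=float)
    selected = {}
    for w in weights:
        scores = np.array([
            explained_share(w * a + (1.0 - w) * b, y)
            for a, b in zip(D2_point, D2_curve)
        ])
        top = np.argsort(-scores, kind="stable")[:d]
        selected[w] = set(top.tolist())
    return selected


def jaccard(a, b):
    """Overlap between two selected sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

If the Jaccard overlap between neighbouring weights stays close to one, the reported gains are unlikely to hinge on the exact weight; if it drops sharply, the selection rule for the weight becomes part of the method and needs to be reported as such.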

Circularity Check

0 steps flagged

No circularity: MDS procedure, target slicing estimator, and guarantees are derived independently from stated assumptions.


The paper defines a new weighted product metric on point-curve objects, introduces the Fréchet variation dependence score, proposes the target slicing estimator, and proves concentration/sure selection/rank consistency results under α-mixing and ultrahigh-dimensional regimes. None of these steps reduce by construction to fitted parameters, self-citations, or renamed inputs; the theoretical claims rest on standard mixing inequalities applied to the defined objects rather than tautological reparameterization. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on a new data representation and dependence measure whose theoretical support is conditioned on time-series mixing; a tunable weight in the metric is implicit but not quantified.

free parameters (1)

weight parameter in the product metric
Balances daily return point against intraday risk curve; must be chosen or tuned for the dependence score to be well-defined.

axioms (1)

domain assumption: Data satisfy α-mixing time series dependence
Invoked to obtain concentration, sure selection, and rank consistency results under ultrahigh dimensionality.


