arxiv: 2605.11423 · v1 · submitted 2026-05-12 · 💱 q-fin.TR · q-fin.CP · q-fin.ST

Recognition: no theorem link

A Validated Volatility-Volume-Gap Classifier for Regime Identification in MNQ Intraday Data

Mathias Mesfin

Pith reviewed 2026-05-13 00:50 UTC · model grok-4.3

classification 💱 q-fin.TR · q-fin.CP · q-fin.ST

keywords volatility · volume · gap · regime identification · MNQ futures · intraday patterns · trading signals · day classification


The pith

The Volatility-Volume-Gap classifier identifies MNQ days with morning drift and late reversal, but none of the strategies built on it survives transaction costs and year-by-year stability checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a day-classification rule for Micro E-Mini Nasdaq 100 futures (MNQ) from three signals that are all known by 10:00 ET: the size of the first thirty-minute return, the overnight gap, and abnormally high opening-bar volume relative to a rolling average. On days that meet all three conditions, the intraday path shows a directional move in the first hours followed by a systematic reversal in the final part of the session. The paper then tests eight directional entry configurations that try to capture these patterns and finds that none survives realistic transaction costs and multi-year stability checks. It therefore treats the classifier as a validated descriptive tool for spotting regime-like behavior while documenting why it does not translate into deployable signals.

Core claim

Using 947 regular trading days of five-minute MNQ data from 2021-2025, the Volatility-Volume-Gap classifier isolates a subset of days that exhibit statistically distinct intraday behavior, specifically directional morning drift followed by systematic late-session reversal; however, every directional strategy built on these patterns fails institutional validation once transaction costs and year-by-year consistency requirements are imposed.

What carries the argument

The Volatility-Volume-Gap (VVG) classifier, a composite rule that flags a day only when the first-30-minute return and the overnight gap are both extreme in magnitude and the opening-bar volume is abnormally high relative to its rolling baseline.
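A minimal sketch of the three-condition rule follows, assuming fixed absolute cut-offs for the return and gap and a volume multiple over a trailing mean. The `Day` record, the `vvg_flags` function, and every threshold value are hypothetical: the paper describes the conditions as "extreme", but the abstract does not give the cut-offs, which could equally be percentile ranks.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Day:
    """One RTH session (hypothetical record layout)."""
    prev_close: float        # prior session's 16:00 ET close
    open_price: float        # 09:30 ET open
    price_1000: float        # close of the 09:55-10:00 bar
    first_bar_volume: float  # volume of the 09:30-09:35 bar


def vvg_flags(days, ret_thresh, gap_thresh, vol_mult, window=20):
    """Return one bool per day: True only when all three conditions hold.

    ret_thresh, gap_thresh: absolute-return cut-offs as fractions (assumed).
    vol_mult: first-bar volume must exceed vol_mult times the mean first-bar
        volume of the previous `window` days; the current day is excluded.
    window: 20 mirrors the window named in the simulated rebuttal (assumed).
    Days without `window` days of history are never flagged.
    """
    flags = []
    for i, d in enumerate(days):
        if i < window:
            flags.append(False)  # no complete baseline yet
            continue
        baseline = mean(x.first_bar_volume for x in days[i - window:i])
        first30 = abs(d.price_1000 / d.open_price - 1.0)  # first-30-minute move
        gap = abs(d.open_price / d.prev_close - 1.0)       # overnight gap
        abnormal_vol = d.first_bar_volume > vol_mult * baseline
        flags.append(first30 > ret_thresh and gap > gap_thresh and abnormal_vol)
    return flags
```

Excluding the current day from its own volume baseline keeps the flag computable at 10:00 ET without look-ahead.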

If this is right

Pre-market observables can be combined into a rule that separates days with measurably different intraday trajectories.
Directional signals derived from the identified regimes do not retain a stable, statistically reliable edge once realistic execution costs and multi-year consistency are required.
Descriptive regime labeling is achievable while conversion into tradable signals is not, under the tested constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The failure of all strategies suggests that any exploitable edge from these patterns is either too small or too fragile for institutional execution.
The classifier may still be useful for risk sizing or for conditioning other models rather than for direct entry rules.
Extending the same three-condition logic to other liquid futures contracts could test whether the regime signature is contract-specific or more general.

Load-bearing premise

The statistically distinct behavior on classifier-positive days reflects a genuine regime rather than an artifact of the 947-day sample, the rolling baseline choice, or unadjusted testing across the three conditions.

What would settle it

Re-running the identical VVG rule on MNQ data from after the sample ends (August 2025), or on a different futures contract, and checking whether both the morning-drift-plus-late-reversal pattern and the strategy-failure results reappear.

Original abstract

This paper constructs and validates a composite day-classification system for Micro E-Mini Nasdaq 100 futures (MNQ) using three pre-market observable conditions: first-30-minute return magnitude, overnight gap magnitude, and abnormal opening-bar volume relative to a rolling baseline. Using 947 regular trading days of five-minute data from 2021-2025, we find that classifier-positive days exhibit statistically distinct intraday behavior, including directional morning drift followed by systematic late-session reversal. Despite these descriptive characteristics, all tested directional trading strategies fail institutional validation standards after transaction costs and multi-year consistency requirements are applied. The highest-performing configuration achieves T = 1.46 and mean net +7.80 points but fails year-stability criteria. The primary contribution is the validation of the Volatility-Volume-Gap (VVG) classifier as a descriptive regime-identification framework and the documentation of failed attempts to convert these statistical patterns into deployable trading signals under realistic execution constraints.
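The year-stability requirement can be made concrete with a small check on per-trade results. This is a sketch under stated assumptions: the per-trade cost in index points and the rule that every calendar year must show a positive mean net result are placeholders, since the abstract does not state the paper's actual criteria, and `year_stability` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def year_stability(trades, cost_per_trade, min_positive_years=None):
    """Check a hypothetical year-stability rule on per-trade results.

    trades: iterable of (year, gross_points) pairs, one per trade.
    cost_per_trade: assumed round-trip cost in index points (commission plus
        slippage), subtracted from every trade.
    min_positive_years: number of calendar years that must have a positive
        mean net result; None means every year must (an assumed criterion).
    Returns (passes, {year: mean_net_points}).
    """
    net_by_year = defaultdict(list)
    for year, gross in trades:
        net_by_year[year].append(gross - cost_per_trade)
    year_means = {year: mean(nets) for year, nets in sorted(net_by_year.items())}
    positive_years = sum(1 for m in year_means.values() if m > 0)
    required = len(year_means) if min_positive_years is None else min_positive_years
    return positive_years >= required, year_means
```

A configuration can have a positive overall mean and still fail this check if one year's mean is negative, which is the pattern the abstract reports for the best configuration.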

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The VVG classifier flags MNQ days with morning drift and late reversal but produces no trading rules that survive costs and year-by-year checks.


The main takeaway is that combining the overnight gap, the early-return magnitude, and abnormal opening volume into one label for MNQ days does pick out sessions with a morning directional bias and a reversal later in the day. The author runs this on 947 trading days from 2021 to 2025 and reports a statistically distinct intraday profile on the flagged days, but every directional strategy built from it fails once transaction costs and multi-year stability are required. The best configuration reaches only T = 1.46 with a mean net profit of about 8 points, and it still fails the year-stability criteria.

What the paper handles well is transparency on the trading side. Too many papers stop after showing statistical separation and leave the reader to assume it can be monetized. Here the execution tests are realistic and the failures are reported clearly, which makes the descriptive claim more credible than it would be if it were paired with an overfitted trading result. The 947-day sample and five-minute bars give it some grounding, although only about 40 of those days are classifier-positive.

The weaker parts are the statistical hygiene. The classifier combines three conditions, and several intraday features are then tested for distinct behavior. Without a multiple-comparison adjustment the reported significance could be inflated, and the rolling baseline for volume is sensitive to the exact window chosen. The abstract also omits the exact threshold values, which makes the work hard to reproduce or extend. Everything stays inside MNQ, so it is not positioned as a general method.

A reader who trades or models MNQ intraday would find this useful as a simple bucket for day types to explore further. It is not going to change how the field thinks about volatility or volume dynamics, but the honest reporting of a negative trading result makes it a credible incremental note rather than another overfitted claim. I would send this to peer review. The core data work is there, and referees could reasonably ask for the missing controls on multiple testing and parameter sensitivity without needing a full rewrite.

Referee Report

2 major / 2 minor

Summary. This paper constructs a Volatility-Volume-Gap (VVG) classifier for MNQ futures using three observables available early in the session (first-30-minute return magnitude, overnight gap size, and abnormal opening-bar volume relative to a rolling baseline). On 947 regular trading days of 5-minute data (2021-2025), classifier-positive days are shown to exhibit distinct intraday patterns, notably morning directional drift followed by late-session reversal. All directional trading strategies derived from the classifier fail institutional validation after transaction costs and multi-year consistency checks; the best configuration reaches T = 1.46 and +7.80 mean net points but fails the year-stability criteria. The stated contribution is validation of the VVG classifier as a descriptive regime-identification tool, together with documentation of the inability to convert the observed patterns into deployable signals.

Significance. If the statistical distinctness holds after proper robustness checks, the work supplies a concrete framework, built on observables available by 10:00 ET, for labeling intraday regimes in equity-index futures, backed by a multi-year high-frequency dataset. The explicit reporting of failed trading-strategy validation under realistic execution constraints is a positive feature that supplies falsifiable negative evidence and may help temper over-optimism in the literature. The compact classifier definition and the year-by-year stability tests are further strengths, although the thresholds and the rolling-window length are parameters that still need to be reported and stress-tested.

major comments (2)

[Abstract and Results] The central claim that classifier-positive days 'exhibit statistically distinct intraday behavior' rests on multiple tests (morning drift, late reversal, and other intraday metrics) performed across the three VVG conditions without any reported multiple-comparison correction (Bonferroni, FDR, or similar). Nominal p-values may therefore not survive adjustment, directly affecting the load-bearing descriptive regime-identification result.
[Methodology, classifier construction] The abnormal-volume component is defined relative to a rolling baseline whose window length is not varied or justified; no sensitivity table or robustness check is supplied. Because the same volume signal enters both the classifier and the subsequent intraday tests, unexamined window choices could induce spurious correlation and undermine the claim that the observed patterns reflect genuine regimes rather than baseline artifacts.
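For the multiple-comparison comment, the Benjamini-Hochberg step-up procedure is short enough to sketch in plain Python, returning both reject decisions and adjusted p-values. The function names and the example p-values below are illustrative, not taken from the paper.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Reject/keep flag per test under the Benjamini-Hochberg step-up rule at FDR q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Largest 1-based rank k with p_(k) <= k * q / m; all ranks up to k are rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject


def bh_adjusted(pvalues):
    """BH-adjusted p-values: running minimum of p_(k) * m / k from the largest p down, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

With illustrative p-values 0.01, 0.04, 0.03, and 0.20, only the first test is rejected at q = 0.05, even though three of the four are nominally below 0.05.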

minor comments (2)

[Abstract] The abstract reports 'T = 1.46' without defining the statistic; the main text should state whether T is a t-statistic, Sharpe ratio, or other quantity and how it is computed.
[Results] Table or figure captions should explicitly list the exact thresholds used for each of the three VVG conditions so that the classifier can be reproduced from the description alone.
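If T is a one-sample t-statistic on per-trade net points (the abstract does not say, which is the point of the first minor comment), it would be computed as in this sketch. The function name is hypothetical, and the t-statistic reading of T is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(net_points):
    """One-sample t-statistic for H0: mean net P&L per trade = 0.

    Uses the sample standard deviation (n - 1 denominator).
    """
    n = len(net_points)
    if n < 2:
        raise ValueError("need at least two trades")
    return mean(net_points) / (stdev(net_points) / sqrt(n))
```

Under that reading, T = 1.46 with a mean of +7.80 net points implies a standard error of about 7.80 / 1.46 ≈ 5.3 points per trade, and T sits below the two-sided 5% critical value (about 1.96 in large samples, higher in small ones).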

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and balanced review, including the recognition that explicit reporting of failed trading-strategy validation under realistic constraints is a positive contribution. We address each major comment below and will incorporate the suggested improvements.


Referee: [Abstract and Results] No multiple-comparison correction is reported for the family of intraday tests behind the 'statistically distinct intraday behavior' claim.

Authors: We acknowledge the validity of this concern. Although the morning-drift and late-reversal hypotheses were pre-specified from prior intraday literature, the full set of metrics across VVG conditions was not subjected to formal multiplicity adjustment. In the revised manuscript we will apply the Benjamini-Hochberg FDR procedure to the family of tests reported in the Results section and present both nominal and adjusted p-values. We expect the qualitative conclusions to hold, and any result that does not survive adjustment will be reported as such. revision: yes
Referee: [Methodology, classifier construction] The rolling-baseline window for abnormal volume is neither justified nor varied, and the same volume signal enters both the classifier and the intraday tests.

Authors: The 20-day rolling window was selected as a conventional choice in high-frequency volume studies to capture recent behavior while avoiding excessive lag. We agree, however, that the dual role of the volume signal warrants explicit robustness verification. The revised version will include a sensitivity table showing classifier membership and intraday pattern statistics for window lengths of 10, 15, 20, 30, and 40 days, and will report whether the patterns remain stable across this range, which would indicate that the regime distinctions are not driven by the specific baseline length. revision: yes
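The promised sensitivity table could be generated along these lines. This sketch assumes the abnormal-volume condition is "first-bar volume above a multiple of its trailing mean"; the multiple of 2.0, the function names, and the use of Jaccard overlap against the 20-day flags are illustrative choices, not the paper's.

```python
from statistics import mean


def abnormal_volume_flags(volumes, window, mult):
    """Flag day i when volumes[i] > mult * mean(volumes[i - window:i]).

    The baseline excludes day i itself (no look-ahead); days without a full
    window of history short-circuit to False before the mean is computed.
    """
    return [
        i >= window and v > mult * mean(volumes[i - window:i])
        for i, v in enumerate(volumes)
    ]


def jaccard(a, b):
    """Overlap of two flag lists: |A and B| / |A or B| (1.0 if both are empty)."""
    sa = {i for i, f in enumerate(a) if f}
    sb = {i for i, f in enumerate(b) if f}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def window_sensitivity(volumes, windows=(10, 15, 20, 30, 40), mult=2.0, ref=20):
    """For each window: (number of flagged days, Jaccard overlap with the `ref` window).

    All windows are scored on the same days (index >= the longest window), so
    shorter windows do not get extra warm-up days to flag.
    """
    start = max(max(windows), ref)

    def flags_for(w):
        raw = abnormal_volume_flags(volumes, w, mult)
        return [f and i >= start for i, f in enumerate(raw)]

    reference = flags_for(ref)
    report = {}
    for w in windows:
        flags = flags_for(w)
        report[w] = (sum(flags), jaccard(flags, reference))
    return report
```

Scoring every window on a common sample matters: otherwise the 10-day window would have 30 more days eligible for flagging than the 40-day window, and the membership counts would not be comparable.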

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain


The VVG classifier is explicitly constructed from three observables available by 10:00 ET (first-30-minute return, overnight gap, abnormal opening volume) and then tested for distinct intraday behavior on the same days. Provided the intraday metrics (morning drift, late reversal) and trading-strategy outcomes are measured on bars after the first 30 minutes, they use data segments that do not overlap the classification inputs, so the reported patterns and the documented failures are not forced by construction. No self-citations, fitted parameters renamed as predictions, uniqueness theorems, or ansatzes appear in the derivation. The negative trading results function as an external falsification check rather than circular confirmation.
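The non-overlap condition in this rationale can be checked mechanically. The sketch assumes the classifier's inputs end at 10:00 ET, measures morning drift over 10:00 to 12:00 ET and the late-session move over 14:00 to 16:00 ET, and treats a "reversal" as a late move of opposite sign to the drift. Those windows, that definition, and the `intraday_profile` helper are hypothetical; the abstract does not define the paper's measurement windows.

```python
# Hypothetical window boundaries as five-minute bar-end times in ET.
# Zero-padded "HH:MM" strings compare correctly as text.
CLASSIFY_END = "10:00"
MORNING = ("10:00", "12:00")
LATE = ("14:00", "16:00")

# Outcome windows must start no earlier than the classification inputs end.
assert CLASSIFY_END <= MORNING[0] and MORNING[1] <= LATE[0]


def window_return(closes, start, end):
    """Simple return between the closes of the bars ending at `start` and `end`.

    closes maps a five-minute bar's end time ("HH:MM", ET) to its close price.
    """
    return closes[end] / closes[start] - 1.0


def intraday_profile(closes):
    """Morning drift, late-session return, and whether the late move opposes the drift."""
    drift = window_return(closes, *MORNING)
    late = window_return(closes, *LATE)
    return {"morning_drift": drift, "late_return": late, "reversal": drift * late < 0}
```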

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are described. Thresholds for 'abnormal' volume and return/gap magnitudes are implicitly required but not quantified.

pith-pipeline@v0.9.0 · 5468 in / 1219 out tokens · 67073 ms · 2026-05-13T00:50:36.280483+00:00 · methodology


Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

[1]

A classifier that correctly identifies a structurally distinct type of trading day is a genuine research contribution even if it does not directly generate a tradable signal

Introduction A recurring challenge in systematic intraday trading research is the distinction between statistical description and economic exploitability. A classifier that correctly identifies a structurally distinct type of trading day is a genuine research contribution even if it does not directly generate a tradable signal. The literature on volatilit...

[2]

All bars are filtered to the 09:30–16:00 ET session

Data and Classifier Construction 2.1 Data The dataset consists of 947 complete regular trading hours (RTH) trading days of five-minute OHLCV bar data for MNQ continuous front-month futures, spanning December 2021 through August 2025. All bars are filtered to the 09:30–16:00 ET session. Session boundary bars are verified to ensure no over...

[3]

We measure the mean next-day RTH return (open to close on the following session) for the two populations

Behavioral Characterization of Classifier-Positive Days 3.1 Next-Day Return Spread The most immediate test of whether classifier-positive days constitute a distinct behavioral regime is whether they predict different outcomes than non-classifier days. We measure the mean next-day RTH return (open to close on the following session) for the two populations....

[4]

intersection reversal

Directional Strategy Tests The behavioral characterization in Section 3 establishes that VVG classifier-positive days are genuinely distinct from other trading days. This section documents all attempts to convert that descriptive validity into a deployable trading signal. Eight distinct entry configurations are tested. All use the same execution framework...

[5]

this approach is wrong and should not be revisited

The Classifier as a Research Asset 5.1 What the Classifier Validates Despite the failure of all directional strategies, the VVG classifier produces three validated descriptive findings that constitute genuine research contributions. First, simultaneous extreme conditions in the three features identify a behaviorally distinct day type. The 25.6 basis point...

[6]

This sample size is insufficient for robust walk-forward validation of directional strategies and means that all strategy results should be interpreted with caution

Limitations and Extensions 6.1 Limitations The most significant limitation of this study is the small number of classifier-positive days (40 across four years). This sample size is insufficient for robust walk-forward validation of directional strategies and means that all strategy results should be interpreted with caution. The interse...

[7]

The classifier identifies days simultaneously exhibiting extreme first-30-minute return, extreme overnight gap, and extreme first-bar volume

Conclusion This paper has documented the construction, behavioral validation, and directional strategy testing of the VVG classifier for MNQ intraday data. The classifier identifies days simultaneously exhibiting extreme first-30-minute return, extreme overnight gap, and extreme first-bar volume. These days constitute approximately 4.4% of the trading pop...
